Software Deduplication: Update to previous comparisons!

Posted 2016-03-30 03:48 by Traesk. Edited 2016-03-30 17:15 by Traesk.

Purpose: I have read some feedback on my article about save ratings, and it was pointed out that I forgot to take block size into account for ZFS.
Scope/limitations: My quick tests were never meant to be perfect, and obviously you should do your own tests if you are choosing between the alternatives. Still, I think this was a pretty big oversight, and it would be interesting to see whether it makes any difference.
Method: I'm going to repeat last time's tests on the same 22.3GiB of Windows XP ISO-files, but try ZFS with a smaller block size and at the same time give Btrfs and Duperemove another chance at 4KiB deduplication.
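Why block size matters can be illustrated with a rough estimator. This is a minimal sketch, assuming fixed-size, aligned blocks identified by SHA-256; ZFS, Btrfs and Duperemove each have their own details (record sizes, extent handling, checksums) that it does not model. The function name `dedup_savings` is hypothetical.

```python
import hashlib
from typing import Iterable


def dedup_savings(blobs: Iterable[bytes], block_size: int) -> float:
    """Estimate the fraction of space saved by fixed-size block dedup.

    Each blob is cut into block_size chunks (the last one may be shorter).
    A chunk is only "stored" the first time its SHA-256 digest is seen.
    Returns (total - stored) / total, or 0.0 for empty input.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    seen = set()
    total = 0
    stored = 0
    for data in blobs:
        for offset in range(0, len(data), block_size):
            chunk = data[offset:offset + block_size]
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return 0.0 if total == 0 else (total - stored) / total
```

For example, two 128KiB files that differ in a single 4KiB block share 31 of their 32 blocks at a 4KiB block size, so the estimate is 31/64 (about 48%) saved. At 128KiB (the ZFS default recordsize) each file is one block and the two blocks differ, so nothing is saved.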

Microsoft Windows Deduplication: Exclude folders retroactively

Posted 2015-12-03 18:47 by Traesk. Edited 2016-03-31 23:46 by Traesk.

Issue: If you exclude folders from Deduplication using
Set-DedupVolume z: -ExcludeFolder "Z:\Folder1", "Z:\Folder2"
...as described on TechNet, it will not "unoptimise"/expand the files already deduplicated in those folders. It will only prevent new files from being optimised. Microsoft does not make this clear.
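Because the exclusion is not retroactive, you first need to know which already-optimised files sit inside the excluded folders if you want to deal with them. A minimal sketch of that matching step, assuming you already have a list of deduplicated file paths from somewhere (obtaining it is not covered here). The function name is hypothetical, and it only compares paths; it does not talk to the dedup engine.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List


def files_in_excluded_folders(files: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Return the paths in `files` that lie inside any excluded folder.

    Matching is done component by component and case-insensitively,
    so "Z:\\Folder10\\x" is NOT treated as being inside "Z:\\Folder1".
    """
    excluded_parts = [
        tuple(p.lower() for p in PureWindowsPath(folder).parts) for folder in excluded
    ]
    hits = []
    for f in files:
        parts = tuple(p.lower() for p in PureWindowsPath(f).parts)
        # A file is inside a folder if the folder's components are a strict prefix.
        if any(len(parts) > len(ex) and parts[:len(ex)] == ex for ex in excluded_parts):
            hits.append(f)
    return hits
```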

MSDN Subscriber Downloads: Folder Structure Generator / Crawler

Posted 2015-11-26 22:36 by Traesk. Edited 2019-06-01 17:50 by Traesk.

Update 2019-06-01: This script no longer works, since MSDN was replaced with My Visual Studio a while back.

Purpose: I have been hoarding original files from MSDN for a while. I used to recreate the site's folder structure on my disk, like "MSDN\Operating Systems\MS-DOS\MS-DOS 6.22 (English)\en_msdos622.exe", and save the release information from the site to a text-file. When I got my own MSDN account and started downloading lots of files, this became unmanageable. 2-3 years ago I wrote a script that does this for me, and I thought I'd share it now in case anyone has use for this specific code or the concept itself.
Scope/limitations: The sole purpose of this is to crawl the whole "MSDN Subscriber Downloads" section and create a zip-file containing the structure and information of the releases. It is slow, unoptimised and has limited options, but serves its purpose. This would be better suited as a standalone application, but we'll use PHP.
Method: Create a PHP-script that queries Microsoft for the desired information and saves it to a zip.
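The PHP script itself is not reproduced here. To illustrate the "folder structure plus release information in a zip" idea, here is a minimal Python sketch, assuming the release data has already been fetched (the crawling is not shown, and the MSDN pages no longer exist). The `Release` fields, the `<filename>.txt` naming of the info file and the function names are hypothetical.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import Iterable, Tuple

# Characters that are not allowed in Windows file or folder names.
INVALID_CHARS = '<>:"/\\|?*'


def safe_name(segment: str) -> str:
    """Replace characters Windows would reject in a path segment."""
    return "".join("_" if ch in INVALID_CHARS else ch for ch in segment).strip()


@dataclass
class Release:
    category: Tuple[str, ...]  # e.g. ("Operating Systems", "MS-DOS")
    title: str                 # e.g. "MS-DOS 6.22 (English)"
    filename: str              # e.g. "en_msdos622.exe"
    info: str                  # release information taken from the site


def build_structure_zip(releases: Iterable[Release], root: str = "MSDN") -> bytes:
    """Return the bytes of a zip mirroring root/category.../title/ folders.

    Each release gets one text file with its information; the actual
    download is not included, only the structure.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for r in releases:
            segments = [root, *r.category, r.title]
            folder = "/".join(safe_name(s) for s in segments)
            # Zip entry names always use forward slashes.
            zf.writestr(f"{folder}/{safe_name(r.filename)}.txt", r.info)
    return buffer.getvalue()
```

Building the zip in memory keeps the sketch free of temporary files; writing `build_structure_zip(...)` to disk is then a single `open(..., "wb").write(...)`.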

Converting subtitles from SVT Play

Posted 2015-09-04 20:40 by Traesk. Edited 2019-06-01 17:48 by Traesk.

SVT Play and Öppet Arkiv use a slightly modified version of the SRT-format, with some additional tags from WebVTT, to display subtitles. These files can be downloaded directly by finding the link in the page source or by using a service such as pirateplay.se. They work pretty well out of the box, but can be modified to better match the intended style. Just as with the mp4-video itself, SVT uses its Flash-based video player to parse the subtitle content and apply the formatting. There is a beta of HTML5 playback, but according to SVT's FAQ it does not support subtitles yet. There are two differences between SVT's format and the normal SRT-format:

Software Deduplication: Quick comparison of save ratings

Posted 2015-07-31 17:59 by Traesk. Edited 2016-03-30 04:03 by Traesk.

Purpose: To see if any of the alternatives is significantly better or worse, and ultimately decide what OS/filesystem to potentially use on a fileserver.
Scope/limitations: This test does not consider factors other than the actual save ratings, such as performance, security and cost. The method is very limited and might not reflect results in a real-world scenario. There might be some minor differences in the results due to rounding and possibly inconsistent variables. This is my first time working with most of these systems, so the setups are probably not optimized. The results were pretty clear though, so a few percent either way shouldn't matter.
Method: I used 22.3GiB worth of Windows XP installation ISOs, 52 ISOs in total. No two files were identical, but some contained a lot of duplicate data, like the Swedish XP Home Edition vs the Swedish N-version of XP Home Edition. I deduplicated these files and noted how much space I saved compared to the original 22.3GiB.
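The save rating is simply the space saved as a share of the original size. This is a trivial helper to make that arithmetic explicit, assuming the stored size is whatever the filesystem or dedup tool reports after optimisation. The function name is hypothetical, and no figures from the tests are implied.

```python
from typing import Tuple


def savings(original_gib: float, stored_gib: float) -> Tuple[float, float]:
    """Return (GiB saved, percent saved) for one deduplication run."""
    if original_gib <= 0:
        raise ValueError("original size must be positive")
    saved = original_gib - stored_gib
    return saved, 100.0 * saved / original_gib
```

For instance, if a hypothetical run left half of the 22.3GiB on disk, `savings(22.3, 11.15)` gives 11.15GiB saved, or 50%.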