A stream of thoughts on compression

It's a stream about compression. Get it? Compression stream? Ha ha.

ZIP would be better if compression were cross-file

Throughout my experiments, I've found that ZIP generally does worse than other formats, mainly because each file gets its own compression stream. That means every file name is stored uncompressed, and data shared across files gets compressed separately each time instead of once. The archive just ends up larger, as demonstrated in the FSR breakdown.
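To make the per-file versus cross-file difference concrete, here's a rough sketch using only the Python standard library. The file names and contents are made up for illustration (lots of tiny, nearly identical JSON files), so treat the result as a demonstration of the effect, not a measurement of any real archive.

```python
import io
import tarfile
import zipfile


def make_files(count=200):
    """Build hypothetical small files that share almost all of their content."""
    files = {}
    for i in range(count):
        name = "assets/models/item/example_%d.json" % i
        body = '{"parent": "item/generated", "textures": {"layer0": "item/example_%d"}}\n' % i
        files[name] = body.encode("utf-8")
    return files


def zip_size(files):
    """Size of a ZIP archive: every entry is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_xz_size(files):
    """Size of a .tar.xz: one xz stream covers all names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


if __name__ == "__main__":
    files = make_files()
    print("zip:   ", zip_size(files), "bytes")
    print("tar.xz:", tar_xz_size(files), "bytes")
```

With input like this, the tar.xz comes out much smaller, because the repeated paths and JSON boilerplate only have to be encoded once. Real archives with fewer, larger, less similar files will show a smaller gap.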

NEU breakdown

NEU is a large Java Archive, so I've been trying to figure out effective ways to reduce its size for a while now. I recently tried making tarballs organized by file type and using xz compression - here's what that yielded.

Type          .tar        .tar.xz
Archives      12.380 MB   10.302 MB
Images        13.486 MB   12.476 MB
Class files   12.667 MB    2.135 MB
Other files    2.999 MB    0.346 MB
Sum           41.5 MB     25.3 MB

For context: the original zip was 29.1 MB, and the original sum of file contents was 37.8 MB.
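For anyone who wants to try the same split, here's a minimal sketch of grouping an extracted archive by file type and writing one .tar.xz per group. The extension-to-category mapping and the function names are my assumptions for illustration, not the exact setup behind the numbers above.

```python
import tarfile
from pathlib import Path

# Assumed mapping from extension to category; adjust to taste.
CATEGORIES = {
    ".jar": "archives",
    ".zip": "archives",
    ".png": "images",
    ".jpg": "images",
    ".class": "class_files",
}


def categorize(path):
    """Pick a category from the file extension, falling back to 'other_files'."""
    return CATEGORIES.get(Path(path).suffix.lower(), "other_files")


def build_tarballs(root, out_dir):
    """Write one <category>.tar.xz per file type and return their sizes in bytes.

    out_dir should live outside root, or old tarballs get packed on the next run.
    """
    root = Path(root)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect files per category first, in a stable order.
    groups = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups.setdefault(categorize(path), []).append(path)

    sizes = {}
    for category, paths in groups.items():
        target = out_dir / (category + ".tar.xz")
        with tarfile.open(target, "w:xz") as tf:
            for path in paths:
                tf.add(path, arcname=path.relative_to(root).as_posix())
        sizes[category] = target.stat().st_size
    return sizes
```

Calling `build_tarballs("extracted_jar", "tarballs")` (both paths hypothetical) returns a dict of category name to compressed size, which is enough to fill in a table like the one above.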

After optimizing images (with zopflipng)

Type     .tar        .tar.xz
Images   11.940 MB   11.187 MB
Total    40.0 MB     24.0 MB

Yes, this means that just using zopflipng and xz could take about 5 MB off the archive (29.1 MB down to 24.0 MB). That doesn't mean it's practical, but it's interesting to look at.

FSR breakdown

FSR is a large texture pack using PNGs and ZIP, and my endeavors have shown that the archive format can make as much of a difference as the file format.

"Compressed" here means PNGs run through zopflipng (with --lossy_transparent -m) and rewritten JSON.

Format    Base size       Compressed size
Folder    4.4 MB          2.1 MB
.zip      4.5 - 4.9 MB    3.3 MB
.tar.gz   1.9 - 2.0 MB    885 - 957 kB
.tar.xz   1.7 MB          704 kB
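For reference, the PNG step can be scripted roughly like this. It's a sketch that assumes zopflipng is installed and on PATH. The `--lossy_transparent` and `-m` flags are the ones mentioned above; `-y` (overwrite without asking) is my addition, and the function names are hypothetical. The JSON rewriting isn't covered, since that depends on the files in question.

```python
import shutil
import subprocess


def zopflipng_command(src, dst):
    """Build the zopflipng invocation for one PNG.

    --lossy_transparent and -m come from the text; -y is an added assumption
    so the tool does not prompt before overwriting dst.
    """
    return ["zopflipng", "--lossy_transparent", "-m", "-y", str(src), str(dst)]


def optimize_png(src, dst):
    """Run zopflipng on src, writing the optimized image to dst."""
    if shutil.which("zopflipng") is None:
        raise FileNotFoundError("zopflipng is not on PATH")
    subprocess.run(zopflipng_command(src, dst), check=True)
```

Writing to a separate `dst` keeps the originals around, which makes it easy to compare sizes before and after.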