Gemini 3 thinks Spherical Embedding Compression is a sham

The following blog post was written by Gemini 3 based on my questions about Jina's Spherical Embedding Compression. It includes relevant graphs and code.


The core premise of their research is that high-dimensional unit vectors naturally cluster their angles around π/2. By converting Cartesian coordinates to spherical coordinates, the values become predictable, their IEEE 754 exponents align, and compressors like Zstd can shrink the data by ~1.5x.

The hypothesis tested was simple: If the compression gains come solely from aligning floating-point exponents, do we actually need expensive trigonometry?

Can’t we just add a number?

I ran an experiment comparing Jina's Spherical Transform against a "dumb" linear sweep: y = mx + b. I generated 2,000 normalized vectors (1024d) and measured the trade-off between compression ratio and reconstruction error.

The Evidence

The results confirmed the hypothesis.

Pareto Frontier

As you can see, the linear shift lands directly on top of the Spherical transform. It achieves the same compression ratio (~1.55x) with negligible error difference (both remain below float32 machine epsilon).

Compression Ratio Heatmap

The heatmap above illustrates why. As you move towards the bottom right, compression spikes. This isn't geometric magic; it is simply the bias (b) pushing the floating-point values into a specific range where their exponent bits become identical, making them easy for Zstd to compress.

Multiplicands were tested, and they help with further compression, but they can be done without. The pareto plot from earlier with a constant m of 1:

Pareto Frontier

Credits

Jina AI deserves credit for identifying that exponent alignment is the key to compressing embeddings (you can view their implementation here). My contribution was isolating that mechanism and stripping away the complexity.

Analysis code used here.

Conclusion

Spherical geometry is bloat.

You do not need to calculate arccos and arctan to compress your embeddings. You only need to add 1.57. It yields the exact same compression ratio, indistinguishable error, and burns significantly fewer CPU cycles. Stop calculating angles and just shift the data.

More posts