Why do we still use chroma subsampling?

Chroma subsampling is a simple idea. Our eyes are more sensitive to luminance (brightness) than they are to chroma (color), and neighboring pixels usually have similar chroma anyway, so we can send one chroma sample for every 4 pixels instead of one per pixel.
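To make that concrete, here's a minimal sketch of 4:2:0-style subsampling in Python. The names and the plain 2x2 averaging are my own simplifications; real encoders can use fancier filters:

```python
import numpy as np

def subsample_420(chroma):
    """Replace each 2x2 block of a chroma plane with its average,
    so 4 pixels end up sharing 1 chroma sample (4:2:0).
    `chroma` is a 2D float array with even height and width."""
    h, w = chroma.shape
    blocks = chroma.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))  # one value per 2x2 block

def upsample_420(chroma_small):
    """Nearest-neighbor stretch back to full resolution for decoding."""
    return chroma_small.repeat(2, axis=0).repeat(2, axis=1)
```

The decoder just stretches the quarter-size chroma plane back up to full resolution, which is where the artifacts below come from.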

Our eyes can barely tell the difference. As such, this system is used in basically all lossy image formats. JPEG? Has used it since the start. AVIF and WebP? On by default. You can find it everywhere.

However, there's a problem. When neighboring pixels have different chromas, they get averaged into a single chroma sample, so the reconstructed pixels come out looking wrong. Here's an example where you can really see the artifacts:

[Image comparison: the original vs. the chroma-subsampled version]

I thought about this for a while, came up with some ideas, and eventually reached a conclusion. Here's how it all went in my head.

attempt 1: use more pixels with a gradient system

Instead of mapping every 4 pixels to 1 chroma, how about we map every 8 pixels to 2 chromas? We store one chroma for the block's low luminance and another for its high luminance, and interpolate between them based on each pixel's luminance.
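Here's a rough sketch of what I had in mind, again in Python. Everything about it is my own assumption: the block shape, the names, and the choice to key the two chromas to the darker and brighter halves of the block:

```python
import numpy as np

def encode_block(y, cb, cr):
    """Encode one 8-pixel block (e.g. 4x2) as two chromas: one taken
    from the darker half of the block, one from the brighter half."""
    dark = y <= np.median(y)
    bright = ~dark
    chroma_dark = np.array([cb[dark].mean(), cr[dark].mean()])
    if bright.any():
        chroma_bright = np.array([cb[bright].mean(), cr[bright].mean()])
    else:  # flat luminance: fall back to a single chroma
        chroma_bright = chroma_dark
    return y.min(), y.max(), chroma_dark, chroma_bright

def decode_block(y, y_lo, y_hi, chroma_dark, chroma_bright):
    """Reconstruct per-pixel chroma as a gradient keyed to luminance."""
    if y_hi == y_lo:
        t = np.zeros_like(y)
    else:
        t = (y - y_lo) / (y_hi - y_lo)  # 0 at darkest pixel, 1 at brightest
    cb = chroma_dark[0] + t * (chroma_bright[0] - chroma_dark[0])
    cr = chroma_dark[1] + t * (chroma_bright[1] - chroma_dark[1])
    return cb, cr
```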

This would help in cases like the artifact example above, where the chroma edges line up with luminance edges. It might also generally improve image quality.

On the other hand, if a block has uniform luminance but varying chroma, there's nothing for the gradient to key off of, so this scheme would create even worse artifacts than plain subsampling. So I went back to the drawing board.

attempt 2: dynamic subsampling ranges

What if, instead of always subsampling over a fixed 2x2 grid, we vary the width of each region to minimize artifacts? In the example images above, we'd store one chroma for a long red strip, one for a long green strip, and one for another red strip, with the region boundaries lined up exactly where the chroma changes.
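As a sketch of one way this could look (per-row runs with a hand-picked threshold; the names and the thresholding rule are all my own assumptions, not anything a real codec does):

```python
import numpy as np

def chroma_runs(cb_row, cr_row, tol=8.0):
    """Split one row of chroma (numpy float arrays) into variable-width
    runs: start a new run wherever the chroma jumps by more than `tol`
    from the run's first pixel, then store one averaged chroma per run
    as (width, cb, cr) tuples."""
    runs, start = [], 0
    for i in range(1, len(cb_row)):
        if (abs(cb_row[i] - cb_row[start]) > tol
                or abs(cr_row[i] - cr_row[start]) > tol):
            runs.append((i - start,
                         cb_row[start:i].mean(),
                         cr_row[start:i].mean()))
            start = i
    runs.append((len(cb_row) - start,
                 cb_row[start:].mean(),
                 cr_row[start:].mean()))
    return runs
```

On the stripe images above, each row would collapse to just three runs, with the run boundaries sitting exactly on the chroma edges.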

This could work just fine. We could even layer attempt 1's gradient system on top, conditionally, wherever it improves quality. We could let the height vary too. But wait... are we just reinventing image compression?

realization: we shouldn't really use chroma subsampling

Chroma subsampling relies on two ideas: our eyes are less sensitive to some kinds of detail than others, and large parts of an image tend to be similar in one way or another. But those are exactly the ideas that image compression already exploits.

Chroma subsampling essentially makes two versions of the image: a grayscale one at full resolution and a chroma one at half resolution. But since codecs already have the ability to vary quality, why not just apply more lossy compression to the chroma instead? Surely that would be more effective than a blunt downscale?
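As it happens, JPEG already quantizes luma and chroma with separate tables, so you can play with this idea today: keep chroma at full resolution but quantize it much harder. Here's a sketch using Pillow; the flat tables and filenames are made up purely for illustration, and real encoders use carefully tuned frequency-dependent tables:

```python
from PIL import Image

# Flat quantization tables, purely for illustration: the chroma table
# quantizes 4x more coarsely than the luma table.
luma_q = [16] * 64
chroma_q = [64] * 64

img = Image.open("input.png").convert("RGB")
img.save(
    "out.jpg",
    subsampling=0,               # 4:4:4 -- chroma kept at full resolution
    qtables=[luma_q, chroma_q],  # but quantized much harder than luma
)
```

Whether that actually beats a plain half-resolution downscale at the same file size is exactly the question I'd want a codec person to answer.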

I don't know. I haven't made any codecs myself. I just find it funny that essentially all lossy codecs still use this system by default.