In March 2026, Google introduced TurboQuant, an AI memory compression algorithm designed to reduce the memory footprint of large language models and vector search engines. Using a two-step compression technique, TurboQuant compresses the key-value (KV) cache to 3 bits per value without compromising accuracy. The result is a roughly sixfold reduction in memory consumption and up to an eightfold performance gain on NVIDIA H100 accelerators. Storage chip stocks broadly declined following the announcement.
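To put the memory figures in perspective, here is a back-of-the-envelope sketch of KV cache sizing. It is not TurboQuant itself: the helper `kv_cache_bytes`, the model shape, and the 16-bit baseline are all assumptions chosen for illustration, and the calculation ignores any per-block metadata a real quantizer would store.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value, batch_size=1):
    """Approximate KV cache size in bytes, ignoring quantization metadata."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value / 8


# Hypothetical model shape (an assumption, not taken from the announcement).
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768)

baseline = kv_cache_bytes(**cfg, bits_per_value=16)   # assumed 16-bit baseline
compressed = kv_cache_bytes(**cfg, bits_per_value=3)  # 3 bits per value
ratio = baseline / compressed

GIB = 2 ** 30
print(f"16-bit cache: {baseline / GIB:.2f} GiB")
print(f" 3-bit cache: {compressed / GIB:.2f} GiB")
print(f"reduction:    {ratio:.2f}x")
```

Under these assumptions the raw bit-width ratio is 16/3, or about 5.3x, which is in the same range as the roughly sixfold figure quoted above. The exact ratio depends on the baseline precision and on how the reported number was measured.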
