Google Introduces Compression Algorithm TurboQuant, Claims About 6x Memory Savings
2 days ago / Read time: under 1 minute
Author: Editor

Google has introduced TurboQuant, a compression algorithm intended to reduce the memory requirements of artificial intelligence systems. TurboQuant primarily targets the key-value cache in large language models and vector search engines, which is becoming a major memory bottleneck as context windows expand. According to Google, TurboQuant can compress key-value caches to 3-bit precision without retraining or fine-tuning the model, and with virtually no impact on model accuracy. Tests on open-source models such as Gemma show that the technique reduces key-value cache memory by about 6x.
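
To give a sense of the scale involved, the sketch below estimates key-value cache size at two precisions and then applies a naive uniform 3-bit quantizer to a handful of numbers. It is an illustration only and is not TurboQuant's method, which the article does not describe. The model shape, the 16-bit baseline, and the function names `kv_cache_bytes`, `quantize_3bit`, and `dequantize` are all assumptions made for this example.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bits_per_value):
    """Approximate bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return num_values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration (not a real Gemma config).
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=128_000)

baseline = kv_cache_bytes(**shape, bits_per_value=16)  # assumed 16-bit baseline
compressed = kv_cache_bytes(**shape, bits_per_value=3)
ratio = baseline / compressed

print(f"16-bit cache: {baseline / 2**30:.2f} GiB")
print(f"3-bit cache:  {compressed / 2**30:.2f} GiB")
print(f"ratio:        {ratio:.2f}x")


def quantize_3bit(xs):
    """Naive uniform quantizer: map each float to one of 8 levels (codes 0..7)."""
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / 7 or 1.0  # fall back to 1.0 when all inputs are equal
    codes = [round((x - lo) / scale) for x in xs]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct approximate floats from 3-bit codes."""
    return [lo + c * scale for c in codes]


sample = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]
codes, lo, scale = quantize_3bit(sample)
restored = dequantize(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(sample, restored))
print(codes, f"max error = {max_error:.3f}")
```

Under these assumptions the raw bit-width ratio is 16/3, or roughly 5.3x, which is in the same range as the reported figure of about 6x. The article does not state the baseline precision or how the reported figure was measured, so the two numbers are not directly comparable.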