On March 22, 2026, researchers at NVIDIA introduced KVTC (KV Cache Transformation Coding), a technique that compresses the KV cache—the key and value tensors a large language model (LLM) accumulates while processing a conversation—achieving up to a 20-fold reduction in memory usage without any changes to model code. The KV cache, often likened to the model's short-term memory, can balloon to several gigabytes over long conversations, consuming GPU memory and throttling throughput. NVIDIA senior engineer Adrian Lancucki noted that model inference is now frequently bottlenecked by GPU memory rather than by compute.
Drawing inspiration from JPEG compression, KVTC compresses the cache through a three-step pipeline: principal component analysis, adaptive quantization, and entropy coding. This preserves the essential information while supporting block-based decompression, so the model can keep responding in real time. Tests on models ranging from 1.5 billion to 70 billion parameters (including the Llama 3 series and R1-Qwen 2.5) show that KVTC loses less than 1% accuracy even at 20-fold compression, whereas traditional methods suffer significant accuracy drops at just 5-fold compression.
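To make the three stages concrete, here is a minimal sketch of a JPEG-style pipeline applied to one block of cached keys or values. This is an illustrative reconstruction, not NVIDIA's implementation: the function names are hypothetical, PCA is done with a plain SVD, quantization uses a simple uniform scale rather than KVTC's adaptive scheme, and `zlib` stands in for a real entropy coder.

```python
import numpy as np
import zlib

def compress_kv_block(kv, rank=8, bits=4):
    """Hypothetical KVTC-style compression of one KV cache block.
    kv: (tokens, dim) float array of cached keys or values."""
    mean = kv.mean(axis=0)
    centered = kv - mean
    # Step 1: PCA via SVD -- keep only the top `rank` principal components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]                         # (rank, dim) projection basis
    coeffs = centered @ basis.T               # (tokens, rank) coefficients
    # Step 2: uniform quantization to `bits` bits per coefficient
    # (KVTC uses an adaptive scheme; a single global scale is shown here).
    scale = np.abs(coeffs).max() / (2 ** (bits - 1) - 1)
    q = np.round(coeffs / scale).astype(np.int8)
    # Step 3: entropy coding of the quantized stream
    # (zlib as a stand-in for an arithmetic/range coder).
    payload = zlib.compress(q.tobytes())
    return payload, basis, mean, scale, q.shape

def decompress_kv_block(payload, basis, mean, scale, shape):
    """Invert the pipeline for one block, enabling block-wise decompression."""
    q = np.frombuffer(zlib.decompress(payload), dtype=np.int8).reshape(shape)
    return (q * scale) @ basis + mean

# Usage: a toy 128-token, 64-dimensional cache block.
rng = np.random.default_rng(0)
kv = rng.normal(size=(128, 64)).astype(np.float32)
payload, basis, mean, scale, shape = compress_kv_block(kv)
restored = decompress_kv_block(payload, basis, mean, scale, shape)
ratio = kv.nbytes / len(payload)   # compressed size vs. raw float32 bytes
```

Because each block decompresses independently, the model can rehydrate only the cache regions needed for the next attention step, which is what keeps responses real-time.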
When processing 8,000 tokens on an H100 GPU, KVTC cuts the initial response time from 3 seconds to 380 milliseconds, roughly an 8-fold improvement. The technique is particularly well-suited to long-conversation workloads such as programming assistants and iterative reasoning tasks. NVIDIA plans to integrate KVTC into its Dynamo framework and keep it compatible with open-source engines like vLLM. Industry observers expect that as conversation lengths continue to grow, KVTC could become a standard compression tool for AI deployment, significantly cutting hardware costs for businesses.
