NVIDIA Unveils Innovative Groq 3 LPU: Boasting 500MB SRAM Cache and 7x Bandwidth, Outperforms HBM4
Author: Editor

At the GTC conference in March 2026, NVIDIA introduced an LPU inference chip that incorporates Groq technology. The move marks a pivotal shift in the AI computing landscape, from "training-centric" to "inference-centric" compute.

The LPU derives its performance and cost advantages from three key strategies: it stores model weights directly in on-chip SRAM, uses a compile-time static scheduling architecture, and distributes inference across multiple chips (a back-of-the-envelope sizing sketch appears at the end of this article). On Llama2-70B inference workloads, the result is roughly 10 times the performance of an H100 GPU at about one-tenth of the cost.

The LPU pushes AI inference hardware toward greater specialization and efficiency. For NVIDIA, the chip carries clear strategic value: it reinforces the company's market position and drives upgrades across the PCB supply chain, where demand for high-density PCBs and M9-grade high-frequency materials is surging.

The LPU complements rather than replaces the GPU. GPUs excel at large-scale parallel computation and continue to dominate model training; LPUs are optimized for low-latency inference, making them well suited to real-time interactive workloads such as text generation.

By integrating the LPU into the Vera Rubin platform, NVIDIA achieves inference throughput and power-efficiency gains of up to 35 times while preserving compatibility with the CUDA software ecosystem, which should ease rapid adoption in the AI inference market.
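To make the "weights in SRAM plus distributed inference" idea concrete, here is a minimal back-of-the-envelope sketch in Python. The 500MB SRAM capacity comes from the headline and the 70B parameter count from the model name; the precision options and the helper function are illustrative assumptions, not disclosed NVIDIA or Groq specifications.

```python
# Back-of-the-envelope sizing: how many chips with 500 MB of on-chip SRAM
# would it take to hold all Llama2-70B weights on-chip? All figures below
# are illustrative assumptions, not published specifications.

def chips_needed(n_params: float, bytes_per_param: float, sram_bytes: float) -> int:
    """Ceiling of total weight bytes over per-chip SRAM capacity."""
    total_bytes = n_params * bytes_per_param
    return -(-int(total_bytes) // int(sram_bytes))  # integer ceiling division

N_PARAMS = 70e9          # Llama2-70B parameter count
SRAM_PER_CHIP = 500e6    # 500 MB of on-chip SRAM, per the headline figure

# Hypothetical weight precisions, to show how quantization changes the count.
for label, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    n = chips_needed(N_PARAMS, bytes_per_param, SRAM_PER_CHIP)
    print(f"{label}: ~{n} chips to keep all weights in SRAM")

# Output: FP16 -> ~280 chips, INT8 -> ~140 chips, INT4 -> ~70 chips.
```

The arithmetic shows why no single chip can hold the model and why the article pairs on-chip weight residency with distributed inference: a compile-time static schedule can partition the layers across a fixed pipeline of chips so that every weight read is served from SRAM rather than external HBM or DRAM.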