Although NVIDIA currently dominates AI training, the company is reportedly formulating a strategy that could reshape the industry landscape in response to escalating demand for real-time inference. According to industry analyst AGF, NVIDIA plans to integrate Groq-style LPU (Language Processing Unit) units into its Feynman-architecture GPUs. These GPUs are slated for launch in 2028 and are expected to deliver a substantial improvement in AI inference performance.
The move would take a cue from AMD's X3D CPU stacking design. NVIDIA intends to leverage TSMC's SoIC (System on Integrated Chips) hybrid bonding technology to stack independent LPU chiplets, equipped with large-capacity SRAM (Static Random-Access Memory) arrays, onto the Feynman architecture's main compute die. That compute die would be manufactured on the 1.6nm-class A16 process and would integrate the tensor units and control logic.
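A rough back-of-envelope calculation shows why a separate, stacked SRAM die is attractive: SRAM bit cells have barely shrunk since the 5nm-class nodes, so cache built on the leading-edge A16 die would consume expensive area without a density benefit. The sketch below is illustrative only; the cell-area and array-efficiency figures are assumptions, not NVIDIA or TSMC specifications.

```python
# Back-of-envelope: why stack a dedicated SRAM die rather than spend
# leading-edge (A16) area on cache. All figures are illustrative
# assumptions, not NVIDIA/TSMC specifications.

SRAM_CELL_UM2 = 0.021    # approx. 6T bit-cell area; has barely shrunk
                         # since 5nm-class nodes (assumed value)
ARRAY_EFFICIENCY = 0.7   # fraction of macro area that is actual bit cells;
                         # the rest is sense amps, decoders, routing (assumed)

def sram_capacity_mb(die_area_mm2: float) -> float:
    """Rough usable SRAM capacity for a die of the given area."""
    cells = (die_area_mm2 * 1e6 * ARRAY_EFFICIENCY) / SRAM_CELL_UM2
    return cells / 8 / 1e6  # bits -> megabytes

if __name__ == "__main__":
    # A hypothetical 100 mm^2 stacked SRAM chiplet:
    print(f"~{sram_capacity_mb(100):.0f} MB per 100 mm^2 chiplet")
```

Because the bit cell is nearly the same size on an older, cheaper node, moving the SRAM arrays onto a stacked chiplet frees the A16 die for the tensor logic that actually benefits from the new process.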
The primary objective is to tackle two bottlenecks at once: SRAM density has largely stopped scaling on leading-edge nodes, which makes filling high-end silicon with cache an inefficient use of wafer area, and off-chip memory traffic limits how quickly tokens can be decoded. Dense vertical interconnects between the stacked dies are meant to deliver low-latency decode responses. The approach is not without challenges, however: thermal management, potential execution conflicts, and software adaptation are all hurdles that must be overcome.
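The latency argument can be made concrete. LLM decoding is typically memory-bandwidth bound: each generated token must stream the model weights once, so a lower bound on per-token latency is weight bytes divided by memory bandwidth. The sketch below uses rough public ballpark bandwidth figures (HBM-class GPUs around 8 TB/s; Groq advertises roughly 80 TB/s of aggregate on-die SRAM bandwidth), not Feynman specifications.

```python
# Sketch: bandwidth-bound lower bound on LLM decode latency.
# time/token ~= weight_bytes / memory_bandwidth. Bandwidth numbers
# below are rough public ballpark figures, not Feynman specs.

def ms_per_token(weight_gb: float, bandwidth_tb_s: float) -> float:
    """Bandwidth-bound lower bound on decode latency, in ms/token."""
    return weight_gb / bandwidth_tb_s  # 1 TB/s = 1 GB/ms

if __name__ == "__main__":
    weights = 140.0  # e.g. a 70B-parameter model at FP16 (2 bytes/param)
    for name, bw in [("HBM-class GPU (~8 TB/s)", 8.0),
                     ("on-die SRAM (~80 TB/s, Groq-style)", 80.0)]:
        print(f"{name}: >= {ms_per_token(weights, bw):.2f} ms/token")
```

Under these assumptions, the same model drops from roughly 17.5 ms per token to under 2 ms, which is the kind of order-of-magnitude gap that makes stacked SRAM attractive for latency-sensitive inference.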
If NVIDIA executes this strategy successfully, it could transform from a traditional GPU supplier into a provider of hybrid inference/training hardware platforms. That would not only put new pressure on competitors such as Google's TPU (Tensor Processing Unit) and AMD's MI series, but also drive innovation in heterogeneous collaborative computing.
