According to industry expert AGF, NVIDIA is crafting an ambitious roadmap for its Feynman-architecture GPU, slated for release in 2028. A notable highlight of the plan is the anticipated integration of Groq-style LPUs (Language Processing Units) into the architecture. The approach may take cues from AMD's X3D CPU stacking methodology, leveraging TSMC's SoIC (System on Integrated Chips) hybrid bonding technology to stack independent LPU dies, each carrying high-capacity SRAM arrays, onto the primary compute die, which is to be fabricated on the advanced 1.6nm-class A16 process.
This strategic move is aimed at overcoming two limitations at once: SRAM density has largely stopped scaling on leading-edge nodes, so keeping large SRAM arrays on the compute die wastes premium logic silicon, while moving them to stacked dies frees that area for compute. Dense vertical interconnects between the dies would then deliver fast, low-latency decode responses during inference. Despite potential hurdles, including thermal management, execution-level conflicts, and the software adaptation such a design would require, the LPU's strengths in minimizing latency and improving energy efficiency could become NVIDIA's competitive edge in driving down inference costs for real-time services such as ChatGPT.
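The latency argument above can be sketched with a back-of-envelope model: autoregressive decode is typically memory-bandwidth-bound, since roughly all model weights must be streamed once per generated token. The sketch below is illustrative only; the model size and both bandwidth figures are assumptions for comparison, not NVIDIA, Groq, or TSMC specifications.

```python
# Back-of-envelope model of memory-bound decode latency.
# All numbers are illustrative assumptions, not product specs.

def decode_ms_per_token(param_bytes: float, bandwidth_gbps: float) -> float:
    """Time to stream the weights once per generated token, in ms.

    Assumes decode is purely memory-bandwidth-bound: each token
    requires reading (roughly) all model weights one time.
    """
    return param_bytes / (bandwidth_gbps * 1e9) * 1e3

# Hypothetical 70B-parameter model stored at 8-bit precision.
weights = 70e9  # bytes

hbm = decode_ms_per_token(weights, 8_000)    # assumed ~8 TB/s off-die HBM
sram = decode_ms_per_token(weights, 80_000)  # assumed ~80 TB/s stacked SRAM

print(f"HBM:  {hbm:.2f} ms/token")   # 8.75 ms/token
print(f"SRAM: {sram:.2f} ms/token")  # 0.88 ms/token
```

Under these assumed numbers, a 10x bandwidth advantage for hybrid-bonded SRAM translates directly into a 10x lower per-token floor, which is the kind of gap the stacking strategy would be chasing.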
Should the integration prove successful, NVIDIA would evolve from a conventional GPU vendor into a provider of sophisticated hybrid inference/training hardware platforms. That shift would put fresh competitive pressure on rival products such as Google's TPU and AMD's Instinct MI series, potentially igniting a wave of innovation in heterogeneous collaborative computing.
