On September 9, NVIDIA unveiled the Rubin CPX, a new GPU built specifically for long-context inference and video generation. It targets workloads that need very long context windows, such as coding assistants working across large codebases and generative video. Based on the Rubin architecture, the Rubin CPX is designed for disaggregated inference, which splits AI inference into two phases: a context phase, in which the model processes the input, and a generation phase, in which it produces output tokens. Because the two phases stress hardware differently, separating them lets compute and memory resources be allocated to each one independently.

The Rubin CPX GPU delivers 30 petaflops of NVFP4 compute and carries 128GB of GDDR7 memory. NVIDIA says it processes attention three times faster than the company's GB300 NVL72 systems. At rack scale, the GPU ships as part of the Vera Rubin NVL144 CPX platform, which provides 8 exaflops of AI performance, 7.5 times that of the GB300 NVL72, along with 100TB of high-speed memory and 1.7PB/s of memory bandwidth.

NVIDIA claims that every $100 million invested in the new hardware can generate as much as $5 billion in revenue for customers. The Rubin CPX is expected to be available by the end of 2026.
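To put the quoted rack-level figures in proportion, the short Python sketch below works out what they imply. It uses only the numbers stated above. The variable names are illustrative, and the derived values (the comparison system's implied performance, the revenue multiple, and the time to read all rack memory once at peak bandwidth) are back-of-the-envelope arithmetic rather than NVIDIA specifications.

```python
# Figures as quoted in the announcement; everything derived below is
# simple arithmetic on them, not an official specification.
RACK_EXAFLOPS = 8.0          # Vera Rubin NVL144 CPX AI performance
SPEEDUP_VS_COMPARISON = 7.5  # stated multiple over the comparison system
RACK_MEMORY_TB = 100.0       # high-speed memory per rack
RACK_BANDWIDTH_PB_S = 1.7    # memory bandwidth per rack
INVESTMENT_USD = 100e6       # hardware spend in NVIDIA's revenue claim
REVENUE_USD = 5e9            # claimed revenue for that spend

# Performance of the comparison system implied by the 7.5x claim.
implied_comparison_exaflops = RACK_EXAFLOPS / SPEEDUP_VS_COMPARISON

# Claimed revenue per dollar of hardware.
revenue_multiple = REVENUE_USD / INVESTMENT_USD

# Time to read the full rack memory once at peak bandwidth
# (1 PB = 1000 TB), expressed in milliseconds.
full_memory_sweep_ms = RACK_MEMORY_TB / (RACK_BANDWIDTH_PB_S * 1000.0) * 1000.0

if __name__ == "__main__":
    print(f"Implied comparison system: {implied_comparison_exaflops:.2f} exaflops")
    print(f"Claimed revenue multiple:  {revenue_multiple:.0f}x")
    print(f"Full-memory sweep time:    {full_memory_sweep_ms:.1f} ms")
```

Taken at face value, the figures imply a comparison system of roughly 1.07 exaflops, a claimed 50-fold return on hardware spend, and about 59 ms to sweep the rack's entire memory once at peak bandwidth.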
