
Nvidia CEO Jensen Huang introduces Vera Rubin, a next-generation AI data center platform, and Rubin Ultra, a next-generation AI GPU architecture, during the keynote address at the company's annual GTC developers conference in San Jose, California, on March 16, 2026. JOSH EDELSON/Getty Images
NVIDIA's Vera Rubin NVL72 — the company's rack-scale AI supercomputer platform — moved another step closer to broad cloud availability on Wednesday when Bull and Foxconn announced they had begun manufacturing components for the system in Europe, with assembly taking place at Bull's factory in Angers, France. The announcement arrived seventeen days after CoreWeave became the first AI cloud provider in the world to bring up and fully validate a Vera Rubin NVL72 rack, confirming that the platform's H2 2026 delivery timeline is real rather than aspirational. For AI teams evaluating infrastructure decisions right now, the question is no longer whether Vera Rubin will ship on schedule — it is whether the claimed performance gains apply to their workloads.
The answer depends almost entirely on what those workloads look like.
The Vera Rubin NVL72 is NVIDIA's third-generation rack-scale AI supercomputer, integrating 72 Rubin GPUs and 36 Vera CPUs in a single fully liquid-cooled unit. It is the direct successor to the Blackwell GB200 NVL72 and carries the same physical form factor, but represents a full generational leap across every major specification: compute, memory bandwidth, interconnect speed, and power architecture.
The platform takes its name from Vera Florence Cooper Rubin, the American astronomer whose work on galaxy rotation curves in the 1970s provided the first major observational evidence for dark matter. NVIDIA has followed the practice of naming its platforms after scientists: Hopper for Grace Hopper, Blackwell for David Blackwell, and now Rubin.
At the chip level, the Rubin GPU delivers 50 petaflops of NVFP4 inference performance per GPU, five times the output of a Blackwell GB200 GPU. Across all 72 GPUs in the rack, that scales to 3.6 exaflops of inference compute. Each GPU also carries 288 gigabytes of HBM4 memory running at up to 22 terabytes per second of memory bandwidth, a nearly threefold increase over the 8 terabytes per second that HBM3e delivered in Blackwell systems.
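The rack-level numbers follow from multiplying the per-GPU figures by 72. A minimal Python sketch of that arithmetic, using only the figures quoted above (the variable names are illustrative):

```python
# Per-GPU figures as quoted in the article.
GPUS_PER_RACK = 72
NVFP4_PFLOPS_PER_GPU = 50   # petaflops of NVFP4 inference per Rubin GPU
HBM4_GB_PER_GPU = 288       # gigabytes of HBM4 per GPU
HBM4_TBPS = 22              # terabytes per second per GPU (HBM4)
HBM3E_TBPS = 8              # terabytes per second per GPU (HBM3e, Blackwell)

# 1 exaflop = 1000 petaflops.
rack_exaflops = GPUS_PER_RACK * NVFP4_PFLOPS_PER_GPU / 1000
# Total HBM4 capacity across the rack, in gigabytes.
rack_hbm_gb = GPUS_PER_RACK * HBM4_GB_PER_GPU
# Generational memory-bandwidth ratio per GPU.
bandwidth_ratio = HBM4_TBPS / HBM3E_TBPS

print(f"Rack inference compute: {rack_exaflops:.1f} exaflops")
print(f"Rack HBM4 capacity: {rack_hbm_gb} GB")
print(f"HBM4 vs HBM3e bandwidth: {bandwidth_ratio:.2f}x")
```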
The platform now comprises seven chips, following NVIDIA's March 2026 addition of the Groq 3 LPU to the original six: the Rubin GPU, the Vera CPU, the NVLink 6 Switch, the ConnectX-9 SuperNIC, the BlueField-4 DPU, and the Spectrum-6 Ethernet Switch. All seven are co-designed to operate as a unified system rather than a collection of discrete components.
The most consequential engineering difference between Vera Rubin NVL72 and its predecessor is not the GPU's raw compute count — it is the interconnect fabric that ties those GPUs together.
NVLink 6, NVIDIA's sixth-generation GPU-to-GPU interconnect, doubles per-GPU bandwidth from the 1.8 terabytes per second that NVLink 5 provided in Blackwell to 3.6 terabytes per second. Each of the nine NVLink 6 switches in an NVL72 rack contributes 28 terabytes per second of switching capacity, bringing total all-to-all fabric bandwidth across all 72 GPUs to 260 terabytes per second. NVIDIA describes this as exceeding the bandwidth of the entire global internet.
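The rounded 130 and 260 terabytes-per-second rack totals are consistent with summing per-GPU NVLink bandwidth across 72 GPUs. A short Python sketch of that check, assuming the aggregate is simply per-GPU bandwidth times GPU count (the function name is illustrative):

```python
GPUS_PER_RACK = 72
NVLINK5_TBPS_PER_GPU = 1.8   # Blackwell, as quoted
NVLINK6_TBPS_PER_GPU = 3.6   # Vera Rubin, as quoted

def aggregate_tbps(per_gpu_tbps, gpus=GPUS_PER_RACK):
    """Sum per-GPU NVLink bandwidth across every GPU in the rack."""
    return per_gpu_tbps * gpus

nvlink5_total = aggregate_tbps(NVLINK5_TBPS_PER_GPU)
nvlink6_total = aggregate_tbps(NVLINK6_TBPS_PER_GPU)
generation_ratio = nvlink6_total / nvlink5_total

print(f"NVLink 5 rack aggregate: {nvlink5_total:.1f} TB/s")
print(f"NVLink 6 rack aggregate: {nvlink6_total:.1f} TB/s")
print(f"Generation-over-generation ratio: {generation_ratio:.1f}x")
```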
That figure matters because of how today's most capable AI models are built. Modern frontier systems use mixture-of-experts, or MoE, architecture rather than the older dense-network approach that runs every parameter for every token. In a MoE model, a learned routing mechanism activates only a small fraction of specialized "expert" sub-networks per token — a model like Kimi K2 Thinking, for example, maintains 384 experts but activates only eight at a time. This selective activation is what allows MoE models to scale to hundreds of billions or even trillions of parameters without proportional increases in compute cost.
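To make selective activation concrete, here is a minimal sketch of generic top-k routing in Python. It is not the router of Kimi K2 Thinking or of any specific model: the random scores stand in for the output of a learned router, and production systems differ in gating function and load balancing. Only the expert count (384) and the number of active experts (8) come from the text.

```python
import math
import random

NUM_EXPERTS = 384   # expert count quoted for Kimi K2 Thinking
TOP_K = 8           # experts activated per token

def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)          # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Random scores are an assumption standing in for a learned router's output.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
active_fraction = TOP_K / NUM_EXPERTS

print(f"Experts used for this token: {sorted(i for i, _ in selected)}")
print(f"Fraction of experts active per token: {active_fraction:.1%}")
```

Only the eight selected experts run for the token, so roughly 2 percent of the expert sub-networks do work on any given token.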
The tradeoff is communication. Because different tokens get routed to different experts, and those experts are distributed across GPUs, MoE inference requires constant all-to-all communication across the entire GPU domain at every inference step. The 130 terabytes per second that NVLink 5 provided in Blackwell was a bottleneck for trillion-parameter MoE models running at production scale. NVLink 6's doubling of that capacity directly removes this constraint, which is why NVIDIA's own performance claims for Vera Rubin are measured specifically against large MoE workloads.
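A rough simulation shows why this traffic pattern is so demanding. The sketch below assumes a hypothetical round-robin placement of 384 experts across 72 GPUs and uniform random routing. Both are simplifications, since real routers are learned and real placements are tuned. It then counts how often a token's chosen expert lives on a different GPU from the token.

```python
import random

NUM_GPUS = 72
NUM_EXPERTS = 384
TOP_K = 8

def gpu_of_expert(expert_id):
    """Hypothetical round-robin placement of experts onto GPUs."""
    return expert_id % NUM_GPUS

def remote_fraction(num_tokens, seed=0):
    """Fraction of (token, expert) assignments that must cross the fabric."""
    rng = random.Random(seed)
    remote = 0
    for _ in range(num_tokens):
        home_gpu = rng.randrange(NUM_GPUS)                # GPU holding the token
        experts = rng.sample(range(NUM_EXPERTS), TOP_K)   # uniform routing stand-in
        remote += sum(1 for e in experts if gpu_of_expert(e) != home_gpu)
    return remote / (num_tokens * TOP_K)

frac = remote_fraction(10_000)
print(f"Expert assignments landing on a remote GPU: {frac:.1%}")
```

Under these assumptions the expected local share is 1 in 72, so nearly every expert call crosses the interconnect. That is the all-to-all pattern the paragraph above describes.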
NVIDIA claims the Vera Rubin NVL72 delivers one-tenth the cost per million inference tokens compared to the Blackwell GB200 NVL72. That figure is specific rather than general: it is benchmarked on the Kimi-K2-Thinking model at 32,000-token input and 8,000-token output sequence lengths, comparing Rubin NVL72 against GB200 NVL72. It is not a cost reduction that applies across all workloads.
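Cost per million tokens is a ratio of hourly cost to hourly throughput, so a 10x claim depends on both terms. The Python sketch below shows the formula. The hourly prices and throughput numbers are entirely hypothetical, chosen only to illustrate how a 10x ratio could arise. They are not NVIDIA's or any provider's figures.

```python
def cost_per_million_tokens(rack_cost_per_hour, tokens_per_second):
    """Dollars per one million generated tokens for a rack."""
    tokens_per_hour = tokens_per_second * 3600
    return rack_cost_per_hour / tokens_per_hour * 1_000_000

# HYPOTHETICAL inputs for illustration only.
old_rack = cost_per_million_tokens(rack_cost_per_hour=300.0,
                                   tokens_per_second=100_000)
new_rack = cost_per_million_tokens(rack_cost_per_hour=450.0,
                                   tokens_per_second=1_500_000)
improvement = old_rack / new_rack

print(f"Older rack: ${old_rack:.4f} per million tokens")
print(f"Newer rack: ${new_rack:.4f} per million tokens")
print(f"Improvement: {improvement:.1f}x")
```

In this made-up case the newer rack costs 1.5 times as much per hour but serves 15 times the tokens, which nets out to 10x. A workload that sees a smaller throughput gain would see a proportionally smaller cost reduction.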
For teams running large MoE models at 200 billion parameters or above, where per-token cost is the primary constraint, the economics of Vera Rubin are compelling. For teams running dense-architecture models below 70 billion parameters on existing Blackwell or Hopper capacity, the improvement will be materially smaller, and the disruption of waiting for Rubin access — which for most organizations will mean 2027, not H2 2026 — may not be justified.
According to NVIDIA, the platform also trains large MoE models using one-fourth the GPU count required on the previous Blackwell generation, a figure based on a 10-trillion-parameter model trained on 100 trillion tokens in a fixed one-month window. For frontier model labs whose training budgets are constrained by GPU count rather than by time, this is the number that changes the economics most significantly.
CoreWeave became the first AI cloud provider to operate a fully validated Vera Rubin NVL72 rack on June 1, 2026. The deployment was built on Dell Technologies' PowerEdge XE9812 liquid-cooled servers. CoreWeave also developed two patent-pending engineering systems specifically for the rack: Valvey, a software-defined per-rack valve assembly that turns liquid cooling management into a programmable function, and Racky, a unified rack control appliance that aggregates power, cooling, and environmental sensors. Both treat the NVL72 rack as a single manageable cloud resource rather than a collection of individual components.
Microsoft had claimed in March 2026 that it was the first cloud provider to bring up a Vera Rubin NVL72 system for validation. CoreWeave's June 1 announcement characterized its own milestone more narrowly, as the first "AI cloud provider" deployment at production scale.
NVIDIA confirmed at its GTC Taipei keynote on June 1 that Vera Rubin has entered full production, with Samsung, SK hynix, and Micron all certified as HBM4 memory suppliers. SK hynix is estimated to hold roughly 60 to 70 percent of the allocated HBM4 volume, with Samsung at approximately 25 to 30 percent and Micron supplying the remainder.
The H2 2026 first deployment cohort confirmed by NVIDIA includes AWS, Google Cloud, Microsoft Azure, and Oracle Cloud, alongside NVIDIA Cloud Partners CoreWeave, Lambda, Nebius, and Nscale. Microsoft's Azure deployments will target its next-generation Fairwater AI superfactory sites in Wisconsin and Atlanta, with buildouts planned to scale to hundreds of thousands of Vera Rubin Superchips. Most enterprise teams without first-cohort hyperscaler contracts should plan for practical access in 2027; supply has consistently taken six to twelve months to reach broader availability across recent GPU generations.
The European manufacturing announcement on Wednesday adds a new geography to the supply chain. Under the Bull-Foxconn alliance, key components are being produced at Foxconn's facilities in the Czech Republic and assembled at Bull's factory in Angers, France. The resulting systems will be sold under the Bull brand and are designed for AI factories and cloud providers in the European market.
The Vera Rubin NVL72 imposes infrastructure requirements that existing air-cooled data centers cannot meet without significant capital investment.
The platform requires 100 percent liquid cooling; no air-cooled configuration exists. Rack power draw runs between 190 and 230 kilowatts depending on workload, compared to approximately 140 kilowatts for the Blackwell GB300 NVL72 and roughly 40 kilowatts for prior-generation Hopper systems. The cooling transition from air to direct-to-chip liquid systems carries a retrofit cost of $500 to $1,500 per kilowatt of capacity, which at 190 to 230 kilowatts translates to roughly $95,000 to $345,000 per NVL72 rack for the cooling infrastructure alone, before addressing the electrical plant.
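The per-rack retrofit figure is the product of rack power and cost per kilowatt. A minimal Python sketch of that arithmetic, using the power and cost ranges quoted above (the function name is illustrative, and the calculation assumes cooling capacity is sized to the rack's power draw):

```python
def retrofit_cost_range(rack_kw_range, cost_per_kw_range):
    """Return (low, high) cooling retrofit cost in dollars for one rack."""
    low_kw, high_kw = rack_kw_range
    low_cost, high_cost = cost_per_kw_range
    return low_kw * low_cost, high_kw * high_cost

COST_PER_KW = (500, 1500)   # dollars per kilowatt, as quoted

# Vera Rubin NVL72: 190 to 230 kW per rack, as quoted.
rubin_low, rubin_high = retrofit_cost_range((190, 230), COST_PER_KW)
# GB300 NVL72 for comparison: approximately 140 kW per rack, as quoted.
gb300_low, gb300_high = retrofit_cost_range((140, 140), COST_PER_KW)

print(f"Vera Rubin NVL72: ${rubin_low:,} to ${rubin_high:,} per rack")
print(f"GB300 NVL72:      ${gb300_low:,} to ${gb300_high:,} per rack")
```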
The electrical plant also requires redesign. Vera Rubin operates on an 800-volt DC power architecture, a departure from the 48-volt in-rack distribution standard that has been in place for over a decade. The conversion requires solid-state transformers that step medium-voltage AC down to 800 volts DC at the substation level, and most vendors had those components in pre-production status as of mid-2026. Lead times on liquid cooling retrofits run twelve to eighteen months for facilities designed around raised-floor air cooling.
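One way to see why rack power at this scale pushes toward higher distribution voltage is basic circuit arithmetic: for a fixed power, current scales inversely with voltage. The sketch below assumes an ideal DC bus with no conversion losses. This motivation is an illustration and is not stated in NVIDIA's materials as quoted here.

```python
def bus_current_amps(power_kw, volts):
    """Current needed to deliver power_kw at the given DC voltage (I = P / V)."""
    return power_kw * 1000 / volts

for rack_kw in (190, 230):   # rack power range quoted in the article
    amps_48v = bus_current_amps(rack_kw, 48)
    amps_800v = bus_current_amps(rack_kw, 800)
    print(f"{rack_kw} kW rack: {amps_48v:,.0f} A at 48 V "
          f"vs {amps_800v:,.1f} A at 800 V")

# The reduction in current is just the voltage ratio.
current_reduction = 800 / 48
print(f"Current reduction factor: {current_reduction:.1f}x")
```

Lower current means thinner conductors and lower resistive losses for the same delivered power, which becomes significant once a single rack draws around 200 kilowatts.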
For organizations building purpose-built AI factories or placing capacity at first-cohort hyperscaler partners, the infrastructure is handled at the data center level. For enterprise teams considering on-premises Vera Rubin deployment, the infrastructure costs and lead times are the binding constraint — not GPU access.
NVIDIA acknowledged in its FY2026 annual report that the company was "effectively foreclosed from competing in China's data center computing market" as of the end of that fiscal year, following successive waves of US export controls on advanced semiconductor technology. The Vera Rubin NVL72, with performance approximately 22 times higher than the chips NVIDIA is currently permitted to sell in China, is not available to Chinese buyers under the current regulatory framework.
The platform is fully available across North America, Europe, and the Asia-Pacific markets outside China, and the western cloud deployment pipeline — with hyperscaler capital expenditure approaching $700 billion across the top five providers for 2026 — represents the primary demand base for Vera Rubin production.
When will NVIDIA Vera Rubin NVL72 be available in the cloud?
The first cloud deployments are underway in the second half of 2026. CoreWeave had its first rack operational as of June 1, 2026. AWS, Google Cloud, Microsoft Azure, Oracle Cloud, Lambda, Nebius, and Nscale are confirmed for H2 2026. Most enterprise teams without first-cohort contracts should plan for practical access in early 2027, consistent with the six-to-twelve-month ramp that has applied to each recent NVIDIA GPU generation.
What is the difference between Vera Rubin NVL72 and Blackwell NVL72?
The Vera Rubin NVL72 doubles the NVLink interconnect bandwidth from 130 terabytes per second in Blackwell's NVLink 5 to 260 terabytes per second with NVLink 6, nearly triples per-GPU memory bandwidth from 8 to 22 terabytes per second with HBM4, and delivers 5x the per-GPU inference compute at the same rack scale. Rack power draw increases from approximately 140 kilowatts for GB300 NVL72 to 190–230 kilowatts for Vera Rubin NVL72, and the platform requires 800-volt DC power delivery rather than the 48-volt standard.
Which cloud providers will offer NVIDIA Vera Rubin NVL72?
AWS, Google Cloud, Microsoft Azure, Oracle Cloud, CoreWeave, Lambda, Nebius, and Nscale are all confirmed for H2 2026. Some providers, including AWS and Google Cloud, are expected to brand the Rubin R100 GPU as "H300" for catalog continuity, though the underlying hardware is identical. Bull, in partnership with Foxconn, announced European commercial availability, with components manufactured in the Czech Republic and systems assembled in France.
Does the 10x cost reduction apply to all AI workloads?
No. NVIDIA's 10x lower cost per million tokens figure is benchmarked on the Kimi-K2-Thinking mixture-of-experts model at 32,000-token input and 8,000-token output sequence lengths, comparing Vera Rubin NVL72 against the previous-generation GB200 NVL72. Teams running large MoE models at 200 billion parameters or above will see the largest gains. Teams running dense-architecture models under 70 billion parameters on existing Blackwell capacity will see materially smaller improvements and may find it more cost-effective to remain on current hardware until Rubin supply broadens in 2027.
