NVIDIA Rubin Ultra Four-Die GPU Cancelled: Packaging Limits Cut 2027 Performance in Half
Source: TechTimes

Nvidia CEO Jensen Huang introduces Vera Rubin, a next-generation AI data center platform, and Rubin Ultra, a next-generation AI GPU architecture, during the keynote address at the company's annual GTC developers conference in San Jose, California, on March 16, 2026. JOSH EDELSON/Getty Images

NVIDIA's most ambitious AI accelerator has quietly been cut in half. Semiconductor research firm SemiAnalysis confirmed on June 30 that the original four-die version of the Rubin Ultra GPU — unveiled with considerable fanfare at GTC 2026 just three months ago — has been cancelled due to manufacturing constraints TSMC cannot yet solve. The GPU that will actually ship in 2027 under the Rubin Ultra name will deliver roughly half the compute and half the memory bandwidth of what was announced. Enterprise customers planning AI infrastructure around the original Rubin Ultra specification need to rebuild their procurement models now, before contracts lock in.

The cancellation also signals something that reaches beyond NVIDIA's product roadmap: TSMC's CoWoS-L advanced packaging technology has hit a physical ceiling at the four-die scale, and its successor — CoPoS — is not expected to enter mass production until late 2028 at the earliest. The entire AI accelerator industry has assumed that packaging innovation could sustain annual performance leaps as traditional transistor scaling slows. The Rubin Ultra failure tests that assumption directly.

NVIDIA has not publicly commented on the design change.

Why the Original Design Was Too Big to Build

The four-die Rubin Ultra was an attempt to double the design of the standard Rubin GPU, which pairs two near-reticle-sized compute dies with eight HBM4 memory stacks on a CoWoS-L interposer. Rubin Ultra was to scale that to four compute dies and sixteen HBM4E stacks in a single package. That configuration would have required an interposer spanning roughly 7.5 to 8 times the reticle limit, which is the hard physical boundary set by lithography optics at approximately 858 square millimeters per exposure field.
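The reticle figures reduce to simple area arithmetic. The Python sketch below assumes the standard 26 mm × 33 mm exposure field behind the ~858 mm² limit. It also treats each "near-reticle-sized" die as a full reticle, so the result is an upper bound and not NVIDIA's actual die size.

```python
# Reticle arithmetic for the cancelled four-die Rubin Ultra package.
# Assumption: the ~858 mm^2 limit is the standard 26 mm x 33 mm field, and each
# "near-reticle-sized" compute die is treated as one full reticle (upper bound).

RETICLE_MM2 = 26 * 33  # 858 mm^2 single exposure field


def interposer_area_mm2(reticle_multiple: float) -> float:
    """Interposer area implied by a given multiple of the reticle limit."""
    return reticle_multiple * RETICLE_MM2


def compute_die_area_mm2(num_dies: int, die_fraction_of_reticle: float = 1.0) -> float:
    """Upper-bound silicon area for num_dies near-reticle-sized compute dies."""
    return num_dies * die_fraction_of_reticle * RETICLE_MM2


if __name__ == "__main__":
    low, high = interposer_area_mm2(7.5), interposer_area_mm2(8.0)
    print(f"Interposer: {low:,.0f}-{high:,.0f} mm^2")
    print(f"Four compute dies: up to {compute_die_area_mm2(4):,.0f} mm^2")
```

Under these assumptions, the interposer comes to roughly 6,400 to 6,900 mm². Four full-reticle dies come to about 3,400 mm², which matches the active-silicon figure in the next paragraph.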

The engineering problem is mechanical as much as it is electrical. CoWoS-L combines multiple silicon dies and memory stacks on an organic substrate with embedded silicon bridge interconnects. At four near-reticle-sized dies, assembling more than 3,400 square millimeters of active silicon onto a single interposer introduces severe substrate warpage: the organic substrate and the silicon dies expand and contract at different rates under thermal cycling, bending the package in multiple directions. When that happens, compute dies lose complete contact with the substrate, creating electrical failures and dramatically suppressing manufacturing yield. A single defect destroys a package that can cost tens of thousands of dollars.
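A first-order, linear estimate can illustrate the warpage mechanism. It shows how much silicon and an organic substrate would expand relative to each other across the same span. This is only a sketch. The organic substrate's coefficient of thermal expansion (CTE), the 100 K temperature swing, and the spans are illustrative assumptions, not TSMC data. Real warpage also depends on layer stiffness and thickness, and the model ignores both.

```python
# First-order estimate of in-plane thermal-expansion mismatch between silicon
# and an organic substrate. Material values are illustrative assumptions, not
# TSMC data; this linear free-expansion model ignores stiffness and layering.

CTE_SILICON_PPM_PER_K = 2.6   # approx. CTE of silicon near room temperature
CTE_ORGANIC_PPM_PER_K = 15.0  # assumed in-plane CTE of an organic substrate


def expansion_mismatch_um(span_mm: float, delta_t_k: float,
                          cte_a_ppm: float = CTE_ORGANIC_PPM_PER_K,
                          cte_b_ppm: float = CTE_SILICON_PPM_PER_K) -> float:
    """Difference in free thermal expansion, in micrometres, across span_mm."""
    delta_cte_per_k = (cte_a_ppm - cte_b_ppm) * 1e-6
    span_um = span_mm * 1000.0
    return delta_cte_per_k * delta_t_k * span_um


if __name__ == "__main__":
    # Hypothetical spans: ~50 mm versus ~80 mm (about sqrt(6,435 mm^2)).
    for span in (50.0, 80.0):
        mismatch = expansion_mismatch_um(span, 100.0)
        print(f"{span:.0f} mm span, 100 K swing: {mismatch:.0f} um mismatch")
```

In this model the mismatch grows in proportion to span. Under these assumptions, an 80 mm edge sees about 60 percent more differential expansion than a 50 mm edge (roughly 99 µm versus 62 µm). This is one simple way to see why enlarging the interposer makes warpage harder to control.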

TSMC's proposed solution, CoPoS (Chip-on-Panel-on-Substrate), replaces the wafer-based interposer with a larger, panel-format redistribution layer that can use glass or sapphire materials. Glass substrates reduce the thermal expansion mismatch and have shown measurable improvement in warpage in TSMC's own validation data. However, CoPoS pilot lines are not expected before the end of 2026, and mass production is targeted for late 2028 to early 2029. That timeline is plainly incompatible with a 2027 Rubin Ultra launch.

What the Revised Rubin Ultra Will Actually Deliver

Rather than abandon the Rubin Ultra name, NVIDIA is pivoting to a dual-die design — the same chiplet count as the standard Rubin — paired with HBM4E memory instead of the HBM4 used in the base product. That substitution matters: HBM4E doubles the data rate of HBM4 to 16 gigabits per second per pin across a 2048-bit interface, delivering up to 4.1 terabytes per second of bandwidth per stack, versus roughly 2 to 3 terabytes per second for HBM4. Eight HBM4E stacks will provide roughly 384 gigabytes of capacity per GPU, up from the 288 gigabytes in the standard Rubin, and substantially more per-stack bandwidth — though the total memory bandwidth falls far short of what sixteen stacks would have provided.
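The HBM4E figures follow from pin rate × interface width. This minimal sketch uses the article's 16 Gb/s per pin, 2048-bit interface, and eight stacks totaling 384 GB. The 48 GB per-stack capacity is derived from those figures. The aggregate result is a theoretical peak, not a measured or announced specification.

```python
# Peak HBM bandwidth = per-pin data rate x interface width.
# Inputs are the article's figures; the 48 GB per-stack capacity is derived
# (384 GB / 8 stacks) and aggregate bandwidth is a theoretical peak only.


def stack_bandwidth_tb_s(gbit_per_pin: float, interface_bits: int = 2048) -> float:
    """Peak per-stack bandwidth in TB/s (decimal units: 1 TB = 1000 GB)."""
    gbit_per_s = gbit_per_pin * interface_bits
    return gbit_per_s / 8 / 1000  # Gb/s -> GB/s -> TB/s


def package_memory(stacks: int, gb_per_stack: float, gbit_per_pin: float) -> tuple[float, float]:
    """Return (capacity in GB, aggregate peak bandwidth in TB/s) for one package."""
    return stacks * gb_per_stack, stacks * stack_bandwidth_tb_s(gbit_per_pin)


if __name__ == "__main__":
    print(f"Per stack: {stack_bandwidth_tb_s(16):.2f} TB/s")
    cap, bw = package_memory(stacks=8, gb_per_stack=384 / 8, gbit_per_pin=16)
    print(f"Eight stacks: {cap:.0f} GB, {bw:.1f} TB/s peak")
```

Under these assumptions, each stack peaks at about 4.1 TB/s and eight stacks at about 32.8 TB/s. Doubling `stacks` to 16 at the same pin rate doubles the aggregate, which is the shortfall the paragraph describes.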

To approximate the aggregate compute that the original single-package four-die design would have delivered, NVIDIA is expected to use a "2+2" board-level configuration, pairing two dual-die Rubin Ultra GPUs on a single blade in the Kyber rack system. That approach achieves the four-die count at the rack level rather than within a single package, which is meaningfully easier to manufacture, enables higher yields, and reduces per-unit cost. The tradeoff is real, however: four dies in one package communicate across die-to-die interconnects with substantially higher bandwidth and lower latency than four dies split across two packages connected via board-level traces. For latency-sensitive inference workloads, that architectural gap matters.
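The 2+2 tradeoff becomes concrete if you count dies, memory stacks, and die pairs. The sketch compares one cancelled four-die package with one 2+2 pair on a blade. It assumes, hypothetically, that every compute die may need to exchange data with every other die (all-to-all traffic). The article does not describe the actual interconnect topology, and `Package` and `blade_summary` are illustrative names only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Package:
    """Hypothetical description of one GPU package."""
    compute_dies: int
    hbm_stacks: int


def blade_summary(packages: list[Package]) -> dict[str, int]:
    """Totals for a set of packages, plus die pairs that cross a package boundary.

    Assumes all-to-all die communication, which is an illustrative simplification.
    """
    # Label every die with the index of the package it sits in.
    die_owner = [i for i, p in enumerate(packages) for _ in range(p.compute_dies)]
    pairs = list(combinations(die_owner, 2))
    cross = sum(1 for a, b in pairs if a != b)
    return {
        "compute_dies": len(die_owner),
        "hbm_stacks": sum(p.hbm_stacks for p in packages),
        "die_pairs": len(pairs),
        "cross_package_pairs": cross,
    }


original = [Package(compute_dies=4, hbm_stacks=16)]   # cancelled single package
revised = [Package(compute_dies=2, hbm_stacks=8)] * 2  # "2+2" pair on one blade

print(blade_summary(original))
print(blade_summary(revised))
```

Both layouts give four dies and sixteen HBM4E stacks. In the 2+2 case, however, four of the six die pairs must communicate over board-level links rather than in-package die-to-die interconnects. This is the source of the latency concern for inference workloads.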


Advanced Packaging Has Hit a Physical Wall

The Rubin Ultra reversal fits a pattern visible to anyone tracking AI accelerator scaling. The standard Rubin already pushes CoWoS-L to approximately 4 times the reticle limit, up from roughly 2.5x in the previous generation. The four-die Rubin Ultra would have required nearly 8x, doubling an already extreme configuration. Every step up in interposer size brings non-linear increases in warpage risk, assembly complexity, and thermal-management difficulty.
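The scaling steps can be tabulated in the same way. This small sketch uses the article's approximate multiples (2.5x, 4x, and "nearly 8x", rounded to 8x here as an upper bound) together with the ~858 mm² reticle. The generation labels are descriptive only.

```python
from typing import Optional

RETICLE_MM2 = 858  # approximate single-exposure reticle limit, mm^2


def scaling_steps(steps: list[tuple[str, float]]) -> list[tuple[str, float, Optional[float]]]:
    """For each (label, reticle multiple), return (label, area in mm^2, ratio to prior step)."""
    rows = []
    prev_multiple: Optional[float] = None
    for label, multiple in steps:
        ratio = None if prev_multiple is None else multiple / prev_multiple
        rows.append((label, multiple * RETICLE_MM2, ratio))
        prev_multiple = multiple
    return rows


STEPS = [
    ("previous generation", 2.5),
    ("standard Rubin", 4.0),
    ("four-die Rubin Ultra (cancelled)", 8.0),  # "nearly 8x", rounded up
]

for label, area, ratio in scaling_steps(STEPS):
    step = "" if ratio is None else f" ({ratio:.1f}x the prior step)"
    print(f"{label}: {area:,.0f} mm^2{step}")
```

The step from 4x to 8x is a full doubling (2.0x), which is larger than the 1.6x step from the previous generation to the standard Rubin.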

AI investors and hyperscalers have treated advanced packaging as the mechanism that would compensate for slowing transistor scaling. The reasoning is that if individual dies can no longer get much faster through shrinks, bigger systems can be built by assembling more dies together. That assumption holds at modest package scales. At the four-die, multi-reticle scale that NVIDIA attempted, it stopped holding. The industry now faces a waiting period. If CoPoS achieves the warpage reductions its early validation data suggest, it would make future quad-die or larger packages viable. But "late 2028 at the earliest" means at least two product generations of more conservative packaging.

HBM4E Memory Market and Competitive Fallout

The reduction from sixteen to eight HBM4E stacks per GPU package has consequences for the high-bandwidth memory market. According to SemiAnalysis, HBM4E is already a single-buyer market: only NVIDIA is currently procuring it at volume. Halving the per-GPU stack count from the original plan meaningfully reduces aggregate HBM4E demand. This comes at a moment when SK hynix and Samsung have both accelerated their HBM4E qualification timelines specifically in anticipation of Rubin Ultra volumes.

On the competitive front, the revised Rubin Ultra faces a tougher comparison with AMD's forthcoming Instinct MI500, which uses HBM4E and targets the same 2027 window. The MI500 was already positioned to compete on memory capacity. Shipping a dual-die Rubin Ultra instead of a quad-die version narrows the performance lead NVIDIA might otherwise have held.

SemiAnalysis used the occasion to flag a longer-run concern: NVIDIA's CUDA software moat is eroding faster than previously expected. That moat has historically been the main reason enterprises stay on NVIDIA hardware even when alternatives look cheaper. The firm cited the growing share of inference workloads running on Amazon Trainium, and of training workloads running on Google TPUs, at frontier AI laboratories. Industry estimates project that custom ASIC shipments will grow roughly 44 percent in 2026, nearly triple the growth rate for merchant GPUs.

NVIDIA has not publicly commented on the design change to Rubin Ultra. The standard Rubin GPU remains on track for mass shipments this summer to eight confirmed cloud partners: AWS, Google Cloud, Microsoft Azure, Oracle Cloud, CoreWeave, Lambda, Nebius, and Nscale.



Frequently Asked Questions

What was the original NVIDIA Rubin Ultra, and what changed?

The original Rubin Ultra, announced at GTC 2026 in March, was designed with four compute dies and sixteen HBM4E memory stacks in a single package. That is twice the die and stack count of the standard Rubin GPU. SemiAnalysis confirmed on June 30 that this design has been cancelled because of manufacturing constraints in TSMC's CoWoS-L advanced packaging technology. The replacement carries two compute dies (the same as the standard Rubin) with eight HBM4E stacks, delivering approximately half the originally announced performance.

Why did CoWoS-L packaging fail for the four-die design?

CoWoS-L assembles multiple silicon dies and HBM memory stacks on an organic substrate using embedded silicon bridge interconnects. At the scale of four near-reticle-sized compute dies — spanning roughly 7.5 to 8 times the reticle limit — the substrate warps under thermal cycling because organic materials and silicon expand and contract at different rates. When warpage is severe enough, compute dies lose electrical contact with the substrate. TSMC's next-generation alternative, CoPoS, uses a panel-format redistribution layer with glass or sapphire materials to reduce this mismatch, but is not expected to reach mass production before late 2028.

Does the design change affect when Rubin Ultra will ship?

The revised dual-die Rubin Ultra is still targeted for 2027. NVIDIA is expected to approximate the original aggregate compute at the rack level through a "2+2" board configuration, which pairs two dual-die packages on a single blade in the Kyber rack system. Per-package performance and memory capacity will be lower than originally announced, but rack-level throughput is expected to remain broadly comparable. NVIDIA has not publicly confirmed the design change or its specifications.

What does this mean for buyers planning AI infrastructure around Rubin Ultra?

Enterprise customers and hyperscalers that built procurement plans around the four-die Rubin Ultra's original specifications need to update those models. This applies particularly to plans based on the approximately 1 terabyte of HBM4E memory that sixteen stacks would have provided, and on the full compute of four dies in one package. The chip that ships will have roughly 384 gigabytes of HBM4E per package. Rack-level performance through the 2+2 approach may approximate the original aggregate figures, but at higher system cost and complexity, because more GPU packages must be purchased and deployed to reach equivalent throughput.
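On the memory side, the procurement update reduces to a ceiling division. This minimal sketch considers HBM capacity only, not compute, interconnect, power, or pricing. The 100,000 GB target is hypothetical, and "approximately 1 terabyte" is treated as 1,000 GB for simplicity.

```python
import math

ORIGINAL_GB = 1000  # "approximately 1 terabyte", treated as 1,000 GB (assumption)
REVISED_GB = 384    # revised dual-die Rubin Ultra, per the article


def packages_needed(target_gb: float, gb_per_package: float) -> int:
    """Smallest whole number of GPU packages whose combined HBM meets target_gb."""
    return math.ceil(target_gb / gb_per_package)


target = 100_000  # hypothetical memory-pool target for a deployment, in GB
print(packages_needed(target, ORIGINAL_GB), packages_needed(target, REVISED_GB))
```

Under these assumptions, the same memory pool takes 261 revised packages instead of 100 original ones. This shows why the 2+2 route adds system cost and complexity even when aggregate figures can be approximated.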
