AWS Rebuilds Its Server CPU Around Agentic AI With the 192-Core Graviton5 Launch


4 hours ago / About a 20-minute read

Source: TechTimes

Wires from Amazon Web Services Trainium3 UltraServers are seen at a QA lab in Austin, Texas, on February 3, 2026. Tech titan Amazon is working to step out of Nvidia's shadow with custom "Trainium" chips designed specially for machine learning as billions of dollars are poured into artificial intelligence (AI). Amazon subsidiary Annapurna Labs in Austin, Texas, was testing the longevity of its latest generation Trainium during a recent visit by AFP to the facility. Texas is emerging as a US tech world El Dorado, luring investments with cheap energy, relaxed regulations, tax incentives and reasonably affordable real estate for massive data centers. Mark Felix/AFP via Getty Images

Amazon Web Services has made its in-house Graviton5 processor generally available through new Amazon EC2 M9g and M9gd instances, pitching the low-power Arm-based CPU as a platform for agentic AI workloads such as real-time inference, code generation, and multi-step task orchestration.

First previewed at AWS re:Invent 2025 and launched on June 10, 2026, Graviton5 is the fifth generation of Amazon's custom server chip. The framing matters more than the version number: AWS is explicitly redesigning a general-purpose CPU around agentic AI, a class of work that turns out to lean heavily on the processor rather than the GPU.

Why Agentic AI Is a CPU Problem

Training a large model is the GPU's job. But an AI agent that reasons through steps, calls tools, generates code, and spins up many concurrent environments spends much of its time on orchestration — the kind of branching, coordinating, data-shuffling work a CPU does, and which can leave expensive accelerators idle if the CPU can't keep up. AWS's pitch is that Graviton5 is built to keep those accelerators fed, and its design choices line up with that claim.

Graviton5 packs 192 Arm Neoverse V3 cores — the highest core density in EC2 — across a four-chiplet design on TSMC's 3-nanometer process, with 192 MB of L3 cache (which AWS frames as five times larger than the prior generation), DDR5-8800 memory the company calls the fastest in the cloud, and PCIe Gen 6 support. The mechanics behind the headline numbers are worth understanding: Graviton4 used two 96-core CPUs, so when a core needed data held by the other socket, the request crossed an interconnect that, as AWS compute VP David Brown put it, could take up to three times longer. Graviton5 consolidates all 192 cores into a single socket and adds a much larger cache, cutting both cache misses and that cross-socket latency — which is why memory-bound, highly concurrent agentic work is the use case AWS leads with. More than 120,000 customers already run on Graviton, and AWS says Meta has committed to deploy tens of millions of cores for agentic AI, with Uber and Snowflake also adopting it.
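To make the cross-socket penalty concrete, here is a minimal sketch that models average memory-access latency as a weighted mix of local and remote accesses. Only the 3x remote penalty comes from the article (it is the upper bound attributed to David Brown). The 100 ns base latency and the 50% remote share are illustrative assumptions, not AWS figures, and the function name is hypothetical.

```python
def average_latency_ns(local_ns: float, remote_penalty: float, remote_fraction: float) -> float:
    """Expected latency when a share of accesses must cross to another socket."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    remote_ns = local_ns * remote_penalty
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


LOCAL_NS = 100.0  # assumed base latency, chosen only to keep the arithmetic readable

# Two-socket layout: assume half of all accesses land on the other socket.
dual_socket = average_latency_ns(LOCAL_NS, remote_penalty=3.0, remote_fraction=0.5)

# Single-socket layout: no access crosses a socket boundary.
single_socket = average_latency_ns(LOCAL_NS, remote_penalty=3.0, remote_fraction=0.0)

print(f"dual socket:   {dual_socket:.0f} ns")
print(f"single socket: {single_socket:.0f} ns")
```

Under these assumed inputs the average doubles in the two-socket case. The real effect depends on how often a workload actually reaches across sockets, which the article does not quantify.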


The Performance Claims, and Who Is Making Them

Nearly every performance figure here is AWS's own. The company says M9g delivers up to 25% more compute than Graviton4 and cuts inter-core latency by up to 33%, running web applications and machine-learning inference up to 35% faster and databases up to 30% faster. AWS also relayed preview results from customers: ClickHouse reported a 36% boost with no code changes, Honeycomb cited 36% better throughput per core over a six-month production A/B test, and HubSpot said MySQL query times fell by as much as 60%. Those figures are specific, but they are vendor and vendor-relayed numbers framed as "up to," and independent benchmarks were not yet available at launch.
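The figures above mix two kinds of claims: throughput gains ("up to 35% faster") and time reductions ("query times fell by as much as 60%"). The two are not directly comparable. The sketch below converts between them, assuming "X% faster" means X% more work per unit of time, which AWS's wording does not spell out. The function names are hypothetical.

```python
def time_reduction_from_speedup(speedup_pct: float) -> float:
    """Percent less time per unit of work when throughput rises by speedup_pct."""
    if speedup_pct <= -100.0:
        raise ValueError("speedup_pct must be greater than -100")
    return (1.0 - 1.0 / (1.0 + speedup_pct / 100.0)) * 100.0


def speedup_from_time_reduction(reduction_pct: float) -> float:
    """Percent more throughput implied when time per unit of work falls by reduction_pct."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return (1.0 / (1.0 - reduction_pct / 100.0) - 1.0) * 100.0


# "Up to 35% faster" read as a throughput gain.
print(f"35% faster -> {time_reduction_from_speedup(35.0):.1f}% less time per request")

# "Query times fell by as much as 60%" read as a time reduction.
print(f"60% less time -> {speedup_from_time_reduction(60.0):.0f}% more throughput")
```

Read this way, a 60% cut in query time is a far larger change than a 35% throughput gain, so the customer figures and AWS's own figures should not be lined up side by side without converting them first.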

Network and storage bandwidth improved too, by AWS's account: up to 15% more network bandwidth and 20% more Amazon EBS bandwidth on average, and up to double the network bandwidth on the largest instances. The M9gd variant, built for high-speed local storage, adds up to 11.4 TB of NVMe SSD and up to 30% higher IOPS than its predecessor. On the memory subsystem, AWS markets DDR5-8800 as the headline; outside reporting notes the chip runs at 7,200 MT/s today with 8,800 MT/s modules in the works.
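The gap between the marketed and current memory speeds can be estimated with a short calculation. This sketch computes theoretical peak bandwidth for one 64-bit DDR5 channel (8 bytes per transfer, excluding ECC bits). The article does not state how many memory channels Graviton5 has, so the figures are per channel only, and real sustained bandwidth will be lower than the theoretical peak.

```python
BYTES_PER_TRANSFER = 8  # 64-bit data path per DDR5 channel, ECC bits excluded


def peak_channel_bandwidth_gbs(mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s for one DDR5 channel at the given MT/s."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0


headline = peak_channel_bandwidth_gbs(8800)  # speed AWS markets
current = peak_channel_bandwidth_gbs(7200)   # speed outside reporting says ships today
shortfall_pct = (1.0 - current / headline) * 100.0

print(f"DDR5-8800: {headline:.1f} GB/s per channel")
print(f"DDR5-7200: {current:.1f} GB/s per channel")
print(f"shortfall: {shortfall_pct:.1f}%")
```

On this arithmetic, memory running at 7,200 MT/s offers roughly 18% less theoretical peak bandwidth per channel than the DDR5-8800 headline implies.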

The Catch: No Price, and Half the Family Missing

For a chip line whose entire selling point has always been price-performance, the most important number was missing at launch: M9g pricing had not been published when the instances went generally available. Without it, the price-performance case — including any comparison against GPU-based inference for the same agentic workloads — cannot be modeled, only asserted. The lineup is also partial. The general-purpose M9g and storage-focused M9gd are here, but the compute-optimized C9g and memory-optimized R9g instances are not due until later in 2026, so enterprises weighing Graviton5 for their most demanding inference or analytics jobs are evaluating it against a family that is not yet complete.
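A short sketch shows why the missing price matters. Price-performance is the performance ratio divided by the price ratio, so a stated performance gain says nothing on its own. The 1.25 performance ratio below is AWS's "up to 25% more compute" upper bound. The price ratios are purely hypothetical placeholders, since no M9g pricing had been published, and the function name is illustrative.

```python
def price_performance_gain_pct(perf_ratio: float, price_ratio: float) -> float:
    """Percent change in work per dollar for a new instance relative to the old one."""
    if perf_ratio <= 0 or price_ratio <= 0:
        raise ValueError("ratios must be positive")
    return (perf_ratio / price_ratio - 1.0) * 100.0


PERF_RATIO = 1.25  # AWS's "up to" figure versus Graviton4, taken at face value

# Hypothetical price ratios (new hourly price / old hourly price), not real rates.
for price_ratio in (1.00, 1.10, 1.25, 1.40):
    gain = price_performance_gain_pct(PERF_RATIO, price_ratio)
    print(f"price x{price_ratio:.2f} -> {gain:+.1f}% work per dollar")
```

The break-even point sits where the price ratio equals the performance ratio. If the instances cost more than 25% above their predecessors, even the best-case compute claim would translate into worse price-performance.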

A Bet Bigger Than One Chip

The Graviton5 launch is one move in a broader shift. AWS has said Graviton accounts for more than half of all new CPU capacity it has added over the past three years, and its custom-silicon business has crossed a $20 billion annual run rate. Graviton5's 192 cores now match the top core counts from AMD and exceed Intel's, and the chip sits alongside Google's Axion and Microsoft's Cobalt as part of a hyperscaler move to custom Arm silicon that is steadily eroding x86's hold on the data center.


Both instance types run on the sixth-generation AWS Nitro System, which offloads networking, storage, security, and management to dedicated Nitro Cards so the main CPU can focus on customer workloads. AWS says the new Nitro generation delivers virtually all of a server's compute and memory to workloads while preventing any other system or person — including AWS's own operators — from logging into EC2 servers, reading instance memory, or accessing customer data. The notable claim is that its Nitro Isolation Engine uses formal verification, a method that mathematically proves a property holds rather than testing or auditing for it, to guarantee that isolation. That is a rigorous approach if borne out, but it remains AWS's claim for now; the company says it plans to let customers review the implementation and the resulting proofs, which has not yet happened.

Frequently Asked Questions

What is AWS Graviton5?

Graviton5 is Amazon's fifth-generation custom, Arm-based server processor, available through Amazon EC2 M9g and M9gd instances as of June 10, 2026. It places 192 Arm Neoverse V3 cores in a single package built from four chiplets on TSMC's 3nm process, with 192 MB of L3 cache, DDR5 memory, and PCIe Gen 6, and AWS positions it for agentic AI as well as general-purpose cloud workloads.

Why is Graviton5 pitched for agentic AI?

Agentic AI — software agents that reason through steps, generate code, and coordinate many concurrent tasks — leans heavily on the CPU for orchestration rather than the GPU, and can leave accelerators idle if the CPU stalls. AWS argues Graviton5's high core count, larger cache, and faster memory keep those concurrent environments moving and the accelerators fed, which is why it leads with the agentic-AI use case.

Is Graviton5 faster than x86 chips?

AWS reports large gains over its own Graviton4 and notes that Graviton5's 192 cores match the highest core counts from AMD and exceed Intel's. But the launch figures are AWS's own "up to" claims, and independent benchmarks against current x86 server chips were not yet available, so direct comparisons remain unverified for now.

How much do M9g instances cost?

AWS had not published pricing for the M9g instances when they became generally available. Because Graviton's main selling point is price-performance, the missing pricing makes it impossible to model the cost case — including against GPU-based inference — until AWS releases the rates.
