Nvidia and AWS Deepen Ties to Speed AI Inference and Vector Search
Source: TechTimes

Visitors tour the Nvidia booth during the Nvidia Product Showcase at Computex 2026 in Taipei on June 3, 2026. AFP via Getty Images/I-Hwa Cheng

Nvidia and Amazon Web Services have expanded their partnership to make AI cheaper and easier to run at production scale, extending Nvidia infrastructure across Amazon EC2 and Amazon OpenSearch. Nvidia detailed the moves on June 24, centered on new Blackwell-powered EC2 G7 instances and GPU-accelerated vector search in OpenSearch Serverless.

The cloud-GPU story is usually told from the top down (biggest clusters, fastest interconnects, most exotic accelerators) because that is where the training arms race is loudest. This announcement is about the other end of the pipeline: the unglamorous work of actually running trained models in production, cheaply and without a team of specialists to babysit the hardware. Both pieces point the same way: toward making AI inference and retrieval ordinary.

The Compute Layer: Blackwell Goes Mid-Tier

The EC2 G7 instance runs on Nvidia's RTX PRO 4500 Blackwell GPUs paired with custom sixth-generation Intel Xeon processors, targeting AI inference, graphics, spatial computing, and GPU-accelerated analytics. AWS says it delivers up to 4.6 times the AI inference performance and 2.1 times the graphics performance of the prior G6 generation. Instances come in sizes from one to eight GPUs plus bare metal, topping out at 256GB of total GPU memory, 700 Gbps of EFA networking (seven times G6), and up to 7.6TB of local NVMe storage. AWS, which calls itself the first major cloud to offer the chip, positions G7 as a mid-tier option for inference, rendering, video, and virtual desktops rather than training-scale clusters. The instance reached general availability on June 18, initially in two U.S. regions, Ohio and Oregon.
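The headline figures imply a couple of numbers the announcement does not spell out. The sketch below is simple arithmetic on the figures quoted above; the per-GPU memory and the G6 networking baseline are derived values, assuming the 256GB total is split evenly across eight GPUs, and are not vendor-stated specifications.

```python
# Figures as quoted in the article for the largest G7 size.
TOTAL_GPU_MEMORY_GB = 256      # total GPU memory across all GPUs
MAX_GPUS = 8                   # maximum GPUs per instance
G7_EFA_GBPS = 700              # EFA networking bandwidth
G7_OVER_G6_NETWORK_RATIO = 7   # "seven times G6"

# Derived values (assumptions: even memory split, exact 7x ratio).
per_gpu_memory_gb = TOTAL_GPU_MEMORY_GB / MAX_GPUS
implied_g6_efa_gbps = G7_EFA_GBPS / G7_OVER_G6_NETWORK_RATIO

print(f"Implied memory per GPU: {per_gpu_memory_gb:.0f} GB")
print(f"Implied G6 EFA bandwidth: {implied_g6_efa_gbps:.0f} Gbps")
```

The per-GPU figure is the one that matters for sizing: a model has to fit in a single GPU's memory unless the serving stack shards it across several.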

The positioning is the point. G7 sits below the G7e family AWS shipped in January on the larger RTX PRO 6000 Blackwell GPU. If G7e is the premium workstation for bigger generative models, G7 is the fleet vehicle: the volume tier that puts Blackwell into everyday inference, graphics, and analytics jobs without committing teams to top-end hardware they don't need. That is what "right accelerator to the right workload" means in practice, and it is a quieter but arguably more consequential move than another training-cluster record.

The Retrieval Layer: GPU Vector Search Becomes a Default

On the retrieval side, AWS made Nvidia's cuVS library the default for vector indexing across all collections in next-generation OpenSearch Serverless. That matters for teams building retrieval-augmented generation (RAG), semantic search, recommendation systems, and agentic AI, because GPU-accelerated vector search — once a specialized optimization project — becomes a native AWS capability.

To see why that is a real shift, it helps to know what vector search does. Modern AI doesn't look things up by matching keywords; it converts text, images, or other data into long lists of numbers — vectors — that capture meaning, so that "find me something similar" becomes a geometry problem of locating the nearest points in a vast multidimensional space. Building the index that makes those searches fast is computationally brutal at scale, and it is exactly the kind of massively parallel math GPUs excel at. Nvidia says cuVS makes vector indexing up to 10 times faster than CPU-only builds at a quarter of the cost, making billion-scale vector databases practical to build in under an hour. Folding that into OpenSearch Serverless as the default means a developer gets it without standing up and tuning their own GPU pipeline — the optimization stops being a project and becomes a checkbox.
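The "geometry problem" can be made concrete with a toy example. The sketch below is a brute-force nearest-neighbor search over made-up three-dimensional vectors; it is not cuVS or the OpenSearch API, and the labels, vectors, and function names are hypothetical. Real embeddings have hundreds or thousands of dimensions, and production systems build an approximate index rather than comparing the query against every stored vector, which is the expensive step GPUs speed up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, corpus, k=2):
    """Return the k labels whose vectors are most similar to the query.

    Brute force: scores every stored vector, so cost grows linearly
    with corpus size. An index exists to avoid exactly this scan.
    """
    scored = [(cosine_similarity(query, vec), label)
              for label, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]

# Hypothetical toy "embeddings"; real ones come from an embedding model.
corpus = {
    "cat":    [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck":  [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # a vector lying close to "cat" and "kitten"

top = nearest(query, corpus, k=2)
print(top)
```

With three vectors the scan is instant. With a billion, every query would touch a billion vectors, which is why the index build, not the lookup, is where the heavy computation goes.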

Why It Reads as a Strategy, Not Just Two Products

"This collaboration focused on strengthening the entire AI infrastructure layer of AWS," Nvidia said, describing production-grade infrastructure that scales without added operational burden. Taken together, the two moves describe a deliberate push to normalize Blackwell-class capability across the messy middle of enterprise AI — the compute that serves a model and the retrieval that feeds it — rather than reserving the newest silicon for elite clusters.

There is a competitive edge to it as well. By bringing the RTX PRO 4500 to a managed, mid-tier instance, AWS occupies ground its rivals have yet to match: neither Google Cloud nor Microsoft Azure has shipped a comparable Blackwell-accelerated managed instance for the mid-range. If the next phase of cloud AI is decided less by who rents the biggest GPU and more by who makes the right-sized one easy and cheap to run, that is the contest these announcements are aimed at. Region expansion to Frankfurt and Tokyo is signaled for later in the year.


Frequently Asked Questions

What is the Amazon EC2 G7 instance?

The EC2 G7 is a new AWS cloud instance type accelerated by Nvidia's RTX PRO 4500 Blackwell Server Edition GPUs paired with custom sixth-generation Intel Xeon processors. AWS says it delivers up to 4.6 times the AI inference performance and 2.1 times the graphics performance of the previous G6 generation, and that it supports up to eight GPUs with 256GB of total GPU memory, 700 Gbps of EFA networking, and up to 7.6TB of local NVMe storage. It is positioned as a mid-tier option for AI inference, graphics rendering, video, spatial computing, and virtual desktops, not for large-scale model training. It reached general availability on June 18, 2026, in AWS's Ohio and Oregon regions.

What is GPU-accelerated vector search, and what is cuVS?

Vector search is the technology behind AI "similarity" lookups: data is converted into numerical vectors that represent meaning, and a search finds the vectors closest to a query in a high-dimensional space. cuVS is an Nvidia library that performs the heavy work of building those vector indexes on GPUs instead of CPUs. Because the underlying math is highly parallel, GPUs can do it far faster — Nvidia says cuVS makes vector indexing up to 10 times faster than CPU-only builds at a quarter of the cost, and can build billion-scale vector databases in under an hour. AWS has now made cuVS the default for vector indexing in next-generation OpenSearch Serverless.

What is vector search used for?

Vector search underpins a range of modern AI applications, including retrieval-augmented generation (RAG), where an AI model retrieves relevant documents before answering; semantic search, which finds results by meaning rather than exact keywords; recommendation systems; and agentic AI, where software agents pull in context to act. In all of these, the system needs to quickly find the most relevant items among potentially billions, which is what vector indexing and search make possible.

How is the G7 different from the G7e?

Both are Blackwell-based AWS instance families, but they target different needs. The G7e, launched in January 2026, uses the larger RTX PRO 6000 Blackwell GPU with more memory per GPU, making it better suited to larger generative AI models and the heaviest rendering and spatial-computing workloads. The G7 uses the smaller RTX PRO 4500 GPU and is positioned as a higher-volume, mid-tier option for teams that don't need the largest per-GPU memory and would rather run a more cost-effective configuration across more jobs. In short, G7e is the premium tier; G7 is the volume tier.