
Enterprise AI agents are not failing because they lack intelligence. According to the Gravitee State of AI Agent Security 2026 report, they are failing because the infrastructure designed to keep them safe and grounded in reality was never built. AWS used its annual New York Summit on June 17 to announce two services designed to fill that gap — AWS Continuum, an autonomous security vulnerability lifecycle platform, and AWS Context, a real-time enterprise knowledge graph — in what amounts to the company's most direct acknowledgment yet that the first generation of production AI agents has been deployed ahead of the architecture needed to govern it.
The timing was not coincidental. The week before the Summit, Fortune reported that Amazon CEO Andy Jassy warned senior Trump administration officials that Anthropic's most capable security model, Fable 5, contained a jailbreak that could aid cyberattacks. A Commerce Department export control order issued June 12 forced Anthropic to take both Fable 5 and Mythos 5 offline for every user worldwide. That episode — in which the most powerful AI vulnerability-detection model available became legally unavailable overnight — directly shaped the design philosophy of Continuum.
The production-demo gap in enterprise AI is no longer a theoretical concern. Cyera Research, which analyzed more than 7,200 publicly reported AI security and operational incidents from September 2023 through May 2026, identified 344 verified cases of enterprise-relevant agent-inflicted damage — including an April 2026 incident in which an AI coding agent powered by Anthropic's Claude model accidentally deleted a car-rental software company's entire production database and backups within seconds. Separately, the Gravitee State of AI Agent Security 2026 report found that while 80.9% of technical teams have AI agents in active testing or production deployment, only 14.4% sent those agents live with full security and IT approval. Among organizations running agents in production, 88% reported confirmed or suspected security incidents in the past year.
The structural problem, as Deepak Singh, AWS VP leading the Kiro team, described it at the Summit, is that the faster AI writes code and surfaces problems, the more there is for humans to review, test, and maintain: "Those are all good problems to have, but they are real problems."
AWS Continuum for code vulnerabilities, now available in gated preview, addresses what AWS describes as the full lifecycle of managing code vulnerabilities at machine speed: continuous discovery, business-context prioritization, exploitability validation, and targeted remediation. The service begins by ingesting an organization's existing backlog of vulnerabilities alongside its own fresh scan of the environment. It then uses organizational context — business priorities, system dependencies, risk profiles — to evaluate and rank findings rather than applying abstract severity scores that security teams routinely ignore.
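To make the contrast with abstract severity scores concrete, here is a minimal Python sketch of context-weighted ranking. AWS has not published Continuum's scoring method, so the weights, the field names, and the `context_score` formula are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical severity weights; Continuum's actual scoring is not public.
SEVERITY_WEIGHT = {"low": 1.0, "medium": 4.0, "high": 7.0, "critical": 10.0}


@dataclass(frozen=True)
class Finding:
    finding_id: str
    severity: str             # abstract severity reported by a scanner
    asset_criticality: float  # 0.0-1.0, how much the business depends on the asset
    internet_exposed: bool    # whether the affected system is reachable from outside
    dependents: int           # number of downstream systems relying on the asset


def context_score(f: Finding) -> float:
    """Combine scanner severity with business context into one ranking score."""
    base = SEVERITY_WEIGHT[f.severity]
    exposure = 2.0 if f.internet_exposed else 1.0
    blast_radius = 1.0 + 0.1 * f.dependents
    return base * f.asset_criticality * exposure * blast_radius


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Return findings ordered from most to least urgent."""
    return sorted(findings, key=context_score, reverse=True)


backlog = [
    Finding("F-1", "critical", asset_criticality=0.1, internet_exposed=False, dependents=0),
    Finding("F-2", "medium", asset_criticality=0.9, internet_exposed=True, dependents=5),
]
ranked = prioritize(backlog)
print([f.finding_id for f in ranked])  # the medium finding on the exposed, critical asset ranks first
```

In this toy backlog the "critical" finding sits on a low-value internal asset and drops below a "medium" finding on an exposed system that five other services depend on, which is the kind of reordering the business-context approach is meant to produce.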
Neha Rungta, AWS director of applied science, who led the work on Continuum, said in an interview at the Summit that AI can now chain minor flaws together, combining two medium-severity findings and a low-severity one into something critical, in a way that would previously have required significant attacker expertise. "That was something that would have taken a lot of effort, expertise, and determination for an attacker to get through, so the floor has been lowered," Rungta said. "The goal is to raise that floor up again."
Chet Kapoor, AWS VP of security services and observability, told the Summit audience that the threat environment changed with the emergence of models like Claude Mythos. "I call it, 'The Mythos moment,'" Kapoor said. "It accelerated our plans significantly. Mythos set a new bar for finding vulnerabilities." AWS confirmed Continuum is already working with design partners including Capital One, MongoDB, Rivian, and Robinhood.
The technical architecture behind Continuum addresses the central anxiety about autonomous security systems: how do you trust an agent to act on real infrastructure?
Continuum's answer is a staged trust model. The service launches in "learn mode" — every recommendation surfaces with full reasoning and an audit trail, and no action is taken autonomously. As an organization gains confidence in specific categories of findings, it can promote Continuum to "enforce mode" category by category, enabling increasingly automated remediation within guardrails the customer defines. Every decision remains explainable, every action auditable, and every outcome feeds back into the system to improve the next cycle.
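The staged trust model can be pictured as a per-category switch with an audit log attached. The Python sketch below is an illustration only: `TrustPolicy`, `Mode`, and the category names are hypothetical and do not reflect Continuum's actual interface, which AWS has not documented publicly.

```python
from enum import Enum


class Mode(Enum):
    LEARN = "learn"      # recommend only, never act
    ENFORCE = "enforce"  # may remediate automatically within guardrails


class TrustPolicy:
    """Tracks the trust mode per finding category; every category starts in learn mode."""

    def __init__(self) -> None:
        self._modes: dict[str, Mode] = {}
        self.audit_log: list[dict[str, str]] = []

    def promote(self, category: str) -> None:
        """Move a single category to enforce mode, leaving all others untouched."""
        self._modes[category] = Mode.ENFORCE

    def mode_for(self, category: str) -> Mode:
        return self._modes.get(category, Mode.LEARN)

    def handle(self, category: str, fix: str, reasoning: str) -> str:
        """Apply or recommend a fix depending on the category's mode, and log the decision."""
        mode = self.mode_for(category)
        outcome = "applied" if mode is Mode.ENFORCE else "recommended"
        # Every decision is recorded with its reasoning, regardless of mode.
        self.audit_log.append(
            {"category": category, "fix": fix, "reasoning": reasoning,
             "mode": mode.value, "outcome": outcome}
        )
        return outcome


policy = TrustPolicy()
print(policy.handle("dependency-upgrade", "bump library version", "known vulnerable release"))
policy.promote("dependency-upgrade")
print(policy.handle("dependency-upgrade", "bump library version", "known vulnerable release"))
print(policy.handle("iam-policy", "tighten role permissions", "over-broad access"))
```

The first call only recommends, the second applies the fix after the category has been promoted, and the unrelated category stays in learn mode, mirroring the category-by-category promotion described above.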
Exploitability validation takes place inside isolated sandbox environments. Before any fix is proposed, Continuum constructs working exploit examples in the sandbox to confirm that a vulnerability is genuinely reachable — filtering out false positives that would otherwise generate remediation work for issues that pose no real risk. After validation, Continuum assesses existing defenses around the issue — blocking controls, compensating controls, detection mechanisms — before recommending either a network change, a policy change, or a code patch.
Continuum also includes threat modeling, which generates comprehensive threat models from design documents or source code using the STRIDE framework — Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege — an industry-standard classification that security teams have used for decades but rarely had the capacity to apply systematically at code velocity.
The most significant architectural decision in Continuum is one that reflects a structural shift in enterprise AI risk that predates the Summit. Continuum is model agnostic — it uses different AI models where each performs best and will integrate new models as they emerge. This is not merely a feature for flexibility. It is a direct response to the realization that even the most capable models can be removed from deployment without warning.
When Amazon researchers discovered a jailbreak in Anthropic's Fable 5 and reported it to the Trump administration in June, a Commerce Department export control order forced Anthropic to take both Fable 5 and its underlying Mythos 5 model offline for every user worldwide — including AWS customers — within hours. The two models remained suspended as of June 22, 2026.
For enterprise security infrastructure, this episode created a new category of supply-chain risk: model availability risk governed not by technical factors but by geopolitical and legal ones. An enterprise security platform built around a single frontier model — however capable — is a platform that can be rendered inoperable by a government order. Continuum's model-agnostic design is an architectural acknowledgment that this risk is real.
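In practice, model-agnostic design often comes down to a router that tries providers in order of preference and falls through when one is unavailable. The following Python sketch shows that pattern under stated assumptions: the `route` function, the `ModelUnavailable` exception, and the stub models are hypothetical stand-ins, not Continuum's implementation.

```python
from typing import Callable


class ModelUnavailable(Exception):
    """Raised by a model client when the model cannot be reached or has been withdrawn."""


def route(task: str, candidates: list[tuple[str, Callable[[str], str]]]) -> tuple[str, str]:
    """Try each (name, client) pair in preference order; return the first successful result."""
    for name, call in candidates:
        try:
            return name, call(task)
        except ModelUnavailable:
            continue  # fall through to the next-best model
    raise RuntimeError("no model available for task")


# Hypothetical stubs standing in for real model clients.
def suspended_model(task: str) -> str:
    raise ModelUnavailable("model withdrawn from service")


def fallback_model(task: str) -> str:
    return f"analysis of {task}"


chosen, result = route(
    "review auth module",
    [("preferred-model", suspended_model), ("fallback-model", fallback_model)],
)
print(chosen, "->", result)
```

A platform built this way degrades to its second-best model when the first disappears, rather than stopping altogether.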
AWS Context, announced at the Summit as "coming soon," addresses a different layer of the production gap: not security, but knowledge. Most enterprise AI agents today operate without any real understanding of the company they serve. They know what they have been asked to do; they do not know why it matters, what the constraints are, or how internal systems connect.
Context builds a knowledge graph automatically from an organization's existing data — databases, Slack messages, documents, emails, procurement systems, HR platforms, and more. The service infers the relationships between those data assets, business rules, and domain knowledge, and makes the resulting map available to every agent in the organization at runtime.
The underlying architecture is built on the same knowledge graph technology that powers Amazon Quick, AWS's AI assistant, a graph that already processes millions of requests per day. All metadata is stored in Iceberg format in S3 Tables. Because Iceberg is an open table format, organizations can build against Context using tools they already run, without standing up a separate retrieval pipeline or provisioning new infrastructure. The metadata remains queryable from within the AWS ecosystem without data leaving the customer's environment.
Governance is built into the architecture: agents only reach the data they are explicitly authorized to touch. Context also learns over time — tracking which sources produce accurate results, which data paths get used most, and which business rules matter most across the organization. Every agent can then improve based on the collective findings of every previous query run against the same knowledge graph.
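One way to read the governance claim is that authorization is enforced during graph traversal itself, so an agent never sees nodes it has not been granted. The Python sketch below illustrates that idea with a toy in-memory graph; `ContextGraph`, its methods, and the node names are assumptions made for the example and say nothing about how AWS Context is actually built.

```python
from collections import deque


class ContextGraph:
    """Toy knowledge graph in which each node carries a list of agents allowed to read it."""

    def __init__(self) -> None:
        self.edges: dict[str, set[str]] = {}
        self.acl: dict[str, set[str]] = {}

    def add_node(self, node: str, allowed_agents: set[str]) -> None:
        self.edges.setdefault(node, set())
        self.acl[node] = set(allowed_agents)

    def relate(self, a: str, b: str) -> None:
        """Record an undirected relationship between two existing nodes."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def reachable(self, agent: str, start: str) -> set[str]:
        """Breadth-first search that only steps onto nodes the agent is authorized to read."""
        if agent not in self.acl.get(start, set()):
            return set()
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self.edges[node]:
                if neighbor not in seen and agent in self.acl[neighbor]:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen


graph = ContextGraph()
graph.add_node("orders", {"support_agent"})
graph.add_node("shipping", {"support_agent"})
graph.add_node("returns_policy", {"support_agent"})
graph.add_node("hr_records", {"hr_agent"})
graph.relate("orders", "shipping")
graph.relate("orders", "returns_policy")
graph.relate("orders", "hr_records")

print(sorted(graph.reachable("support_agent", "orders")))
```

Here the support agent can follow relationships from orders to shipping and return rules, while the HR node linked to the same order stays out of reach.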
As an example, AWS describes a customer support agent that needs to resolve a shipping issue: without Context, that agent might pull from an outdated database or miss a return eligibility rule stored in a separate system. With Context, it pulls the customer's purchase history, current shipping status, and return eligibility from across multiple systems in a single connected view, and the next similar query runs faster because the graph already knows the optimal path.
Alongside Continuum and Context, AWS announced Release Management as a new capability within AWS DevOps Agent. The capability extends DevOps Agent's reach from post-deployment incident response into the deployment pipeline itself.
When a developer uses Kiro or Claude Code to generate or modify code, Release Management can be triggered immediately at the point of change — identifying whether a modification creates a breaking change in the broader application context before the code is committed. In one representative scenario AWS described at the Summit, a developer renames a parameter to improve clarity; local tests pass, but the change would break an interaction with a dependent service elsewhere in the application. DevOps Agent catches and flags that interaction before the pull request is filed.
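The renamed-parameter scenario is easy to reproduce with static analysis. The Python sketch below uses the standard-library `ast` module to flag keyword arguments that no longer match a function's signature. It is a simplified illustration, not a description of how DevOps Agent works: the function names and sample code are hypothetical, and the check ignores positional-only parameters and `**kwargs`.

```python
import ast


def param_names(source: str, func_name: str) -> set[str]:
    """Return the keyword-addressable parameter names of func_name as defined in source."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return {a.arg for a in node.args.args + node.args.kwonlyargs}
    raise ValueError(f"{func_name} not found")


def broken_calls(caller_source: str, func_name: str, valid: set[str]) -> list[tuple[int, str]]:
    """List (line, keyword) pairs where a call passes a keyword the function no longer accepts."""
    broken = []
    for node in ast.walk(ast.parse(caller_source)):
        if isinstance(node, ast.Call):
            callee = node.func
            name = callee.id if isinstance(callee, ast.Name) else getattr(callee, "attr", None)
            if name == func_name:
                for kw in node.keywords:
                    # kw.arg is None for **kwargs expansion, which this sketch skips.
                    if kw.arg is not None and kw.arg not in valid:
                        broken.append((node.lineno, kw.arg))
    return broken


# The developer renamed start_date to pickup_date for clarity.
new_service = (
    "def create_booking(customer_id, pickup_date):\n"
    "    return (customer_id, pickup_date)\n"
)
# A dependent service elsewhere still uses the old keyword.
dependent_caller = (
    "from bookings import create_booking\n"
    "\n"
    "create_booking(customer_id=7, start_date='2026-06-17')\n"
)

valid = param_names(new_service, "create_booking")
print(broken_calls(dependent_caller, "create_booking", valid))
```

The service's own tests would pass because they use the new name; only a check that looks across the caller's code surfaces the stale `start_date` keyword on line 3.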
Once code enters the pipeline, Release Management generates and runs change-specific test plans in provisioned sandbox environments, covering regressions, user experience issues, and integration failures before any change reaches production. The capability integrates with existing pipeline tools including CodePipeline, Jenkins, GitHub Actions, and GitLab, and is available at no additional cost during the preview period.
Kiro, AWS's agentic software development service, is now available as a native iOS app, currently in gated preview for Pro, Pro+, Pro Max, and Power subscribers. The rationale is straightforward: as Kiro sessions grow longer and more autonomous, engineers increasingly need to monitor, steer, and intervene in their work from locations other than a laptop.
The iOS app gives developers the ability to start a new project, check session progress, review code diffs, and approve changes from any location. The underlying session continues running in a cloud sandbox whether or not the developer is actively present at a desktop — the same context, the same state, the same position in the workflow. Sessions started on Kiro on the web appear automatically in the iOS app with the same identity, model preferences, and connected repositories.
Swami Sivasubramanian, AWS VP of Agentic AI, cited Dhan, an Indian fintech startup, as a concrete example of Kiro's productivity impact: the company built a new charting platform with a single engineer in eight weeks using Kiro, when it had originally estimated the project would require twelve engineers working for twelve to twenty-four months. More than 2,700 Southwest Airlines developers are currently using Kiro as part of the airline's transition to a cloud-based, AI-driven development lifecycle on AWS, targeted for completion by 2028.
Taken together, the June 17 announcements represent AWS's claim that the infrastructure layer for trusted enterprise AI agents is, for the first time, taking shape. Security oversight (Continuum), business context (Context), deployment safety (DevOps Agent Release Management), and mobile access (Kiro iOS) address four distinct failure modes that have stalled enterprise adoption.
The June 12 episode with Anthropic's Fable 5 and Mythos 5 models added an urgent real-world signal that the field has not yet fully confronted. As Chet Kapoor told the Summit audience, the threat landscape has changed: frontier AI models can now find software vulnerabilities and reason through complex attack paths at machine speed, generating an exponentially growing backlog of vulnerabilities for organizations to address. Continuum is AWS's proposed answer. Whether it can keep pace, and whether AWS Context's knowledge graph can deliver on its promise that an agent's tenth decision is better than its first, will become clear as Continuum moves through gated preview and Context, still listed as "coming soon," reaches customers.
Microsoft Azure and Google Cloud have made comparable investments in enterprise AI agent governance in recent months. The market's question is no longer whether enterprises need this infrastructure layer; the 2026 security data makes that plain. The question is which cloud provider gets the architecture right first.
Why do AI agents fail in production when they work in demos?
Demo environments are controlled: inputs are predictable, data is clean, and consequences of error are contained. Production environments expose agents to incomplete business context, overlapping data sources, adversarial inputs, and real operational consequences. The Gravitee report found that only 14.4% of technical teams sent their agents live with full security and IT approval, meaning most agents reach production without complete vetting of what they are authorized to do. Cyera's incident analysis shows the consequences: 344 verified cases of enterprise-relevant agent-inflicted damage between September 2023 and May 2026.
What technically distinguishes AWS Continuum from existing vulnerability management tools?
Traditional vulnerability management tools surface findings — they identify vulnerabilities, assign severity scores, and generate tickets. Continuum addresses the full lifecycle: it ingests existing findings, validates which are genuinely exploitable by constructing working exploit examples in an isolated sandbox, prioritizes by business context rather than abstract scores, and proposes or applies a fix. Every action is explainable and auditable. The STRIDE-framework threat modeling capability additionally generates threat models from design documents or source code, which security teams have historically produced manually during design reviews — a process that rarely happens systematically at the pace AI now generates code.
What is AWS Context's knowledge graph and why does enterprise AI need it?
Most enterprise AI agents are context-blind: they complete tasks using whatever data they can reach in the moment, without understanding how their organization's data is structured, which sources are authoritative, or how systems connect. AWS Context builds a knowledge graph from existing enterprise data (databases, documents, messaging, and business applications), storing the metadata in Iceberg format on S3 Tables and exposing it to agents via agentic search. An agent with access to Context can retrieve a customer's correct return eligibility by consulting the right systems in the right order; without Context, it guesses confidently based on incomplete information.
Why is Continuum built to be model-agnostic rather than tied to Claude or another specific AI?
When Amazon researchers discovered a jailbreak in Anthropic's Fable 5 model and reported it to the Trump administration in June 2026, a U.S. Commerce Department export control order forced Anthropic to take both Fable 5 and Mythos 5 offline for all users worldwide within hours. That episode demonstrated that dependence on any single frontier model — even one made by a close partner — creates model-availability risk governed by geopolitical and legal factors rather than technical ones. Continuum's model-agnostic design means that if any individual model is taken offline or superseded, the service continues operating by routing tasks to the best available alternative.
