Claude Opus 4.8 Remote Execution Leaves Four Times Fewer Code Flaws Unflagged, Beats GPT-5.5 on Coding
21 hour ago / Read about 19 minute
Source:TechTimes

Claude Opus 4.8 allows for remote response and execution. Anthropic.com

Anthropic released Claude Opus 4.8 on Thursday, upgrading its flagship artificial intelligence model worldwide with a pitch aimed squarely at developers and businesses that rely on AI to write and check code: a model the company says is roughly four times less likely than its predecessor, Opus 4.7, to let flaws in its own code pass without flagging them. For anyone shipping AI-generated software, that is the practical headline, because an error the model quietly leaves in place is one a human has to catch later.

The release is available immediately through claude.ai, the Claude API, and major cloud platforms, and it arrives at the same price as Opus 4.7. Anthropic is framing the upgrade around reliability rather than raw speed, at a moment when much of the industry is racing toward faster, more autonomous models. The company's wager is that the next thing buyers will pay for is an AI that knows when it might be wrong.

Opus 4.8 is an incremental step, not a leap. Anthropic describes it as a "modest but tangible" improvement, and independent coverage noted that its published gains over Opus 4.7 range from under one percentage point to nearly nine percent depending on the task. The launch also lands against a commercial backdrop: Anthropic and rival OpenAI are both moving toward public market debuts later this year, with Anthropic reportedly targeting a listing as soon as October 2026, and a steady cadence of competitive releases helps make the case to investors.

Read more: Claude Enterprise Security Integrations: 28 Vendors Now Route AI Activity Into Existing SIEM and DLP Tools

Opus 4.8 Flags Its Own Uncertainty More Than Opus 4.7

The improvement Anthropic emphasizes most is what it calls honesty, and the company means something specific by it. Models are trained to avoid claims they cannot support, but they often jump to conclusions, confidently reporting progress when the underlying evidence is thin. Anthropic says early testers found Opus 4.8 quicker to surface its own doubts and less likely to make unsupported claims about its work, with the four-times figure drawn from the company's internal evaluations of unflagged coding errors.

There is a safety dimension as well. Anthropic said its pre-release alignment assessment found substantially lower rates of misaligned behavior, including deception or cooperation with attempts at misuse, than Opus 4.7, and that those rates now sit close to Claude Mythos Preview, the more powerful model it has kept under tight restrictions. That theme echoes a separate independent finding: in a 15-day simulated-society study by startup Emergence AI, agents powered by Claude Sonnet 4.6 recorded zero crimes in isolation while other models' worlds collapsed into simulated arson and violence. The same researchers cautioned that Claude agents adopted coercive tactics when mixed with less-aligned models, a reminder that good behavior in testing does not guarantee it in messy, multi-model deployments.

How Does Claude Opus 4.8 Compare to GPT-5.5 and Gemini 3.1 Pro?

On its own benchmarks, Anthropic says Opus 4.8 outperforms OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro across several agentic categories, including agentic coding, financial analysis, and computer use. The company reported its agentic coding score climbing from 64.3 percent on Opus 4.7 to 69.2 percent on Opus 4.8, ahead of GPT-5.5 at 58.6 percent. These are vendor-published figures, so the comparisons should be read as Anthropic's own claims pending independent, third-party testing on a shared evaluation setup.

Dynamic Workflows Runs Hundreds of Subagents in One Session

Alongside the model, Anthropic launched Dynamic Workflows, a research-preview feature for Claude Code available on Enterprise, Team, and Max plans. It lets Claude plan a large task, spin up hundreds of parallel subagents within a single session, verify their outputs, and report back. Anthropic says the most concrete payoff is in software migration: Claude Code paired with Opus 4.8 can now carry codebase-scale migrations spanning hundreds of thousands of lines from kickoff to merge, using a project's existing test suite as the measure of success.

Effort Controls Let Users Trade Speed for Token Cost

A second addition gives users a dial. New effort controls in claude.ai and the Cowork product, available on all plans, let people choose how hard the model works on a request. Higher settings push Claude to reason longer for better answers; lower settings return responses faster and consume rate limits more slowly, which also reduces how many tokens a session burns. Opus 4.8 defaults to a high-effort setting that Anthropic judges the best balance of quality and usability, with "extra" and "max" options reserved for difficult or long-running tasks.

Anthropic Holds Mythos Models Back Over Cybersecurity Concerns

Hanging over the release is Mythos, Anthropic's most advanced class of models, which remains gated. A small number of organizations are using Claude Mythos Preview for cybersecurity work under the company's Project Glasswing, but Anthropic says models at that capability level need stronger cyber safeguards before a general release. The company said it is making swift progress and expects to bring Mythos-class models to all customers in the coming weeks. For now, Opus 4.8 still trails Mythos on raw capability, and the gap Anthropic is most visibly trying to close is the one between what its model says and what it actually delivers.

Read more: Anthropic Moves Closer to Public Claude Mythos Release: 10,000 Critical Bugs Found First


Frequently Asked Questions

What is Claude Opus 4.8?

Claude Opus 4.8 is Anthropic's flagship AI model, released May 28, 2026, as an upgrade to Opus 4.7. It is positioned around reliability, with the company saying it is roughly four times less likely to leave flaws in its own code unflagged, and it ships with new effort controls and a Dynamic Workflows feature for Claude Code.

How much does Claude Opus 4.8 cost?

Pricing is unchanged from Opus 4.7 at $5 per million input tokens and $25 per million output tokens. A faster mode that runs at 2.5 times the speed costs $10 per million input tokens and $50 per million output tokens, which Anthropic says is three times cheaper than fast mode on previous models.

Is Claude Opus 4.8 better than GPT-5.5?

Anthropic reports that Opus 4.8 beats OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro on several of its own agentic benchmarks, including coding and computer use. Those results come from Anthropic's internal testing and have not yet been confirmed by independent auditors on a shared evaluation setup.

When will Claude Mythos be released?

Anthropic has not given a firm date. The company says Mythos-class models need stronger cybersecurity safeguards before a wide release and expects to make them available to all customers in the coming weeks.

Opus 4.8 is a measured update rather than a generational jump: the benchmark gains over Opus 4.7 are real but small, the standout claim is a fourfold drop in unflagged code errors, the price has not moved, and the most powerful Mythos models remain off-limits to the public for now. For developers and businesses already on Claude, the practical question is whether a model that more readily admits its own uncertainty saves enough downstream review time to matter, and that is a verdict only real-world use, not vendor benchmarks, will settle.