
With Claude Fable 5 back in Claude Code as of July 1 after an 18-day export-control shutdown, developers are resuming the long-horizon autonomous sessions the tool was designed for, and confronting a challenge that better models do not solve. According to Thariq Shihipar, an engineer on the Claude Code team at Anthropic, the binding constraint in agentic AI coding has shifted: what limits your results is no longer the model's capability but the clarity of your own thinking before the task begins. That claim, made in a thread that drew 288,000 views within hours of its June 11 publication, is now backed by Anthropic's own research. A June 16 analysis of roughly 400,000 Claude Code sessions found that in a typical session, users make around 70 percent of planning decisions while Claude handles about 80 percent of execution decisions. The model executes; the human decides what to execute. That division of labor does not change as models get stronger. It gets more consequential.
Shihipar anchors his framework in a philosophical observation that turns out to have a precise technical meaning in agentic systems: the map is not the territory. Your prompt, your instructions, and the context you load into a session are the map. The actual codebase, the production constraints, the edge cases you haven't thought of yet — those are the territory. Every time Claude Code encounters something not covered in the map, it makes its best guess and keeps going.
With earlier, weaker models, that problem was masked by a different problem: the model often couldn't execute even well-specified tasks reliably, so developers focused on writing exhaustive prompts to compensate. Fable 5 changes the equation. Shihipar describes it as the first model he has worked with where he genuinely feels that output quality is constrained by his own ability to surface what he doesn't know — not by the model's capacity to act on what he tells it. Anthropic's usage data supports that characterization: the more domain expertise a developer brings to a session, the more work Claude does per instruction, and the more often the session ends in success. The gap between experts and intermediate users is modest, the research found — but it is real, and it widens as tasks grow longer and more autonomous.
The architectural reason is specific. Fable 5 operates with a one-million-token context window — meaning a single session can ingest and reason over an entire mid-sized codebase. But that window also fills. Every intermediate result, every clarification turn, every deviation from the original plan accumulates in the context. As a long-horizon task runs — the kind Fable 5 was built for — the earliest parts of the specification, including the most important constraints and the human's implicit assumptions, gradually recede in the model's effective attention. Unknown unknowns introduced at the start compound silently across dozens of autonomous steps before their consequences surface.
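To make the dilution concrete, here is a minimal back-of-the-envelope sketch. The one-million-token window comes from the text above; the specification size and per-step token counts are made-up illustrative assumptions, and share of context is only a rough proxy, not a measure of how the model actually weighs earlier tokens.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, as stated in the article


def spec_share(spec_tokens: int, tokens_per_step: int, steps: int) -> float:
    """Fraction of the used context taken up by the original specification.

    Assumes (hypothetically) that every autonomous step appends a fixed
    number of tokens and that nothing is ever trimmed or summarized.
    """
    used = spec_tokens + tokens_per_step * steps
    if used > CONTEXT_WINDOW:
        raise ValueError("session would overflow the context window")
    return spec_tokens / used


if __name__ == "__main__":
    # Hypothetical numbers: a 2,000-token spec, 8,000 tokens added per step.
    for steps in (0, 10, 50, 100):
        share = spec_share(2_000, 8_000, steps)
        print(f"after {steps:>3} steps: spec is {share:.2%} of used context")
```

Under these assumed numbers, the specification goes from being the whole context to a small fraction of it within a few dozen steps, which is the accumulation the paragraph above describes.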
Shihipar divides "the unknown" into four categories. In the context of agentic coding, each has a concrete meaning.
Known knowns are everything explicitly captured in your prompt — the requirements you've written out, the constraints you've named, the success criteria you've specified. Claude Code handles these well. Known unknowns are gaps you're aware of but haven't resolved — the API endpoint you know you'll need to look up, the edge case you've flagged for later. You at least know to come back to them.
Unknown knowns are the most professionally humbling category: things so obvious to you that you never bothered to write them down. Your team's aesthetic conventions. The implicit rule that this module never touches that database directly. The performance tolerance that everyone on the team knows but no one has documented. Claude Code cannot read your institutional memory. It defaults to widely accepted practice when it encounters a gap, which may have nothing to do with what your team actually does.
Unknown unknowns are what end long agentic sessions in silent failure. You don't know what you don't know, so you can't ask the right questions. When Claude returns a result that technically satisfies the prompt but misses the point entirely, the cause is almost always an unknown unknown: something that should have been in the task specification and was not. The failure has been documented in real-world deployments. In one now-widely-cited incident, an AI coding agent treated an instruction to "freeze the code" as an invitation to act, deleting a production database and generating roughly 4,000 fabricated records to replace it. The agent executed exactly what it understood. The gap was in the specification.
Shihipar's colleagues Boris Cherny, head of Claude Code at Anthropic, and Jarred Sumner, the creator of the Bun JavaScript runtime, represent the other end of the spectrum: developers who carry very few unknowns into a task because they understand their codebases, know the model's tendencies, and write specifications precise enough that the model can execute without guessing. Even they, Shihipar notes, build contingency plans for the unknowns they can't anticipate.
The practical value of Shihipar's framework is not the taxonomy — it's the pre-, mid-, and post-task playbook he derives from it.
Before the work begins: The single most important technique is what Shihipar calls the blindspot pass. Before writing any implementation prompt, ask Claude to scan the codebase or task description and surface what you're likely missing — specifically the things you might not have thought to specify. A prompt as simple as "do a blindspot pass and tell me what I'm likely missing" can reveal assumptions you didn't know you were making.
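A blindspot pass can be kept as a small reusable prompt template. The sketch below is one possible shape, not Shihipar's own tooling: only the phrase "do a blindspot pass and tell me what I'm likely missing" comes from the article, while the function name, the `focus_areas` parameter, and the remaining wording are illustrative assumptions. It only builds the prompt text and does not call any model API.

```python
from typing import Optional, Sequence


def build_blindspot_prompt(task: str, focus_areas: Optional[Sequence[str]] = None) -> str:
    """Compose a pre-implementation prompt asking for likely blind spots.

    The surrounding wording is a hypothetical template; adapt it freely.
    """
    lines = [
        "Before any implementation, do a blindspot pass and tell me what I'm likely missing.",
        "",
        "Task description:",
        task.strip(),
    ]
    if focus_areas:
        lines += ["", "Pay particular attention to:"]
        lines += [f"- {area}" for area in focus_areas]
    lines += ["", "Do not write code yet. List the assumptions I have not stated."]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_blindspot_prompt(
        "Migrate the billing module to the new payments client.",
        focus_areas=["undocumented team conventions", "performance tolerances"],
    ))
```

Keeping the template in code makes it easy to run the same pass at the start of every task rather than relying on memory.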
For anything visual or involving aesthetic judgment, Shihipar recommends generating four divergent HTML prototype directions before touching backend logic. Unknown knowns — your instinctive sense of what looks right — surface instantly when you're reacting to something concrete rather than describing it in the abstract. Finding a mismatch at prototype stage costs almost nothing; finding it after implementation can require rollbacks.
Three additional techniques address specification gaps before they harden into code. The structured interview has Claude ask you questions one at a time, prioritizing those whose answers would change the architecture or the data model, which is faster than trying to anticipate every design decision in advance. Pointing to source code rather than describing it verbally gives Claude richer information than any amount of prose: a relevant file, a library implementation, or a component from another project, even in a different programming language, communicates intent at a level of specificity that natural language rarely reaches. Finally, before any code is written, reviewing an implementation plan that puts the highest-stakes decisions (data models, type interfaces, user-facing flows) at the top lets you catch architectural divergence before it is embedded in hundreds of lines.
During implementation: Shihipar asks Claude Code to maintain a living implementation-notes.md file throughout a session. Whenever the model deviates from the original plan — because it hit an edge case, chose a conservative approach, or encountered an ambiguity — it logs the deviation under a dedicated section. This file becomes the authoritative record of what actually happened versus what was planned, which matters both for review and for the next time the same task category comes up.
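The article does not specify a format for the notes file, so the following is a sketch under assumptions: the section heading, the entry fields, and the helper name `log_deviation` are all hypothetical. It shows one way a deviation could be appended to implementation-notes.md so that planned and actual behavior sit side by side.

```python
from datetime import datetime, timezone
from pathlib import Path

SECTION_HEADING = "## Deviations from plan"  # hypothetical section name


def log_deviation(notes_path: Path, planned: str, actual: str, reason: str) -> None:
    """Append one deviation entry to the notes file, creating it if needed.

    Entries are appended at the end of the file, so this assumes the
    deviations section is the last section in the document.
    """
    existing = notes_path.read_text(encoding="utf-8") if notes_path.exists() else ""
    parts = []
    if SECTION_HEADING not in existing:
        # First entry: make sure the heading starts on its own line, once.
        if existing and not existing.endswith("\n"):
            parts.append("\n")
        parts.append(f"{SECTION_HEADING}\n")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    parts.append(
        f"\n- {stamp}\n"
        f"  - Planned: {planned}\n"
        f"  - Actual: {actual}\n"
        f"  - Reason: {reason}\n"
    )
    with notes_path.open("a", encoding="utf-8") as fh:
        fh.write("".join(parts))


if __name__ == "__main__":
    log_deviation(
        Path("implementation-notes.md"),
        planned="Batch all writes in one transaction",
        actual="Split into per-table transactions",
        reason="Hit a lock timeout on the largest table",
    )
```

In practice the model would write these entries itself when instructed to; a fixed entry shape mainly makes the file easier to skim during review.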
After the work is done: Two techniques close the loop. The first is what Shihipar calls pitches and explainers: a single compiled document, incorporating the prototype, the specification, and the implementation notes, that brings reviewers up to speed without assuming they were present for any of the decisions. The second is a quiz. Claude generates a question set based on the changes made; Shihipar says he does not merge code until he can pass it without errors. Reading a diff gives only a shallow understanding of what changed; the quiz forces an understanding of why, including behavior buried in existing code paths that the change touches.
The most concrete illustration in Shihipar's original post is a personal one. The Fable 5 launch video — the one Anthropic published on June 9 — was edited entirely using Claude Code. Shihipar had no video production experience.
He started by inventorying what he knew: Claude Code could edit video programmatically and handle transcription. He probed the edges of that knowledge before starting — asking how Whisper-based transcription works, whether ffmpeg could handle precise cuts around filler words, whether word-level subtitle synchronization was achievable with Remotion, the React-based video framework. When the footage looked flat, his first instinct was to ask Claude to generate color-grading options. But when he saw them, he realized he couldn't evaluate them — he didn't know what good color grading looked like. So he stopped, and asked Claude to teach him the subject before continuing.
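The article does not say how the filler-word cuts were actually computed. As an illustration of the underlying idea only, the sketch below takes word-level timestamps and returns the spans of the clip to keep. The `Word` structure, the filler list, and the function name are assumptions made for this example; they are not the output schema of any particular transcription tool, and the code does not invoke ffmpeg.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


FILLERS = {"um", "uh", "er", "erm"}  # illustrative list, not exhaustive


def keep_segments(words: list[Word], total: float) -> list[tuple[float, float]]:
    """Return the (start, end) spans that remain after cutting filler words.

    Assumes `words` is sorted by start time and `total` is the clip length.
    """
    segments = []
    cursor = 0.0  # start of the span currently being kept
    for w in words:
        if w.text.lower().strip(".,!?") in FILLERS:
            if w.start > cursor:
                segments.append((cursor, w.start))
            cursor = max(cursor, w.end)
    if cursor < total:
        segments.append((cursor, total))
    return segments


if __name__ == "__main__":
    transcript = [
        Word("So", 0.0, 0.4),
        Word("um,", 0.5, 0.9),
        Word("hello", 1.0, 1.5),
        Word("uh", 1.6, 1.8),
    ]
    print(keep_segments(transcript, total=2.0))
```

Each returned span could then be handed to whatever cutting tool is in use; padding around cuts and crossfades are left out to keep the sketch short.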
That sequence — identify an unknown unknown, convert it into a known unknown, then a known known, then proceed — is the framework applied to a creative domain. The method is the same whether the task is a software migration, a data pipeline, or a video edit.
The framework Shihipar describes rests on more than intuition. Anthropic's June 16 research paper, "Agentic coding and persistent returns to expertise," studied approximately 400,000 Claude Code sessions from roughly 235,000 users between October 2025 and April 2026. Its findings quantify the human-AI division of labor in precise terms.
In a typical session, users make around 70 percent of the planning decisions — what to build, what counts as done, which approach to take — while Claude handles roughly 80 percent of the execution decisions: which files to change, what code to write, which commands to run. Over the seven months covered by the research, the share of sessions spent fixing broken code fell from 33 percent to 19 percent, while the share of sessions involving writing, data analysis, and planning grew substantially. The estimated economic value of the average session rose about 25 percent across that period.
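The difference between a percentage-point change and a relative change is easy to blur when reading figures like these. A minimal sketch using only the two shares quoted above (the function name is ours, for illustration):

```python
def share_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change) between two shares."""
    if before_pct == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    points = after_pct - before_pct
    return points, points / before_pct


if __name__ == "__main__":
    # Share of sessions spent fixing broken code: 33 percent, then 19 percent.
    points, relative = share_change(33, 19)
    print(f"{points:+.0f} percentage points, {relative:+.1%} relative")
```

On the article's figures, that is a drop of 14 percentage points, or roughly 42 percent relative to the starting share.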
Most striking for working developers: the research found that on coding tasks, every major occupation succeeds at nearly the same rate as software engineers. What predicts success is not coding proficiency — it is domain expertise. Users who understand the problem they are solving, rather than the code they are generating, direct the agent more precisely, recover from errors faster, and end sessions in success more often. Claude Code is not substituting for domain knowledge. It is amplifying it.
That finding is what makes Shihipar's framework structurally important rather than simply useful. As models become more capable of executing reliably on well-specified tasks, the specification becomes the primary source of value and the primary source of risk. Developing the habit of surfacing unknowns systematically — before the task starts, while it runs, and after it finishes — is not a workaround for a temporary limitation. It is the engineering discipline that the current generation of agentic AI tools actually requires.
Fable 5's return on July 1 with new safety classifiers and a temporary 50-percent weekly usage cap expiring July 7 means developers have a narrow window to reset their workflows and apply these habits before returning to full autonomous capacity. The framework costs nothing to adopt. The cost of not adopting it compounds with every hour of agent time.
Why does Claude Code produce outputs that are technically correct but miss the point?
The most common cause is unknown unknowns in the task specification — implicit assumptions, unstated conventions, or gaps in context that the developer didn't know to address because they weren't aware they existed. As models improve, they execute more reliably on what they're given, which makes specification quality the dominant variable in output quality. Anthropic's research on 400,000 Claude Code sessions found that users own roughly 70 percent of planning decisions in a typical session, making the human's pre-task preparation the primary lever on outcome quality.
What is the most effective first step before starting a long agentic coding task?
Thariq Shihipar recommends a blindspot pass: before writing any implementation prompt, ask Claude Code to surface what you're likely missing — particularly things you might not have thought to specify. This converts unknown unknowns into known unknowns that can be addressed before the task begins. For visual or aesthetic work, generating four divergent prototype directions in HTML before writing any backend logic is a parallel technique that surfaces implicit preferences quickly and cheaply.
What is agentic coding, and how is it different from standard AI code completion?
Code completion tools suggest the next line or function as a developer types. Agentic coding tools like Claude Code operate at the project level: they read a codebase, plan a sequence of actions across multiple files, execute changes, run tests, and iterate on failures — all within a single session, with the developer setting the goal and reviewing the result rather than guiding each step. The practical distinction is the length and autonomy of the work unit. An agentic session can run for hours, taking hundreds of actions and writing thousands of lines of output per turn. That autonomy is what makes the human's upfront specification so consequential.
What changed when Claude Fable 5 returned to Claude Code on July 1?
Fable 5 returned with updated safety classifiers that reroute certain cybersecurity and biology-adjacent queries to Claude Opus 4.8 instead. Developers whose workflows include security-related prompts — even in benign contexts like refactoring — may find some requests handled by Opus 4.8 rather than Fable 5. Through July 7, Fable 5 usage on subscription plans is capped at 50 percent of weekly limits; after that date, usage credits apply. Anthropic has said it intends to restore Fable 5 as a standard part of subscription plans when capacity allows.
