Claude Code Skills: Inside Anthropic's Playbook for the Nine Types That Actually Work
Source: TechTimes


In a June 3 post on the Claude blog, technical staff member Thariq Shihipar gives an unusually concrete account of how Anthropic's own engineers use Skills in Claude Code, the company's command-line coding agent. The company says it runs "hundreds" of Skills internally, and the post spells out the nine categories they fall into, which type returns the most value, and how to write one that actually helps. Until now, that guidance had circulated only inside Anthropic.

For developers already building with Claude Code, the practical payoff is a tested framework for deciding what to turn into a Skill and what to leave alone, drawn from a team that has run the experiment at scale rather than in theory.

A Skill Is a Folder of Scripts and Resources, Not a Prompt

The post opens by correcting a misconception it says is common: that Skills are "just markdown files." In practice, a Skill is a folder of instructions, scripts, and resources an agent can discover and use. It can hold a SKILL.md file alongside reference documents, scripts, templates, examples, hooks, and data that later runs read back. When Claude invokes a Skill, it gets a working kit for the task, not a single block of text.
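As a rough illustration of the folder idea, the Python sketch below builds a small Skill directory in a location the caller chooses. Only the SKILL.md file name comes from the post. The subfolder names, the other file names, the skill name verify-checkout and the exact frontmatter text are assumptions made for this example, not Anthropic's layout.

```python
from pathlib import Path

# Hypothetical layout: only SKILL.md is named in the post. Every other
# path and all file contents here are illustrative assumptions.
LAYOUT = {
    "SKILL.md": (
        "---\n"
        "name: verify-checkout\n"
        "description: Verify the signup and checkout flow end to end.\n"
        "---\n\n"
        "Read references/gotchas.md before running scripts/run_checks.py.\n"
    ),
    "references/gotchas.md": "# Gotchas\n",
    "scripts/run_checks.py": "print('checks would run here')\n",
    "assets/report_template.md": "# Verification report\n",
}


def scaffold_skill(root: Path, name: str) -> Path:
    """Create the skill folder under root and write each file in LAYOUT."""
    skill_dir = root / name
    for relative_path, content in LAYOUT.items():
        target = skill_dir / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
    return skill_dir


def list_files(skill_dir: Path) -> list[str]:
    """Return every file in the skill folder as a sorted relative path."""
    return sorted(
        p.relative_to(skill_dir).as_posix()
        for p in skill_dir.rglob("*")
        if p.is_file()
    )
```

The point of the sketch is the shape: one entry file that points at scripts, references and templates sitting beside it.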

The distinction matters because what teams usually lack is not another paragraph of prompting but a one-time consolidation of practices they have already validated: the easy-to-miss details, the common scripts, and the fixed procedures worth reusing.

What Are the Nine Categories of Claude Code Skills?

After cataloging its internal Skills, Anthropic found they cluster into nine categories that together trace a full software workflow, from supplying knowledge to shipping and operating code.

The first three give the model what it cannot infer. Library and API reference Skills explain how a specific library, command-line tool, or SDK should be used inside a team, with the rules and gotchas that are easy to get wrong. Product verification Skills check whether output actually works, such as running a full signup and checkout flow in a headless browser. Data fetching and analysis Skills connect to a company's data warehouse and monitoring systems, packaging query methods and field conventions so the model need not guess table structures.

The middle three absorb everyday team processes. Business process and team automation Skills compress recurring workflows into one command, like a standup that reports only what changed since yesterday. Code scaffolding and templates Skills generate code such as a new service or migration file, combining a fixed skeleton with natural-language constraints a pure template engine cannot capture. Code quality and review Skills push code toward a team's standards; one example spins up a fresh-eyes subagent to critique the work, and such checks can be wired into continuous integration as hooks.

The final three reach into production. CI/CD and deployment Skills move code from development to release, chaining build, gradual rollout, error-rate comparison, and rollback conditions. Runbooks start from a symptom rather than a request, mapping an alert, a Slack thread, or a request ID to the right tools and producing a structured conclusion. Infrastructure operations Skills handle routine but often destructive chores like resource cleanup and cost investigation, which is why Anthropic says they must build in guardrails that notify and confirm before acting.
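The error-rate comparison and rollback condition in a deployment Skill might reduce to a decision like the one in this Python sketch. The RolloutStage type, the next_action function and the one-percentage-point threshold are all hypothetical, chosen only to make the chain of steps concrete; the post does not describe Anthropic's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class RolloutStage:
    percent: int            # share of traffic on the new build
    baseline_errors: float  # error rate of the old build, 0.0 to 1.0
    canary_errors: float    # error rate of the new build, 0.0 to 1.0


def next_action(stage: RolloutStage, max_increase: float = 0.01) -> str:
    """Decide what a gradual rollout should do after one stage.

    max_increase is an assumed tolerance: the new build may exceed the
    old build's error rate by at most this much before rolling back.
    """
    if stage.canary_errors - stage.baseline_errors > max_increase:
        return "rollback"
    if stage.percent >= 100:
        return "done"
    return "promote"
```

Keeping the rollback rule in a script rather than in prose means the agent applies the same threshold on every run.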

Why Verification Skills Deliver the Biggest Quality Gains

Among all nine types, Anthropic singles out verification as the most valuable, saying it has had the most measurable impact on output quality. The reason is a specific failure mode: a model can give the impression that a task is finished, and the last step — confirming the result — is exactly where work breaks down. The post suggests it can be worth having an engineer spend a full week making verification Skills excellent.

It offers two techniques: have Claude record a video of its testing so a human can see precisely what was checked, and add programmatic assertions at key points so that state changes, persisted events, and final page states are confirmed rather than assumed to be "close enough." The company also frames focus as a reliability property. The best Skills fit cleanly into a single category, while those that try to do too much straddle several and confuse the agent, a caution against the common early instinct to make one Skill do everything.
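The assertion technique can be sketched in Python against an in-memory stand-in for the product. FakeShop, its fields and the check wording are hypothetical; a real verification Skill would drive the actual application, for instance through a headless browser, and read real storage.

```python
from dataclasses import dataclass, field


@dataclass
class FakeShop:
    """In-memory stand-in for the product under test (hypothetical)."""
    orders: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    page: str = "cart"

    def checkout(self, order_id: str) -> None:
        self.orders[order_id] = "paid"
        self.events.append(("order_paid", order_id))
        self.page = "confirmation"


def verify_checkout(shop: FakeShop, order_id: str) -> list[str]:
    """Run the flow, then check each key point explicitly.

    Returns the list of failed checks; an empty list means every
    check passed.
    """
    failures = []
    shop.checkout(order_id)
    # State change: the order must actually be marked paid.
    if shop.orders.get(order_id) != "paid":
        failures.append("state: order not marked paid")
    # Persisted event: the payment event must have been recorded.
    if ("order_paid", order_id) not in shop.events:
        failures.append("event: order_paid was not persisted")
    # Final page state: the user must land on the confirmation page.
    if shop.page != "confirmation":
        failures.append("page: did not end on confirmation")
    return failures
```

A flow that only reaches the confirmation page, without recording the order or the event, fails two of the three checks here, which is the "looks finished" case the post warns about.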

The Highest-Signal Content Is the Gotchas, Not the Obvious

On what to actually write, Anthropic argues the highest-signal material is the gotchas. Because Claude already writes code and reads codebases, restating what it would do by default only adds context without adding value; what earns its place are the details that pull the model out of its default assumptions.

The examples are pointed: a subscriptions table that is append-only, so the correct row is the one with the highest version rather than the most recent timestamp; a single value that carries one field name in an API gateway and a different one in a billing service; and a staging environment that returns a success code even when a payment webhook did not actually process, so the true state must be read elsewhere. Each detail is minor on its own, but one wrong assumption skews the result — and these are the things a team knows and the model does not.
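The first of those gotchas is easy to show in a few lines of Python. The column names and sample rows are invented for illustration; the post names only the rule that the highest version wins, not the schema.

```python
from datetime import datetime

# Hypothetical append-only rows. Version 2 was written late (for
# example by a backfill), so it carries the newest timestamp even
# though version 3 is the current state.
ROWS = [
    {"subscription_id": "sub_1", "version": 1, "plan": "basic",
     "updated_at": datetime(2025, 1, 1)},
    {"subscription_id": "sub_1", "version": 3, "plan": "pro",
     "updated_at": datetime(2025, 2, 1)},
    {"subscription_id": "sub_1", "version": 2, "plan": "team",
     "updated_at": datetime(2025, 3, 1)},
]


def current_row(rows: list[dict], subscription_id: str) -> dict:
    """Correct rule: pick the row with the highest version."""
    candidates = [r for r in rows if r["subscription_id"] == subscription_id]
    return max(candidates, key=lambda r: r["version"])


def latest_by_timestamp(rows: list[dict], subscription_id: str) -> dict:
    """Default assumption that gives the wrong answer on this table."""
    candidates = [r for r in rows if r["subscription_id"] == subscription_id]
    return max(candidates, key=lambda r: r["updated_at"])
```

On this sample data the two functions disagree: the version rule returns the "pro" row, while the timestamp rule returns the stale "team" row.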

How Anthropic Says to Structure a Skill

The post treats the file system as part of the prompt. SKILL.md should act as a table of contents and signpost rather than a catch-all, dispatching detail into other files loaded on demand — function signatures into a references file, output templates into an assets folder — an approach Anthropic calls progressive disclosure. Skills should avoid railroading the model: supply the key rules but leave room to adapt, or a reused Skill will stall in an unfamiliar situation. Setup should be planned ahead, with user-specific context such as a target Slack channel stored in a config file and requested when missing, optionally through the AskUserQuestion tool for structured choices.
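The setup pattern, storing user-specific context in a config file and asking only when it is missing, could look roughly like this Python sketch. The JSON format, the load_setting name and the ask callback are assumptions; the callback merely stands in for however the Skill prompts the user, and it is not the AskUserQuestion tool itself.

```python
import json
from pathlib import Path
from typing import Callable


def load_setting(config_path: Path, key: str, ask: Callable[[str], str]) -> str:
    """Return a stored setting, asking for it once if it is missing.

    ask is a hypothetical stand-in for prompting the user. The answer
    is written back so later runs do not ask again.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    if not config.get(key):
        config[key] = ask(key)
        config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config[key]
```

The first run asks for the value and saves it; every later run reads the file and never prompts.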

The description field, the post stresses, is written for the model, not for humans. Because Claude scans every Skill's name and description at the start of a session to decide what applies, the description is a trigger specification, and a keyword like "babysit" belongs in it directly. As Skills mature, the company adds, the first things they grow are memory, preloaded scripts, and on-demand hooks — the last able to block high-risk actions such as force-deletes and dropping database tables while a Skill runs.
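The decision logic inside such a blocking hook might resemble the Python sketch below. It covers only the pattern check; how a hook is registered with Claude Code and how it reports a block are not shown. The patterns and the check_command name are assumptions for illustration and are far from a complete safety list.

```python
import re
from typing import Optional

# Hypothetical deny-list: each entry pairs a pattern with the reason
# reported when a command matches it.
BLOCKED_PATTERNS = [
    # rm with both -r and -f in one flag group, in either order.
    (re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"),
     "recursive force delete"),
    (re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
     "dropping a database table"),
]


def check_command(command: str) -> Optional[str]:
    """Return the reason a command should be blocked, or None if allowed."""
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(command):
            return reason
    return None
```

A wrapper around this function would refuse the action and surface the reason whenever the result is not None.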

Distribution Is Organic, and Usage Is Measured

For sharing, Anthropic describes two paths. Smaller teams can check Skills into a repository; as a team scales, it can publish them through an internal Claude Code plugin marketplace instead, since every checked-in Skill adds to the model's context. The company does not gate this centrally: useful Skills surface organically through a sandbox folder and Slack before an owner files a request to promote one into the marketplace. It also logs Skill usage with a hook to find which Skills are popular and which are undertriggering, because, as the post notes, the real question is often not whether a Skill runs but whether it is invoked when it should be.
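Once usage is logged, finding popular and quiet Skills is a small counting exercise, as in this Python sketch. The log format (one Skill name per line), the usage_report function and the min_uses threshold are hypothetical; the post does not describe how Anthropic stores or analyzes its logs.

```python
from collections import Counter


def usage_report(installed: list[str], log_lines: list[str],
                 min_uses: int = 1) -> tuple[list[str], list[str]]:
    """Summarize a usage log with one Skill name per line (assumed format).

    Returns (popular, quiet): installed Skills ordered by use count,
    and installed Skills used fewer than min_uses times.
    """
    counts = Counter(line.strip() for line in log_lines if line.strip())
    popular = [name for name, _ in counts.most_common() if name in installed]
    quiet = sorted(name for name in installed if counts[name] < min_uses)
    return popular, quiet
```

A low count is only a hint. A quiet Skill may be undertriggering, or it may simply cover a task that rarely comes up, so the list is a starting point for reviewing descriptions rather than a verdict.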


Frequently Asked Questions

What are Skills in Claude Code?

Skills are folders of instructions, scripts, and resources that Claude Code can discover and use to complete tasks more accurately. A Skill typically centers on a SKILL.md file but can also include reference documents, templates, scripts, and data.

What are the nine categories of Claude Code Skills?

Anthropic groups its internal Skills into library and API reference, product verification, data fetching and analysis, business process and team automation, code scaffolding and templates, code quality and review, CI/CD and deployment, runbooks, and infrastructure operations.

Which type of Skill does Anthropic value most?

Anthropic says verification Skills have the most measurable impact on output quality, because models can appear to finish a task without it actually working. The company suggests it can be worth spending a week perfecting them.

How do you make a Claude Code Skill trigger reliably?

Write the description for the model rather than for humans, since Claude scans Skill names and descriptions at the start of a session to decide what applies. Include the words and contexts that should activate it, such as specific trigger keywords.