Google agents-cli: One Command Adds AI Agent Lifecycle Skills to Claude Code and Codex
Source: TechTimes

The offices of Google are pictured in London on February 28, 2026. JUSTIN TALLIS/AFP via Getty Images

Google released agents-cli on April 21, 2026, and it has shipped 13 updates in the 71 days since — the most recent, v0.6.1, arriving June 28 with a dependency refresh that brought the bundled Agent Development Kit to version 2.3.0 and fixed a publishing bug that caused Gemini Enterprise registrations to fail. The tool is not a coding agent. It is a specialist layer that sits alongside Claude Code, Antigravity CLI, Codex, or any other coding assistant and gives that assistant deep, current knowledge of Google's entire agent development stack — so a developer can ask their preferred tool to build and deploy a production-grade AI agent on Google Cloud without manually learning every underlying service.

That positioning matters more than it might appear. Developers building AI agents on Google Cloud currently face a fragmentation problem: the Agent Development Kit (ADK) handles agent logic, Cloud Run or Google Kubernetes Engine (GKE) handles deployment, Cloud Trace handles observability, Terraform handles infrastructure provisioning, and the Gemini Enterprise Agent Platform handles enterprise registration. Knowing how all these pieces connect, and the correct sequence of commands to wire them together, takes enough accumulated platform expertise that it effectively becomes a second job alongside building the agent itself. agents-cli absorbs that expertise into a set of structured knowledge files and makes it available to whatever coding assistant a developer already uses.

How Skill Injection Works

The mechanism is specific and worth understanding. When a developer runs the one-line install command — uvx google-agents-cli setup — agents-cli places seven structured Markdown files into the coding assistant's skill directory. These files are not documentation pages for humans to read. They are machine-readable knowledge bundles written specifically for AI consumption, formatted according to the SKILL.md open specification that Anthropic published in December 2025 and that now has support across 32 coding tools.

At the start of each session, a coding assistant like Claude Code scans the name and a short trigger description from each installed skill — a few dozen tokens per skill — and loads the full content only when a user's request signals that the skill is relevant. A developer who asks their assistant to "deploy this agent to Cloud Run" automatically invokes the deploy skill, which supplies the exact commands, flags, and configuration sequence needed to complete the task correctly. Nothing has to be pasted in from documentation; nothing has to be inferred from a vague memory of a deprecated API.
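The scan-then-load behavior described above can be sketched in a few lines of Python. This is a rough illustration, not how Claude Code or any other assistant is implemented: it assumes each skill lives in its own folder with a SKILL.md file whose front matter carries a name and a description, and it uses a hypothetical word-overlap score (select_skill) as a stand-in for the model's own judgment about relevance. The two sample skills are invented for the demo.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass
class SkillStub:
    """Cheap per-skill metadata kept in context for the whole session."""
    name: str
    description: str
    path: Path


def read_front_matter(path: Path) -> dict:
    """Parse simple `key: value` front matter between the two '---' lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    meta = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def index_skills(skill_root: Path) -> list[SkillStub]:
    """Session start: read only the name and description of each skill."""
    stubs = []
    for skill_file in sorted(skill_root.glob("*/SKILL.md")):
        meta = read_front_matter(skill_file)
        stubs.append(SkillStub(meta.get("name", skill_file.parent.name),
                               meta.get("description", ""), skill_file))
    return stubs


def select_skill(stubs, request):
    """Hypothetical stand-in for the model's relevance judgment: word overlap."""
    words = set(request.lower().split())
    best, best_score = None, 0
    for stub in stubs:
        score = len(words & set(stub.description.lower().split()))
        if score > best_score:
            best, best_score = stub, score
    return best


def load_full_skill(stub: SkillStub) -> str:
    """Only at this point is the full skill body read into context."""
    return stub.path.read_text(encoding="utf-8")


def demo() -> tuple[list[str], str]:
    """Create two invented skills, index them, and match one request."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        samples = {
            "deploy": "Deploy an agent to Cloud Run or GKE",
            "eval": "Evaluate agent quality with graded test scenarios",
        }
        for name, description in samples.items():
            (root / name).mkdir()
            (root / name / "SKILL.md").write_text(
                f"---\nname: {name}\ndescription: {description}\n---\n"
                f"# {name}\nFull instructions for {name} go here.\n",
                encoding="utf-8",
            )
        stubs = index_skills(root)
        chosen = select_skill(stubs, "deploy this agent to Cloud Run")
        return [s.name for s in stubs], chosen.name if chosen else ""
```

Calling demo() indexes both sample skills but selects only the deploy one for the request "deploy this agent to Cloud Run". The point of the design is that the index stays small no matter how long the skill bodies are, because load_full_skill is called only for the skill that matched.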

The seven bundled skills cover the full ADK lifecycle: the workflow skill governs the overall development process and model selection rules; the ADK code skill carries the entire Python API for defining agents, tools, orchestration patterns, callbacks, and state management; the scaffold skill handles project creation and enhancement; the eval skill covers the evaluation methodology; the deploy skill covers all three deployment targets; the publish skill handles Gemini Enterprise registration; and the observability skill covers Cloud Trace, logging, and third-party integrations. A developer can run CLI commands directly from their terminal without a coding assistant, but the value proposition of the tool is that the assistant handles the scaffolding, configuration, and deployment automatically while the developer focuses on the agent's actual logic.

The Evaluation Problem That Most Agent Tools Skip

The most technically significant part of agents-cli is its evaluation pipeline — and it is the part most likely to be overlooked in a surface-level review of the tool. The Wikipedia article on AI agents identifies "lack of standardized evaluation methods" as a documented structural concern across the field. Most agent tooling today treats evaluation as a manual, ad-hoc process: developers run their agent against a few test cases, inspect the outputs, and decide whether it looks good enough. That approach does not scale to production and does not catch the failure modes that only emerge in multi-turn conversations or edge-case tool-use scenarios.

agents-cli ships a five-stage quality flywheel that addresses this gap end to end. The first stage, agents-cli eval dataset synthesize, uses LLM-driven user simulation to generate diverse, multi-turn test scenarios automatically from the agent's own instructions and tool definitions — without requiring a human to write test cases by hand. The second stage, agents-cli eval generate, runs the agent against those scenarios and captures full execution traces including tool calls. The third stage, agents-cli eval grade, scores those traces against built-in or custom metrics. The fourth stage, agents-cli eval analyze, clusters failure modes so developers can see where the agent is systematically underperforming rather than just which individual test cases failed. The fifth stage, agents-cli eval optimize, applies the GEPA algorithm — Genetic Evolutionary Prompt Augmentation — to iteratively refine the agent's root system instructions based on the failure analysis, then re-evaluates to verify the improvement.
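For readers who want to drive the five stages from a script, a minimal Python sketch follows. The subcommand names and their order come from the description above; everything else is an assumption made for illustration, including the idea that each command signals failure with a non-zero exit code, and the helper name run_flywheel. No flags are shown because none are documented here.

```python
import subprocess

# Stage order and subcommand names follow the article; flags are omitted
# because the article does not document any.
EVAL_STAGES = [
    ("synthesize", ["agents-cli", "eval", "dataset", "synthesize"]),
    ("generate", ["agents-cli", "eval", "generate"]),
    ("grade", ["agents-cli", "eval", "grade"]),
    ("analyze", ["agents-cli", "eval", "analyze"]),
    ("optimize", ["agents-cli", "eval", "optimize"]),
]


def run_flywheel(dry_run=True, runner=subprocess.run):
    """Run the stages in order, stopping at the first failure.

    Returns the names of the stages that completed. With dry_run=True the
    commands are only printed, so the sketch works without agents-cli
    installed. Treating a non-zero exit code as failure is an assumption.
    """
    completed = []
    for name, argv in EVAL_STAGES:
        print("$ " + " ".join(argv))
        if not dry_run:
            result = runner(argv)
            if result.returncode != 0:
                print(f"stage '{name}' failed with exit code {result.returncode}")
                break
        completed.append(name)
    return completed
```

With the default dry_run=True, run_flywheel() prints the five commands and returns all five stage names. Passing dry_run=False would actually invoke the CLI, so that path should only be used in a project where agents-cli is installed and configured.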

That last stage is what distinguishes agents-cli from a deployment wrapper. GEPA is an iterative optimization algorithm confirmed in Google's official Agent Platform documentation; it uses the structure of evaluation failures as signal to propose targeted rewrites of agent instructions, then confirms whether each rewrite actually improved performance on the failing cases. A developer can run agents-cli eval optimize and have the tool automatically improve the agent's prompts rather than guessing at fixes manually.
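The propose, re-evaluate, accept-only-if-better loop can be illustrated with a toy in Python. This is not GEPA and not agents-cli code: the real optimizer works on graded execution traces and model-proposed rewrites, whereas this sketch assumes a hypothetical grader in which a test case passes only when a specific rule sentence appears in the instruction. The case names and rule texts are invented.

```python
def evaluate(instruction: str, cases: dict[str, str]) -> list[str]:
    """Toy grader: a case fails if its required rule text is missing."""
    return [case for case, rule in cases.items() if rule not in instruction]


def optimize(instruction: str, cases: dict[str, str], max_rounds: int = 10):
    """Propose one targeted rewrite per round; keep it only if failures drop."""
    failures = evaluate(instruction, cases)
    history = [len(failures)]
    for _ in range(max_rounds):
        if not failures:
            break
        target = failures[0]                           # a failure is the signal
        candidate = instruction + " " + cases[target]  # targeted rewrite
        new_failures = evaluate(candidate, cases)      # re-evaluate first
        if len(new_failures) < len(failures):
            instruction, failures = candidate, new_failures
            history.append(len(failures))
        else:
            break  # the rewrite did not help, so it is not accepted
    return instruction, history


# Invented example cases: each maps a scenario to the rule it depends on.
CASES = {
    "refund_request": "Always confirm the order ID before issuing a refund.",
    "unknown_tool": "If no tool fits, say so instead of guessing.",
}
final_instruction, failure_counts = optimize("You are a support agent.", CASES)
```

Here failure_counts records the number of failing cases after each accepted rewrite, which makes the key property visible: a candidate instruction replaces the current one only after a fresh evaluation shows fewer failures.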

This pipeline is what separates a production-grade agent from a prototype. A coding tool can scaffold and deploy an agent in minutes; only an integrated evaluation system can tell you whether that agent is actually reliable enough to be trusted with a production workload.

Local Development Without a Cloud Account

One point the project is explicit about: agents-cli does not require a Google Cloud account for local development. Developers can install the tool, scaffold a project, run evaluations, and iterate on agent behavior using a Google AI Studio API key, with no cloud resources billed. The Google Cloud dependency enters only when deploying to production, at which point agents-cli automatically handles infrastructure provisioning, CI/CD pipeline setup, Cloud SQL session management for stateful agents, secret configuration, and observability integration via its Terraform templates.
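The split between key-only local work and cloud-backed deployment can be pictured as a small configuration check. The Python sketch below assumes the environment variable names that ADK's own quickstart uses (GOOGLE_API_KEY, GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_CLOUD_PROJECT); whether agents-cli reads exactly these variables is not confirmed by the article, and select_backend is a hypothetical helper rather than part of the tool.

```python
def select_backend(env: dict[str, str]) -> str:
    """Decide which credentials a local run would use.

    The variable names follow ADK's quickstart conventions; treating them
    as what agents-cli reads is an assumption made for this sketch.
    """
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower()
    if flag in {"1", "true"}:
        # Cloud-backed path: a Google Cloud project is required.
        if env.get("GOOGLE_CLOUD_PROJECT"):
            return "vertex-ai"
        raise RuntimeError("Vertex AI requested but GOOGLE_CLOUD_PROJECT is not set")
    if env.get("GOOGLE_API_KEY"):
        # Local path: an AI Studio key alone is enough.
        return "ai-studio"
    raise RuntimeError("Set GOOGLE_API_KEY to run locally with an AI Studio key")
```

In practice this would be called as select_backend(dict(os.environ)) after importing os. Taking the environment as a plain dictionary keeps the check easy to exercise without touching real credentials.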

Platform support covers macOS, Linux, and Windows via WSL 2. The repository, licensed under Apache 2.0 and available at github.com/google/agents-cli, has accumulated approximately 3,600 GitHub stars since its public debut. Full documentation is at google.github.io/agents-cli.

What v0.6.1 Changed

The June 28 release updated the tool's bundled google-adk dependency from 2.2.0 to 2.3.0, so that newly scaffolded agent projects start from current ADK APIs rather than from a version already 10 days old at project creation. It also fixed a bug in publish gemini-enterprise, which had registered Cloud Run and GKE deployments over the Agent-to-Agent (A2A) protocol by default rather than via ADK, which Gemini Enterprise invokes natively. As a result, re-publishing an A2A agent created duplicate registrations, and A2A agent cards surfaced with incorrect public URLs on the first deploy. The update additionally corrected a misleading green "Skills updated" banner that agents-cli update displayed even when a skill had failed to update, and fixed failure message rendering in Windows PowerShell environments.

The project, which Google announced at Cloud Next 2026 on April 22, is explicitly the successor to Google's earlier agent-starter-pack open-source project, which has entered maintenance-only mode. New projects should use agents-cli; existing agent-starter-pack projects can migrate with no agent code rewrites, as their Terraform, tests, and CI/CD configuration carry over directly.


Frequently Asked Questions

How is agents-cli different from using ADK directly?

ADK is the agent framework itself: it defines how agents, tools, orchestration, and state work in Python. agents-cli is the lifecycle wrapper around it, handling project scaffolding; evaluation pipeline setup; deployment to Agent Runtime, Cloud Run, or GKE; Gemini Enterprise registration; and observability configuration. Using ADK directly means writing all of that infrastructure code from scratch and knowing which Google Cloud APIs connect which services; using agents-cli means asking a coding assistant to handle it from a natural-language prompt. The tool can also be used standalone from the terminal without a coding assistant.

Does agents-cli require Google Cloud?

Not for development or evaluation. A Google AI Studio API key is sufficient to scaffold projects, run agents locally, generate evaluation datasets, grade traces, and optimize prompts using GEPA. A Google Cloud project is required only when deploying to Agent Runtime, Cloud Run, or GKE — at which point agents-cli provisions the infrastructure automatically. Developers without a Google Cloud account can still use the full evaluation pipeline before committing to production deployment.

What coding assistants does agents-cli work with?

The GitHub repository lists Antigravity CLI, Claude Code, and Codex as explicitly supported, and the tool works with any coding agent that supports the SKILL.md standard — currently 32 tools including products from Google, Microsoft, JetBrains, and AWS. It can also run entirely without a coding assistant: every agents-cli command works as a standalone terminal tool for developers who prefer direct CLI control over natural-language delegation.

How does the GEPA evaluation optimizer actually work?

GEPA — Genetic Evolutionary Prompt Augmentation — is an iterative optimization algorithm that takes the failure patterns from an eval grade run and proposes targeted rewrites to the agent's system instructions. It then re-runs evaluation on the modified instructions to verify the improvement before accepting the change. The approach is genetic in the sense that it iterates through candidate instruction refinements, selecting and combining the changes that improve grade scores on the specific failing test cases. The result is a workflow where prompt engineering for production agents can be partly automated rather than relying entirely on developer intuition about which instruction changes will fix which observed failure modes.
