Hermes Agent Ships Async Subagents: Delegated Work No Longer Blocks Chat
Source: TechTimes

Nous Research shipped asynchronous subagent support for Hermes Agent on June 15, 2026, removing a long-standing architectural limit in the open-source autonomous agent that currently leads OpenRouter's global weekly token rankings. Until now, the agent's parent-child delegation model was sequential in practice: when a parent agent handed work to a child agent, the parent's conversation froze until every delegated task finished. The new async_delegation toolset, tracked in GitHub Issue #5586, eliminates that constraint. Background subagents now run as in-process threads, return a task ID immediately, and complete without blocking the parent session. Existing users can get the update by running hermes update.

The change matters beyond workflow convenience. The prior blocking architecture meant that Hermes, despite being built around a parent-child delegation model since its February 2026 launch, could not function as a genuinely concurrent multi-agent runtime. Delegation was available, but each delegated task forced the entire workflow back to sequential execution for its duration. That blocking behavior is the gap that has historically separated open-source agent frameworks from enterprise orchestration platforms, and Nous Research has now closed it.

Why the Blocking Architecture Was Structural, Not Incidental

Hermes Agent's delegation system centers on a tool called delegate_task, which spawns child agents — called subagents — to fan out work across parallel tasks. Each subagent receives its own fresh conversation, its own terminal session, and its own toolset. Only the final summary returns to the parent; intermediate tool calls and reasoning stay inside the child's context. That isolation keeps the parent's context window lean, which matters in long-running sessions that spawn many tasks over time.
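The isolation described above can be illustrated with a minimal Python sketch. It is not Hermes code: the Conversation class and the run_child and delegate helpers are hypothetical names invented for illustration, and the "tool calls" are placeholders. The point is only that intermediate steps land in the child's context while the parent receives a single summary.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """A minimal stand-in for an agent's context window (hypothetical)."""
    messages: list[str] = field(default_factory=list)


def run_child(goal: str) -> str:
    """Run a hypothetical child agent in a fresh conversation.

    Intermediate steps are appended to the child's own context only.
    """
    child = Conversation()  # fresh context, not shared with the parent
    child.messages.append(f"goal: {goal}")
    for step in ("search", "read", "draft"):
        child.messages.append(f"tool call: {step}")  # stays in the child
    return f"summary of '{goal}' after {len(child.messages) - 1} steps"


def delegate(parent: Conversation, goal: str) -> None:
    """Append only the child's final summary to the parent's context."""
    parent.messages.append(run_child(goal))


parent = Conversation(["user: audit the repo"])
delegate(parent, "list outdated dependencies")
print(parent.messages)
```

However many steps the child takes, the parent's context grows by one message per delegation, which is why the pattern suits long sessions that spawn many tasks.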

The original delegate_task was synchronous: the parent entered the tool call and waited there, frozen, until every child completed. A session that dispatched a research task and a documentation task at the same time had to wait for one to finish before the other could even start. Users could not ask the parent a follow-up question, redirect a running task, or start a third task while the first two ran. Every delegation, regardless of how logically independent the work was from the parent conversation, forced a full stop.

This was not a minor UX issue. It was a structural limitation on the class of workflows the platform could serve. Long-horizon tasks (codebase audits, multi-step research, parallel documentation drafting) could not run alongside the parent session that supervised them. The result was sequential behavior from an agent that had been designed for parallelism.

How async_delegation Works Under the Hood

The async_delegation toolset resolves the blocking problem by dispatching background agents as in-process threads rather than executing them inside the parent's tool call. The parent calls delegate_task_async, receives a task_id immediately, and continues processing messages. The child agent runs independently using the same AIAgent machinery, credentials, and toolsets as the synchronous path — the execution model changes, but the isolation and capability guarantees do not.

The toolset covers the full lifecycle of an async subagent:

delegate_task_async spawns a background agent and returns a task ID immediately, freeing the parent to continue.
check_task provides a non-blocking status poll that returns the current state and recent output of a running task without requiring the parent to wait.
steer_task injects a message into a running task, allowing the parent or user to redirect work mid-flight.
collect_task blocks until a task finishes and then returns the full result; it is used when the parent explicitly needs to wait for output before proceeding.
cancel_task stops a running task.
list_tasks returns all async tasks active in the session.
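To make the lifecycle concrete, here is a small, self-contained Python sketch of a session-scoped task registry. It reuses the six tool names as method names, but everything else is an assumption for illustration: the TaskRegistry and Task classes, the state strings, the step loop, and the return shapes are hypothetical and do not reflect the actual Hermes API. In this sketch list_tasks reports every task with its state, not only running ones.

```python
import queue
import threading
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    goal: str
    state: str = "running"  # running | done | cancelled (assumed states)
    output: list[str] = field(default_factory=list)
    inbox: queue.Queue = field(default_factory=queue.Queue)
    cancel: threading.Event = field(default_factory=threading.Event)
    thread: Optional[threading.Thread] = None


class TaskRegistry:
    """Hypothetical session-scoped registry of background tasks."""

    def __init__(self) -> None:
        self._tasks: dict[str, Task] = {}

    def delegate_task_async(self, goal: str, steps: int = 3) -> str:
        task_id = uuid.uuid4().hex[:8]
        task = Task(goal)
        task.thread = threading.Thread(
            target=self._run, args=(task, steps), daemon=True
        )
        self._tasks[task_id] = task
        task.thread.start()
        return task_id  # returned before the work finishes

    def _run(self, task: Task, steps: int) -> None:
        for i in range(steps):
            if task.cancel.is_set():
                task.state = "cancelled"
                return
            # Pick up any steering messages before the next step.
            while not task.inbox.empty():
                task.output.append(f"steered: {task.inbox.get_nowait()}")
            time.sleep(0.05)  # placeholder for one unit of agent work
            task.output.append(f"step {i + 1} of {task.goal}")
        task.state = "done"

    def check_task(self, task_id: str) -> dict:
        task = self._tasks[task_id]  # non-blocking: just reads current state
        return {"state": task.state, "recent": task.output[-2:]}

    def steer_task(self, task_id: str, message: str) -> None:
        self._tasks[task_id].inbox.put(message)

    def collect_task(self, task_id: str) -> list[str]:
        task = self._tasks[task_id]
        task.thread.join()  # blocks until the worker exits
        return list(task.output)

    def cancel_task(self, task_id: str) -> None:
        self._tasks[task_id].cancel.set()

    def list_tasks(self) -> dict[str, str]:
        return {tid: t.state for tid, t in self._tasks.items()}


registry = TaskRegistry()
audit = registry.delegate_task_async("audit")
registry.steer_task(audit, "skip vendored code")
print(registry.check_task(audit))
print(registry.collect_task(audit))
```

In this sketch steering and cancellation are cooperative: the worker only notices a new message or a cancel request between steps, a common trade-off when tasks run as threads that cannot be safely interrupted from outside.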

Credential management is unchanged. Subagents inherit the parent's API key, provider configuration, and credential pool, including key rotation logic that handles rate limits automatically. Routing subagents to different, less expensive models through config.yaml remains available — a cost-optimization lever that becomes more useful when multiple subagents can run concurrently.
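The idea of a shared credential pool that rotates keys on rate limits can be sketched as follows. This is a generic illustration, not Hermes's rotation logic: the CredentialPool class, the RateLimited exception, and the request callable are all hypothetical, and a lock is included only because concurrent subagent threads would share the pool.

```python
import threading
from collections import deque


class RateLimited(Exception):
    """Raised by the hypothetical provider call when a key is throttled."""


class CredentialPool:
    """Pool that rotates to the next key when the current one is throttled."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one key is required")
        self._keys = deque(keys)
        self._lock = threading.Lock()  # the pool is shared across threads

    def call(self, request):
        """Try each key at most once; re-raise if all are throttled."""
        last_error = None
        for _ in range(len(self._keys)):
            with self._lock:
                key = self._keys[0]
            try:
                return request(key)
            except RateLimited as err:
                last_error = err
                with self._lock:
                    # Rotate only if another thread has not already done so.
                    if self._keys[0] == key:
                        self._keys.rotate(-1)
        raise last_error


def fake_request(key: str) -> str:
    """Stand-in for a provider call; pretends the first key is throttled."""
    if key == "key-1":
        raise RateLimited(key)
    return f"ok with {key}"


pool = CredentialPool(["key-1", "key-2"])
print(pool.call(fake_request))
```

Because every subagent draws from the same pool, a throttled key is moved to the back once and later calls start from a key that is still usable.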

What a Developer Controls While Subagents Run

The operational shift is significant. A developer can now fire off a long-running security audit as a background subagent, return to the parent chat to ask an unrelated question or start a second task, and check back using check_task or the terminal user interface's /agents overlay whenever convenient. The TUI provides a live tree view of all running and completed subagents, giving real-time visibility into delegated work without requiring the parent conversation to pause.

Steering is the addition that changes the supervisory model most. steer_task lets a user inject a new instruction into a running subagent — redirecting scope, adding a constraint, or canceling a branch of work — without waiting for the task to complete or terminating it entirely. That was not possible with the synchronous architecture, where the parent had no channel into a running child.

The main limitation of the current implementation is durability: async subagents run within a single session and do not persist across session boundaries. Nous Research has tracked cross-turn persistence in a separate GitHub issue, #4949, under the Agent Conversation Protocol specification, which would extend async delegation to survive session restarts. That work remains in progress; for now, async subagents are scoped to the lifetime of the session that launched them.

Where Recursion Stops: Safety and Depth Limits

By default, subagents cannot themselves spawn further subagents. The delegation toolset is blocked for leaf-level children to prevent unbounded recursion. For teams building nested multi-agent pipelines — where orchestrator agents coordinate worker agents that in turn coordinate sub-workers — orchestrator mode unlocks deeper nesting through a configurable max_spawn_depth parameter. The default remains flat: a parent can spawn many subagents, but those subagents cannot themselves delegate further unless explicitly configured to do so.
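A depth guard of this kind can be sketched in a few lines of Python. The sketch makes assumptions the source does not confirm: it treats max_spawn_depth as the deepest level at which an agent may exist, with the top-level parent at depth 0 and a default of 1, and the AgentNode class and DelegationBlocked exception are hypothetical names.

```python
from dataclasses import dataclass


class DelegationBlocked(Exception):
    """Raised when an agent is not allowed to spawn a child."""


@dataclass(frozen=True)
class AgentNode:
    name: str
    depth: int = 0            # 0 for the top-level parent
    max_spawn_depth: int = 1  # assumed meaning: deepest level allowed

    def spawn(self, name: str) -> "AgentNode":
        """Create a child one level down, or refuse if the limit is hit."""
        child_depth = self.depth + 1
        if child_depth > self.max_spawn_depth:
            raise DelegationBlocked(
                f"{self.name} at depth {self.depth} may not delegate"
            )
        return AgentNode(name, child_depth, self.max_spawn_depth)


# Default: flat. The parent can spawn many children; children cannot spawn.
parent = AgentNode("parent")
worker_a = parent.spawn("worker-a")
worker_b = parent.spawn("worker-b")
try:
    worker_a.spawn("sub-worker")
except DelegationBlocked as err:
    print(f"blocked: {err}")

# Orchestrator-style configuration: one extra level of nesting.
orchestrator = AgentNode("orchestrator", max_spawn_depth=2)
print(orchestrator.spawn("worker").spawn("sub-worker").depth)
```

Because the limit is copied into every child, a subagent cannot raise its own ceiling; only the configuration of the top-level agent determines how deep the tree can grow.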

The constraint also limits how far a single compromised or misbehaving subagent can propagate errors through the hierarchy. Security researchers and enterprise governance teams have increasingly flagged unbounded agent recursion as a risk vector; blocking child-spawned delegation by default is an architectural guard rather than an arbitrary restriction.

Concurrent Agents Are Now Table Stakes: What This Means for the Field

The timing puts this update in direct conversation with what commercial platforms now offer as baseline capability. OpenAI's Codex, described in its own technical documentation as designed for orchestrating multiple agents running tasks in parallel across isolated environments, has offered concurrent agent dispatch since its initial launch. Amazon Web Services' AgentCore, now in preview across four regions, provides parallel agent execution with managed stateful runtimes. Both treat non-blocking multi-agent concurrency as an expected baseline, not a premium feature.

Hermes Agent, which has accumulated more than 193,000 GitHub stars since its February 2026 launch and now leads OpenRouter's global agent rankings with approximately 4.94 trillion weekly tokens, closed that gap with a single toolset addition. The open-source baseline for concurrent multi-agent orchestration has moved. Teams deciding whether to build on Hermes or migrate to a hosted commercial platform will find that blocking versus concurrent execution no longer separates the two options.

The broader implication is architectural: Hermes has shifted categories. What previously functioned as a capable persistent agent with delegation capabilities now functions as a concurrent multi-agent runtime. That changes which workflows are feasible at all — not just how fast they execute — and it does so within a platform that remains free, MIT-licensed, and self-hostable.


Frequently Asked Questions

What are async subagents in Hermes Agent?

Async subagents are child agents that a parent Hermes Agent spawns to handle delegated tasks without blocking the parent conversation. In the prior synchronous model, the parent froze until every child completed. With the async_delegation toolset, each child runs as an in-process background thread, returns a task ID immediately, and the parent session continues without interruption.

How do I enable async subagents in Hermes Agent?

Existing Hermes users can access async subagents by running hermes update from the terminal. The async_delegation toolset is then available through the standard delegate interface, with additional commands for checking status, steering running tasks, and collecting results.

How does Hermes Agent async_delegation compare to OpenAI Codex concurrency?

Both now support non-blocking concurrent agent dispatch. OpenAI Codex runs tasks in isolated cloud sandbox environments and is available through a subscription. Hermes Agent runs subagents as in-process threads on the user's own infrastructure and is MIT-licensed with no subscription requirement. The key architectural difference is ownership and deployment model: Codex is cloud-managed, while Hermes is self-hosted and model-agnostic.

What are the limits of async subagents in Hermes Agent?

Async subagents are scoped to the current session and do not persist across session boundaries. By default, subagents cannot themselves spawn further subagents — this recursion block is an intentional safety constraint. Orchestrator mode unlocks nested delegation through a configurable maximum spawn depth for teams building hierarchical multi-agent pipelines.