Building multi-agent systems: when and how to use them
January 23, 2026
TL;DR
Multi-agent systems add coordination overhead and failure modes; start single-agent. Use multiple agents mainly for context protection, parallel search/research, or specialization.
What actually happened
Anthropic reports teams spending months on multi-agent designs that a single, well-prompted agent could match.
They focus on an orchestrator–subagent pattern: a lead agent spawns specialized subagents per subtask (sketched after this list).
They highlight three cases where multi-agent consistently beats single-agent: avoiding context pollution, parallelizable work, and specialization.
They warn against “role-based” agent splits (planner/implementer/tester/reviewer) due to handoff loss.
They propose a verification subagent as a reliably useful pattern.
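A minimal sketch of that orchestrator–subagent loop, assuming a generic run_agent helper that wraps whatever model client you use; the helper, its signature, and the prompts are placeholders, not a real SDK call or Anthropic's exact setup:

```python
# Minimal sketch of the orchestrator-subagent pattern described above.
# `run_agent` is a hypothetical stand-in for your LLM call of choice.

from dataclasses import dataclass

@dataclass
class Subtask:
    name: str
    instructions: str  # bounded, self-contained description of the work

def run_agent(system_prompt: str, user_message: str) -> str:
    """Hypothetical LLM call; replace with your own client."""
    raise NotImplementedError

def orchestrate(goal: str, subtasks: list[Subtask]) -> str:
    # Each subagent starts from a clean context: it sees only its own
    # instructions, never the lead agent's full history.
    reports = []
    for task in subtasks:
        report = run_agent(
            system_prompt=f"You are a focused subagent handling: {task.name}. "
                          "Return a concise report of findings only.",
            user_message=task.instructions,
        )
        reports.append(f"{task.name}: {report}")

    # The lead agent synthesizes compact reports, not the raw work.
    return run_agent(
        system_prompt="You are the lead agent. Combine the subagent reports "
                      "into a single answer for the original goal.",
        user_message=f"Goal: {goal}\n\nReports:\n" + "\n".join(reports),
    )
```

The point is the isolation: each subagent sees only its bounded instructions, and the lead sees only short reports.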
Key numbers
Multi-agent implementations typically use 3–10x more tokens for equivalent tasks.
Context pollution example: tool results can add 2000+ tokens of irrelevant history.
Keep injected summaries to ~50–100 tokens when only the essentials are needed (see the sketch after this list).
Tool overload threshold: roughly 15–20+ tools attached to a single agent.
Tool Search Tool can reduce token usage by up to 85%.
Verification loop example uses max_attempts = 3.
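To make the 2,000-token-dump versus ~100-token-summary trade concrete, here is a sketch of context protection; search_tool, run_agent, and the token budget parameter are hypothetical placeholders for your own retrieval tool and model call:

```python
# Minimal sketch of "context protection": the lead agent never sees the raw
# 2,000+ token tool output, only a ~50-100 token summary.

def search_tool(query: str) -> str:
    """Placeholder for a high-volume retrieval tool (returns long raw text)."""
    raise NotImplementedError

def run_agent(system_prompt: str, user_message: str) -> str:
    """Hypothetical LLM call; replace with your own client."""
    raise NotImplementedError

def fetch_essentials(query: str, budget_tokens: int = 100) -> str:
    raw = search_tool(query)  # may be thousands of tokens of mostly-irrelevant history
    # A retrieval subagent digests the raw dump in its own context and returns
    # only what the lead agent actually needs.
    return run_agent(
        system_prompt=(
            "You are a retrieval subagent. Summarize only the facts relevant "
            f"to the query, in at most ~{budget_tokens} tokens. No preamble."
        ),
        user_message=f"Query: {query}\n\nRaw results:\n{raw}",
    )

# The lead agent's context grows by ~100 tokens per lookup instead of ~2,000.
```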
Why this was hard
Each extra agent adds prompts to maintain, more failure points, and unpredictable interactions.
Handoffs between agents lose context; “telephone game” degradation compounds across steps.
Coordination can consume more tokens than execution due to duplicated context and summarization.
Parallelism can increase total computation; thoroughness improves, but total runtime does not necessarily drop.
How they solved it
Use an orchestrator to delegate bounded subtasks to subagents with separate contexts.
Apply “context protection”: offload high-volume retrieval to a subagent, return only essentials.
Parallelize independent facets with concurrent subagents, then synthesize results in the lead agent (see the sketch after this list).
Specialize by toolset and/or system prompts to reduce tool confusion and conflicting behaviors.
Prefer context-centric decomposition: split work only where context can be isolated cleanly.
Use a verification subagent for black-box checks with explicit success criteria and required tools.
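A sketch of the parallel-facets approach; the facet names, the thread-pool choice, and the run_agent stub are illustrative assumptions, not prescribed by the source:

```python
# Minimal sketch of parallel research across independent facets.

from concurrent.futures import ThreadPoolExecutor

def run_agent(system_prompt: str, user_message: str) -> str:
    """Hypothetical LLM call; replace with your own client."""
    raise NotImplementedError

def research_facet(topic: str, facet: str) -> str:
    return run_agent(
        system_prompt=f"Research only the '{facet}' facet. Return key findings briefly.",
        user_message=topic,
    )

def parallel_research(topic: str, facets: list[str]) -> str:
    # Facets are independent, so subagents can run concurrently; each keeps
    # its own context and returns only a compact report.
    with ThreadPoolExecutor(max_workers=len(facets)) as pool:
        reports = list(pool.map(lambda f: f"{f}: {research_facet(topic, f)}", facets))

    # The lead agent synthesizes. Note that total token spend still grows with
    # the number of facets even when wall-clock time improves.
    return run_agent(
        system_prompt="Synthesize the facet reports into one coherent answer.",
        user_message=f"Topic: {topic}\n\n" + "\n".join(reports),
    )

# Example: parallel_research("competitor pricing", ["pricing pages", "filings", "press coverage"])
```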
Why this matters beyond this company
Multi-agent should be justified by a concrete bottleneck (context limits, parallelizable facets, tool/prompt conflicts).
Decomposition boundaries are about context isolation, not mirroring human job titles.
Verification is a good candidate for a separate agent because it needs minimal context transfer; a minimal loop is sketched below.
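A minimal sketch of such a verification loop, using the max_attempts = 3 figure from above; run_agent and run_full_test_suite are hypothetical stand-ins for the implementer call and the actual test command:

```python
# Verification subagent loop with explicit success criteria and a bounded
# retry count (max_attempts = 3).

def run_agent(system_prompt: str, user_message: str) -> str:
    """Hypothetical LLM call for the implementer agent."""
    raise NotImplementedError

def run_full_test_suite() -> tuple[bool, str]:
    """Placeholder: run the entire test suite, return (passed, report)."""
    raise NotImplementedError

def implement_with_verification(task: str, max_attempts: int = 3) -> str:
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        result = run_agent(
            system_prompt="You are an implementer. Apply the requested change.",
            user_message=task + (f"\n\nVerifier feedback from last attempt:\n{feedback}"
                                 if feedback else ""),
        )
        # The verifier needs almost no context transfer: just the success
        # criteria and the required tools (here, the full test suite).
        passed, report = run_full_test_suite()
        if passed:
            return result
        feedback = report  # explicit failure details drive the next attempt
    raise RuntimeError(f"Verification failed after {max_attempts} attempts")
```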
Stealable ideas
Keep the main agent “clean” by injecting compact summaries instead of raw tool dumps.
Treat 15–20+ tools as a warning sign; try on-demand tool discovery before adding more agents (see the sketch at the end of this list).
Decompose by independent contexts (facets, components with clean interfaces), not sequential phases.
For verifiers, mandate full-suite checks and explicit pass/fail criteria so the agent cannot declare success early.
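A rough sketch of on-demand tool discovery as referenced above; the registry contents and the keyword matching are hand-rolled illustrations, not Anthropic's Tool Search Tool API:

```python
# On-demand tool discovery: instead of attaching 20+ tool definitions up
# front, the agent gets a single "find_tools" tool and pulls in only what it
# needs for the current task.

TOOL_REGISTRY = {
    "create_invoice": "Create a draft invoice for a customer.",
    "refund_payment": "Refund a completed payment by transaction id.",
    "lookup_customer": "Fetch a customer record by email.",
    # ... dozens more definitions live here, not in the agent's context
}

def find_tools(query: str, limit: int = 3) -> list[str]:
    """Return names of tools whose names or descriptions mention the query terms."""
    terms = query.lower().split()
    matches = [
        name for name, desc in TOOL_REGISTRY.items()
        if any(t in desc.lower() or t in name for t in terms)
    ]
    return matches[:limit]

# The agent's fixed toolset is just `find_tools` plus a step that loads the
# full schema for whichever tools it asks for.
print(find_tools("refund payment"))  # ['refund_payment']
```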