Claude Code Agents & Subagents: What They Actually Unlock

Kyle Redelinghuys

Kyle Redelinghuys · 2026-03-16

I set up Claude Code agent files when the feature first landed. Created a few .claude/agents/ definitions, gave them names and tool restrictions, felt good about it, and then gradually stopped thinking about them. At some point Claude just started handling things well enough on its own that the agent files sat there gathering dust. They're still in my repos. They just don't get called.

I kept telling myself I wasn't missing anything, but context quality on my medium-to-large solo projects had started to feel off, with responses getting vaguer as sessions grew and the model losing track of decisions made earlier in a conversation. I was working around it using Cont3xt.dev, a tool I built specifically to manage AI context, and that helped, but it felt like I was solving a symptom rather than understanding the actual problem. So I went back and dug properly into what agents and subagents actually do, and more importantly what they unlock that a single-agent session can't.

The context window is the whole story

Standard Claude Code gives you a 200K-token context window per session. That sounds enormous until you're in a multi-hour session on a project with a dozen files open, a long conversation history, and tool call outputs stacking up. By the time you hit two-thirds capacity, response quality degrades noticeably, not because the model is worse, but because the context is full of noise and the model has to attend to all of it equally. I'd been experiencing this without quite naming it.

Subagents solve this by giving each delegated task its own isolated 200K-token context. The parent agent spawns a subagent with a specific prompt; the subagent does its work (reading files, running searches, making tool calls) and returns only its final output to the parent, whether that's a summary, a result, or a recommendation. All the intermediate noise stays inside the subagent's context and never touches the parent's conversation. The parent gets the signal, not the noise.
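To make the isolation concrete, here is a toy model of it in Python. It is not how Claude Code is implemented; it only counts tokens. The `Context` class, the `run_subagent` helper, and every token figure are hypothetical, chosen to show where the exploration cost lands.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """A toy context window that only tracks how many tokens it holds."""
    name: str
    limit: int = 200_000
    used: int = 0
    log: list = field(default_factory=list)

    def add(self, label: str, tokens: int) -> None:
        self.used += tokens
        self.log.append((label, tokens))


def run_subagent(parent: Context, prompt_tokens: int, work: list, summary_tokens: int) -> Context:
    """Run a delegated task in a fresh context; only the summary reaches the parent."""
    parent.add("delegation prompt", prompt_tokens)  # the parent writes the prompt
    sub = Context(name="subagent")
    sub.add("prompt", prompt_tokens)
    for label, tokens in work:
        sub.add(label, tokens)  # intermediate noise stays in the subagent
    parent.add("subagent summary", summary_tokens)  # the only thing that comes back
    return sub


# Hypothetical token counts for one exploration task.
work = [("read 12 files", 48_000), ("grep results", 9_000), ("tool output", 6_000)]

# Option A: do the exploration directly in the main session.
inline = Context(name="single session")
for label, tokens in work:
    inline.add(label, tokens)

# Option B: delegate the same exploration to a subagent.
parent = Context(name="parent")
sub = run_subagent(parent, prompt_tokens=500, work=work, summary_tokens=800)

print(f"single session holds {inline.used:,} tokens")
print(f"parent holds {parent.used:,} tokens, subagent held {sub.used:,}")
```

With these made-up numbers the main session carries 63,000 tokens of exploration if it does the work itself, and 1,300 if it delegates. The total spent is slightly higher in the delegated case, which is the trade the rest of this post is about.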

This is the actual value. Not parallelism, not specialisation, not the organisational tidiness of named agents. It's that isolation prevents the context rot that compounds over long sessions.

What the architecture looks like

The orchestrator-worker pattern is fairly simple. A parent agent analyses a task, decides whether to handle it directly or delegate it, and uses the Agent tool (previously called the Task tool, and both names still work) to spin up a subagent with a prompt string. The subagent runs with its own context, tool access, and permissions, then returns a single final message. Subagents cannot spawn further subagents, which keeps the nesting manageable.

Claude ships with three built-in subagent types. The Explore type handles read-only file discovery and codebase search, running on Haiku by default for speed and cost. The Plan type gathers context before presenting a strategy in plan mode. The General-purpose type handles anything involving both exploration and modification. Claude routes to these automatically based on task characteristics, though the auto-selection is imperfect in practice (more on that below).

Custom agents are defined as Markdown files with YAML frontmatter, stored in .claude/agents/ at project scope or ~/.claude/agents/ at user scope. A basic definition looks like this:

---
name: code-reviewer
description: Expert code review specialist. Use immediately after modifying code.
tools: Read, Grep, Glob
model: sonnet
permissionMode: default
---
You are a senior code reviewer checking for bugs, security issues, and code quality.
Review any code changes and return a concise list of specific findings.

The tools field does something genuinely useful here: it physically restricts what the subagent can do. A reviewer defined with only Read, Grep, Glob cannot write files. That's not a naming convention or a prompt instruction, it's a hard constraint. For a solo developer running with broad permissions, having a review agent that structurally cannot modify code is worth something.
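Because the restriction lives in a plain-text file, it is easy to lint. The sketch below is a small check I might run over agent files to confirm a reviewer really is read-only. It uses a deliberately naive frontmatter parser rather than a YAML library, and two things in it are my assumptions: the `READ_ONLY_TOOLS` allowlist, and treating a missing `tools` field as unrestricted (on the understanding that an agent without one inherits the parent's tools).

```python
# Assumption: only these tools are treated as read-only for this check.
READ_ONLY_TOOLS = {"Read", "Grep", "Glob"}


def parse_frontmatter(text: str) -> dict:
    """Parse simple `key: value` lines between two `---` markers."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("missing closing --- line")


def is_read_only(agent_text: str) -> bool:
    """True only if the agent lists tools and all of them are on the allowlist."""
    fields = parse_frontmatter(agent_text)
    if "tools" not in fields:
        return False  # assumption: no tools field means nothing is restricted
    tools = {t.strip() for t in fields["tools"].split(",") if t.strip()}
    return bool(tools) and tools <= READ_ONLY_TOOLS


reviewer = """---
name: code-reviewer
description: Expert code review specialist. Use immediately after modifying code.
tools: Read, Grep, Glob
model: sonnet
permissionMode: default
---
You are a senior code reviewer checking for bugs, security issues, and code quality.
"""

print(is_read_only(reviewer))
```

A check like this could run in CI or a pre-commit hook so that nobody quietly adds a write tool to an agent that is supposed to be review-only.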

The model field lets you route different tasks to different models. Haiku for cheap exploratory reads, Sonnet for standard implementation, Opus for complex reasoning. On pay-as-you-go pricing the cost difference between models is substantial, and there's no reason a "find all files that import this package" task needs Opus.

The real cost picture

Subagents are not free. Each spawned agent opens its own context window, which means tokens multiply quickly. Anthropic's own documentation notes that multi-agent workflows use roughly 4-7x more tokens than single-agent sessions, and Agent Teams (the experimental multi-session variant announced in February 2026) run at roughly 15x standard usage. If you're on the API and paying per token, that multiplier matters.

I've written in detail about the Claude Code pricing options and the Max plan economics, and the core finding holds: over 90% of tokens in a typical heavy session are prompt cache reads at $0.50/MTok for Opus, which dramatically softens the apparent cost of subagent expansion. But the multiplier is still real, and running five parallel subagents burning through exploratory reads simultaneously on the Pro plan is a reliable way to hit rate limits in under twenty minutes.
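A rough back-of-envelope version of that arithmetic, as a sketch. The $0.50/MTok cache-read rate and the 90% share come from the paragraph above; the `OTHER_PER_MTOK` blended rate for everything else, and the 10M-token session size, are placeholders I made up so the sum has something to work with. It also assumes the cache-read share stays the same when subagents are involved, which fresh subagent contexts may not honour.

```python
CACHE_READ_PER_MTOK = 0.50  # Opus prompt cache reads, as quoted above
OTHER_PER_MTOK = 5.00       # hypothetical blended rate for all other tokens


def session_cost(total_tokens: int, cache_read_share: float = 0.90, multiplier: float = 1.0) -> float:
    """Estimated dollar cost of a session, scaled by a multi-agent token multiplier."""
    tokens = total_tokens * multiplier
    cached = tokens * cache_read_share
    other = tokens - cached
    return (cached * CACHE_READ_PER_MTOK + other * OTHER_PER_MTOK) / 1_000_000


baseline_tokens = 10_000_000  # hypothetical heavy single-agent session

for label, multiplier in [("single agent", 1), ("subagents, low end", 4), ("subagents, high end", 7)]:
    cost = session_cost(baseline_tokens, multiplier=multiplier)
    print(f"{label:>20}: ${cost:.2f}")
```

The exact dollar figures are only as good as the placeholder rate. What the sketch shows is the shape: cache reads keep the per-token average low, but the cost still scales linearly with the multiplier.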

The sensible approach is narrow scoping: use subagents for read-heavy, bounded tasks with a clear output, and keep the main session for anything requiring sustained, cross-cutting context. Don't spawn agents because you can.

Where they actually help

The most consistent community finding, which matches the logic of the architecture, is that subagents work best for read-heavy research and exploration, not parallel coding. A subagent sent to find all places a particular function is called, summarise a subsystem's behaviour, or check whether a proposed change would break any existing contracts will produce a small, clean output and keep its exploration cost internal. That's the use case the architecture is optimised for.

The C compiler example Anthropic uses as a flagship demonstration is instructive: 16 Opus agents, 2,000 sessions over two weeks, $20,000 in API costs, building a C compiler in Rust from scratch. Impressive, and structurally possible only with subagents. Also completely inappropriate as a model for a solo developer on a SaaS product. The lesson from that project that does transfer is the decomposition principle: each agent worked on an independent failing test, with no cross-agent dependencies. When tasks are truly independent, parallel agents compound your speed. When they're coupled, you get coordination overhead and conflicting changes.

For medium-to-large solo projects, the pattern I find most defensible is 2-3 focused information-gathering agents running in parallel, with the main session synthesising their outputs and making decisions. An agent that reads and summarises all test failures. An agent that checks the database schema for a relevant table. An agent that scans for existing implementations of a pattern you're about to add. All of them returning concise outputs to a main session that then acts. That's meaningfully better than a single session doing all of those reads sequentially, because the context stays clean.
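The shape of that pattern is a plain fan-out and gather. The sketch below shows it with Python's `concurrent.futures`; the three gatherer functions are stubs returning canned strings, standing in for subagent calls. Nothing here talks to Claude, and the function names, file paths, and outputs are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Stubs standing in for read-only subagents. Each would do a lot of reading
# in its own context and hand back only a short summary.
def summarise_test_failures() -> str:
    return "2 failing tests, both in billing/invoice_test.go"


def check_schema() -> str:
    return "invoices table has no currency column"


def find_existing_pattern() -> str:
    return "retry helper already exists in internal/httpx"


def gather(tasks: dict) -> dict:
    """Run independent gatherers in parallel and collect their summaries by name."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


results = gather({
    "tests": summarise_test_failures,
    "schema": check_schema,
    "patterns": find_existing_pattern,
})

# The main session sees only these short summaries and decides what to do.
briefing = "\n".join(f"- {name}: {summary}" for name, summary in results.items())
print(briefing)
```

The gatherers share no state and do not depend on each other's results, which is the same independence condition that made the compiler project parallelise well. The deciding and editing stay in one place.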

What doesn't work well yet

Auto-selection of custom agents remains unreliable. Claude frequently handles tasks in the main session rather than delegating to a defined agent, even when the agent is explicitly relevant and its description matches the task. The only reliable trigger is explicit invocation, which defeats the purpose of automatic routing for anyone who wants a seamless workflow. There are open GitHub issues on this, and it's a known gap, not an edge case.

Claude Opus 4.6 has a known tendency to over-spawn subagents. Anthropic's own prompt engineering documentation flags it: Opus will delegate to agents in situations where a direct approach would be faster and cheaper. If you're on Opus and wondering why a simple task consumed 50K tokens, an unnecessary delegation is a likely cause.

And there's no native observability. No trace view, no per-agent cost breakdown, no way to see what a running subagent is doing without looking at raw outputs. If you're building on the Claude Agent SDK, third-party tools fill some of this gap, but in the terminal Claude Code workflow you're largely flying blind on costs and subagent activity.

The hooks connection

Worth noting for anyone already using Claude Code hooks: hooks interact with subagents through dedicated lifecycle events, SubagentStart and SubagentStop, which means you can instrument your subagent activity, apply tool-level restrictions via PreToolUse, and validate outputs before they reach the parent. If you're already invested in hooks, the subagent lifecycle events add a meaningful layer of control over what agents can actually do.
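As a sketch of the PreToolUse side of this, here is the decision logic for a hook that refuses write tools. It rests on my understanding of the hook contract: the event arrives as JSON on stdin with `hook_event_name` and `tool_name` fields, and exit code 2 blocks the call with stderr passed back to the model. Check those details against the current hooks documentation before relying on them. The `WRITE_TOOLS` list is my own assumption.

```python
import json
import sys

# Assumption: the tool names I want to treat as write-capable.
WRITE_TOOLS = {"Write", "Edit", "MultiEdit", "NotebookEdit"}


def decide(event: dict) -> tuple:
    """Return (exit_code, message) for a hook event; 2 is assumed to mean 'block'."""
    if event.get("hook_event_name") != "PreToolUse":
        return 0, ""  # ignore events this hook is not meant to police
    tool = event.get("tool_name", "")
    if tool in WRITE_TOOLS:
        return 2, f"{tool} is blocked by the read-only policy"
    return 0, ""


def main() -> int:
    """Entry point for use as a hook script: read the event JSON from stdin."""
    event = json.load(sys.stdin)
    code, message = decide(event)
    if message:
        print(message, file=sys.stderr)
    return code


# Demonstration with hand-written events rather than real hook input.
print(decide({"hook_event_name": "PreToolUse", "tool_name": "Edit"}))
print(decide({"hook_event_name": "PreToolUse", "tool_name": "Grep"}))
```

To use it as an actual hook script, the file would end with `sys.exit(main())` instead of the demonstration prints. Keeping `decide` as a pure function means the policy can be unit-tested without spawning Claude Code at all.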

Whether it's worth revisiting

I went into this research expecting to find that the feature had matured and I was missing something obvious. The answer is more nuanced. The core innovation, isolated context windows that keep exploration noise out of the main session, is genuinely valuable and solves a real problem I was experiencing on larger projects. The custom agent definitions give you a readable, version-controlled way to encode tool restrictions and model routing decisions that actually enforce behaviour rather than just prompting for it.

What hasn't fully landed is reliable automatic routing, which means you're often writing explicit invocations rather than building a system that knows when to delegate. And the cost multiplier is real enough that undisciplined subagent use will hurt you on any plan with token limits.

The old agent files in my repos are worth revisiting. Not as a multi-agent system with coordinating roles and specialised responsibilities, but as a small library of focused, read-only information gatherers that I explicitly invoke when I need clean, isolated context for a bounded research task. That's a narrower use case than the documentation implies, but it's one that actually maps to the architecture's strengths.
