
CLAUDE.md cost with parallel agents: the surfaces that multiply, and the two that do not.

Matthew Diakonov, Written with AI

Published May 18, 2026 · 8 min read

The phrase "parallel agents" hides four different cost surfaces in Claude Code, and they bill your CLAUDE.md on three different rules. Most guides on this topic give you one answer for all four. This page is the table that splits them, with the line of Anthropic's docs each row traces back to, and the formula for turning ccmd's single-agent token number into your real parallel-session bill.

1. Four surfaces, three billing rules

When you ask Claude Code to "do this in parallel" you can land on any of four surfaces, each documented on its own page:

Subagents (built-in or custom) spawned via the Agent tool. Single session, multiple workers, results return to the main agent.
Agent teams (experimental, opt-in with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1). Multiple full sessions, each a separate Claude instance, with a shared task list and inter-teammate messaging.
Background agents (claude --agent or the agent view). Many independent sessions, monitored from one place.
Git worktrees. Manual parallel sessions, one terminal per worktree.

Three billing rules cover all four, in increasing severity:

No CLAUDE.md cost. Two built-in subagents only: Explore and Plan. They walk the repo or draft an approach in read-only plan mode, and they skip your CLAUDE.md by design.
One extra rehydration per call. Every other subagent, built-in or custom. The subagent starts a fresh context window, loads the full memory hierarchy once, runs its task, then returns a summary. The CLAUDE.md cost is tokens × 1, not tokens × turns, because the subagent's own turns are billed against the subagent context, not the parent.
Linear scaling, every turn. Agent teammates, background agents, and worktree sessions are all full Claude Code sessions. Each one loads CLAUDE.md at start and every turn of every session re-bills it (subject to prompt caching when prefixes match). This is where the official 7x figure comes from.
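The three rules above can be sketched as a lookup from surface to billing rule. This is an illustration only: the type names, the surface labels, and the claudeMdTokens helper are hypothetical and are not part of Claude Code or ccmd, and the sketch ignores prompt caching.

```typescript
// Hypothetical names; the mapping mirrors the three billing rules listed above.
type Surface =
  | "explore"
  | "plan"
  | "other-subagent" // any custom subagent, or a built-in other than Explore/Plan
  | "agent-teammate"
  | "background-agent"
  | "worktree-session";

type BillingRule = "none" | "once-per-call" | "every-turn";

const billingRule: Record<Surface, BillingRule> = {
  explore: "none",
  plan: "none",
  "other-subagent": "once-per-call",
  "agent-teammate": "every-turn",
  "background-agent": "every-turn",
  "worktree-session": "every-turn",
};

// CLAUDE.md input tokens attributable to one worker on the given surface.
// Assumes no prompt-cache discount, so this is an upper bound.
function claudeMdTokens(surface: Surface, fileTokens: number, turns: number): number {
  switch (billingRule[surface]) {
    case "none":
      return 0;
    case "once-per-call":
      return fileTokens;
    case "every-turn":
      return fileTokens * turns;
  }
}
```

For a 6,000-token file and a 30-turn worker, the helper returns 0 for Explore, 6,000 for a custom subagent call, and 180,000 for a teammate.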

2. Surface-by-surface, with the docs sentence for each row

What each parallel surface loads at start, what it costs per turn, and where Anthropic says so:

Surface	Loads CLAUDE.md?	Cost multiplier on the file
Explore (built-in subagent)	No	0x (skipped at startup)
Plan (built-in subagent)	No	0x (skipped at startup)
Custom subagent (any other built-in too)	Yes, full memory hierarchy	+1 rehydration per call
Subagent with --agent (whole-session mode)	Yes, full memory hierarchy	Same as a regular session: every turn
Agent team teammate	Yes, plus MCP servers and skills	Linear in team size; Anthropic: ~7x in plan mode
Background agent	Yes (a regular session)	1x per session, every turn
Git worktree session	Yes (a regular session)	1x per session, every turn

The two rows that read No are the entire content of the paragraph in the subagent docs under What loads at startup:

"Explore and Plan skip your CLAUDE.md files and the parent session's git status to keep research fast and inexpensive. Every other built-in and custom subagent loads both."
Claude Code docs, Create custom subagents

3. The 7x figure, in context

Anthropic publishes one explicit multiplier for parallel work, on the costs page, in the agent-team section. The full sentence:

~7x

“Agent teams use approximately 7x more tokens than standard sessions when teammates run in plan mode, because each teammate maintains its own context window and runs as a separate Claude instance.”

Claude Code docs, Manage costs effectively

Three things are worth reading carefully in that sentence:

"each teammate maintains its own context window". That includes a fresh load of CLAUDE.md. The agent-teams docs confirm it in plain words: "When spawned, a teammate loads the same project context as a regular session: CLAUDE.md, MCP servers, and skills."
"plan mode". Plan mode is verbose; teammates explore and propose before implementing. The 7x figure is the upper-end common case, not a guarantee. A 2-teammate team that implements directly costs less; a 5-teammate team that spends most of its time in plan mode can cost more.
"runs as a separate Claude instance". Prompt caching only hits when the prefix is byte-identical to a previous request on that instance. Teammate caches do not share with each other or with the lead. A 6,000-token CLAUDE.md that prefix-caches well on the lead does not save anything for teammate context windows on turn one.

The figure, big

~7x tokens for an agent team versus a standard session, in plan mode

Anthropic's own number, in the "Manage agent team costs" subsection of the costs page. The implication: whatever you saved by trimming CLAUDE.md in a single-agent session is worth roughly seven times more in a team session, because every teammate is paying the same per-turn floor on the same file.

4. The two exceptions, and why most guides miss them

Explore and Plan are the two built-in subagents Anthropic ships with Claude Code for parallel investigation. Explore reads code; Plan drafts an approach in read-only plan mode. Both are the workhorses of a "let me think before I commit" workflow, and both are deliberately exempt from the CLAUDE.md tax.

The doc sentence that grants the exception sits halfway down the subagent page, under What loads at startup, in the middle of a bullet list. It is one of those product decisions that is buried on purpose: surfaced if you read carefully, ignored if you skim. Most third-party "parallel Claude Code" posts skim. The result is a common assumption, repeated across guides, that every parallel call multiplies your CLAUDE.md cost. It doesn't. Two named built-ins are the exception, and they happen to be the two you reach for most often in research-heavy work.

The actionable consequence: if your team's habit is to fan out three or four custom subagents for code review or exploration tasks, rewrite those flows to use Explore and Plan instead, when the work is read-only or planning. The token saving per call is the full size of your CLAUDE.md plus your MCP server descriptions plus the parent git status. On a typical 6,000-token CLAUDE.md with two MCP servers, that is several thousand tokens skipped per call, across however many calls you make in a session.

5. The math, applied to ccmd's analyzer

ccmd's analyzer scores one file and assumes one agent. Here is the cost block, verbatim, from src/lib/analyzer.ts:

analyzer.ts

The variable name says what the comment says: estimatedTokensFireEveryTurn is the file's full token count, billed every turn, for one agent. There is no fan-out variable in the formula. For a single session that is exactly right. For a parallel session you have to add one multiplier yourself.

The formula, expanded:

real_cost = rate × (tokens × turns × N_full_sessions + tokens × N_subagent_calls)

Where rate is the dollar price per input token, N_full_sessions counts the parent plus every teammate plus every background agent plus every --agent session running in parallel, and N_subagent_calls counts every custom or built-in subagent invocation except Explore and Plan.

Note the asymmetry. A teammate is billed per turn, because it is a full session that loads CLAUDE.md on every request the way the parent does. A subagent call is billed once, because the subagent does its work in its own context window and returns a summary; the parent does not pay for the subagent's turns against the parent's rate. Two different billing rules, two different multipliers.
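As a minimal sketch, the formula and its two multipliers can be written as one function. The interface and function names here are hypothetical and do not come from ccmd's analyzer. The sketch applies the rate to both terms so the result is in dollars, and it ignores prompt caching, so it overstates the bill wherever prefixes cache.

```typescript
// Hypothetical shape; none of these names exist in ccmd's analyzer.
interface ParallelSession {
  fileTokens: number;         // CLAUDE.md token count (the single-agent per-turn number)
  turnsPerSession: number;    // assumed equal for every full session
  fullSessions: number;       // parent + teammates + background agents + --agent sessions
  subagentCalls: number;      // custom or built-in subagent calls, excluding Explore and Plan
  usdPerMillionInput: number; // input price per million tokens
}

function claudeMdCostUsd(s: ParallelSession): number {
  // Full sessions re-bill the file every turn.
  const perTurnTokens = s.fileTokens * s.turnsPerSession * s.fullSessions;
  // Subagent calls load the file once each.
  const oneOffTokens = s.fileTokens * s.subagentCalls;
  return ((perTurnTokens + oneOffTokens) * s.usdPerMillionInput) / 1_000_000;
}

// Example: a parent session alone, with no fan-out.
const single = claudeMdCostUsd({
  fileTokens: 6_000,
  turnsPerSession: 30,
  fullSessions: 1,
  subagentCalls: 0,
  usdPerMillionInput: 15,
});
console.log(single.toFixed(2)); // "2.70"
```

Setting fullSessions to 1 and subagentCalls to 0 reduces the function to the single-agent case, which is why the fan-out counts are the only inputs you add.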

6. A worked session, three parallel patterns

Same starting file, three different parallel patterns, side by side. A 6,000-token CLAUDE.md, a 30-turn long session, the analyzer's $15 per million input constant:

ccmd · parallel-agent cost ledger

Read the three blocks in order. Single agent is the baseline ccmd prints. Three custom subagents add one extra rehydration of CLAUDE.md each, for a small one-time cost. Three teammates add 90 more turns of 6,000-token re-injection, four times the input token count of the baseline, with a real bill roughly seven times what the analyzer showed for the parent alone. The last two lines are the Explore/Plan escape hatch: same "three parallel workers" story, zero CLAUDE.md added.
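The CLAUDE.md share of those patterns can be recomputed from the stated inputs. This is a sketch under two assumptions: every teammate also runs 30 turns, and prompt caching is ignored. It is not the original ledger, and the constant and variable names are hypothetical.

```typescript
// Inputs taken from the section above: 6,000-token file, 30 turns, $15 per million input tokens.
const FILE_TOKENS = 6_000;
const TURNS = 30;
const USD_PER_MILLION = 15;

const usd = (tokens: number): number => (tokens * USD_PER_MILLION) / 1_000_000;

// CLAUDE.md tokens billed to the parent session alone.
const baselineTokens = FILE_TOKENS * TURNS;

const ledger = [
  { pattern: "single agent", tokens: baselineTokens },
  // Each custom subagent call loads the file once.
  { pattern: "parent + 3 custom subagents", tokens: baselineTokens + 3 * FILE_TOKENS },
  // Each teammate is a full session (assumed 30 turns each).
  { pattern: "parent + 3 teammates", tokens: baselineTokens * 4 },
  // Explore and Plan skip CLAUDE.md, so nothing is added.
  { pattern: "parent + 3 Explore/Plan calls", tokens: baselineTokens },
];

for (const row of ledger) {
  console.log(`${row.pattern}: ${row.tokens} tokens, $${usd(row.tokens).toFixed(2)}`);
}
```

These figures cover only the CLAUDE.md share of input tokens. Each worker's own system prompt, tool calls, and output are billed on top, which is one reason a full team bill can land above the 4x that the file alone accounts for.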

7. When the multiplier is worth paying

The point of this page is not to argue against parallel work. The agent-teams docs are direct about where teams earn their cost: research, parallel review, debugging with competing hypotheses, and new features split into independent modules. Anthropic recommends 3-5 teammates for most flows, Sonnet for teammates, and small self-contained tasks. That advice is real. The 7x figure is the cost of that win, not a warning against taking it.

What this page argues is narrower: the multiplier is your CLAUDE.md times your fan-out, and shrinking the multiplicand before fanning out is the highest-leverage optimization you can run. A 4,500-token cut on the file saves 4,500 tokens times every turn of every context window in a parallel session. Across a 3-teammate team that is roughly half a million input tokens per long session, every session, forever. Run the analyzer on ccmd.dev before you spawn the next agent team and you cut the bill at the only point in the stack where one edit affects every parallel surface.

8. Why the existing tools do not catch this

The two surfaces a Claude Code engineer reaches for to look at cost today, and what they each leave on the floor when parallel work is involved:

Feature	Custom subagent or teammate	Explore / Plan built-in
Loads CLAUDE.md hierarchy at start	Yes (full hierarchy)	No (built to skip)
How many copies of CLAUDE.md per fan-out of N	N extra, one per teammate or subagent	0 extra, per Explore/Plan call
Long-run session token cost scales with N	Yes, linearly (Anthropic: ~7x in plan mode)	No
Catches the same per-turn floor a single session pays	Yes, every turn of every teammate	No (it walks the repo, then stops)
Anthropic's stated rationale	(no rationale; default behavior)	"to keep research fast and inexpensive"

Both spawn a full Claude instance; the difference is which startup files load. Explore and Plan are the only two built-ins where CLAUDE.md is explicitly skipped.

Retrospective token-cost CLIs like ccusage tell you a session was expensive. They cannot tell you the expensive line was CLAUDE.md re-injected three times because you spawned three teammates instead of two Explore subagents. ccmd reports the per-turn floor; this page gives you the multiplier; the two together turn a vague "why was that session $9" into a line you can edit.


Frequently asked questions

Does CLAUDE.md cost multiply when I run parallel agents in Claude Code?

Yes, almost always. Each teammate in an agent team starts a fresh context window and loads your full CLAUDE.md hierarchy, MCP servers, and skills, so a team of N pays for CLAUDE.md N times every turn it runs. Custom subagents also load it, but once per call rather than every turn: the official subagent docs spell out that every built-in and custom subagent other than Explore and Plan loads CLAUDE.md and project memory at startup. The one exception is Anthropic's two built-in research subagents, Explore and Plan, which the docs say explicitly skip CLAUDE.md to keep research fast and inexpensive. So multiplication is the default; the no-multiplier path is a narrow, named exception you have to use deliberately.

How much extra do agent teams actually cost?

Anthropic's own number, copied verbatim from the costs page: "Agent teams use approximately 7x more tokens than standard sessions when teammates run in plan mode, because each teammate maintains its own context window and runs as a separate Claude instance." The 7x figure assumes 3-5 teammates working in plan mode; smaller teams cost less, larger or longer-running ones cost more. The docs also say token usage is "roughly proportional to team size" and recommend 3-5 teammates as the working ceiling. Your CLAUDE.md is one of the things that rehydrates per teammate, so cutting CLAUDE.md tokens before fanning out is the single most leveraged optimization for parallel work.

Why don't Explore and Plan load CLAUDE.md?

Because Anthropic explicitly engineered them not to. The subagent docs, under "What loads at startup," say: "Explore and Plan skip your CLAUDE.md files and the parent session's git status to keep research fast and inexpensive. Every other built-in and custom subagent loads both." These are read-only research workers (Explore reads code; Plan drafts an approach in read-only plan mode), and Anthropic decided your CLAUDE.md rules about implementation style add token cost without adding research value. So the two built-ins that you call most often for parallel investigation are the two that do not multiply your CLAUDE.md bill. Use them.

Does the ccmd analyzer model parallel-agent cost?

Not yet. The analyzer scores one file at a time and assumes a single agent runs it. Line 264 of src/lib/analyzer.ts sets estimatedTokensFireEveryTurn = totalTokens, the literal token count of the file you pasted. Lines 268-269 then dollarize that against a 30-turn long session at the analyzer's $15 per million constant. That number is the single-agent baseline. To get your real parallel-agent cost, multiply the analyzer's per-session number by the count of full sessions that load CLAUDE.md (parent + teammates + background or worktree sessions), add one single load of the file for each custom subagent call, and skip the Explore and Plan calls because they do not add the file's cost.

Does an Explore or Plan call cost nothing then?

It costs less, not nothing. Explore and Plan still spawn a full Claude instance with its own context window, its own system prompt, and its own tool calls. The thing they save is your CLAUDE.md re-injection, the MCP server descriptions, and the parent session's git status. On a 6,000-token CLAUDE.md with a couple of MCP servers, that is several thousand tokens skipped per call, which is the whole point. The work the subagent then does (reading files, drafting a plan) is still billed at standard rates. The headline is that one specific tax, the per-call CLAUDE.md rehydration, is waived.

What about git worktrees and background agents?

Both are full sessions, not subagents. The agent-teams docs link to git worktrees as a manual way to run multiple Claude Code sessions yourself, and the sub-agents docs link to background agents (claude --agent or the agent view) for many independent sessions monitored from one place. In both cases each session is a regular Claude Code session, which means each one loads the full CLAUDE.md hierarchy on its own. The cost behavior is the same as running Claude Code N times in N terminals: N copies of CLAUDE.md per turn, no Explore/Plan exception, no shared cache across sessions.

If I cut my CLAUDE.md from 6,000 to 1,500 tokens, what happens to a 3-teammate session?

The savings multiply by your fan-out. A 4,500-token cut on the file itself saves 4,500 tokens per turn for the parent. Across 4 contexts (parent + 3 teammates) each running ~30 turns, the absolute saving is roughly 4,500 × 4 × 30 = 540,000 input tokens per long-run session. At the analyzer's $15 per million constant that is about $8.10 saved per parallel session, compared with about $2.03 for the parent alone. This is why ccmd flags the same per-turn cost the way it does: the per-turn number is the unit, and your fan-out is the multiplier on top of it.
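The same arithmetic as a small helper, for plugging in your own file sizes and team shapes. The names are hypothetical. The sketch assumes every context runs the same number of turns and ignores prompt caching.

```typescript
// Hypothetical helper; not part of ccmd.
interface TrimInput {
  tokensBefore: number;       // CLAUDE.md size before the cut
  tokensAfter: number;        // CLAUDE.md size after the cut
  fullSessions: number;       // parent + teammates
  turnsPerSession: number;    // assumed equal across sessions
  usdPerMillionInput: number; // input price per million tokens
}

function trimSavings(t: TrimInput): { tokens: number; usd: number } {
  // The cut is saved on every turn of every full session.
  const tokens = (t.tokensBefore - t.tokensAfter) * t.fullSessions * t.turnsPerSession;
  return { tokens, usd: (tokens * t.usdPerMillionInput) / 1_000_000 };
}

const saved = trimSavings({
  tokensBefore: 6_000,
  tokensAfter: 1_500,
  fullSessions: 4,
  turnsPerSession: 30,
  usdPerMillionInput: 15,
});
console.log(`${saved.tokens} tokens, $${saved.usd.toFixed(2)}`); // 540000 tokens, $8.10
```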

What is the cheapest pattern for parallel work in Claude Code today?

Three habits stack. First, use Explore and Plan as the default for any "go look at this" or "draft an approach" fan-out, because Anthropic explicitly waives CLAUDE.md cost on those. Second, keep CLAUDE.md under 200 lines, the limit Anthropic suggests on the costs page, because every other parallel surface multiplies whatever is in there. Third, when you do spawn agent teams or custom subagents, prefer Sonnet for teammates (Anthropic's own recommendation on the same page) and keep teams to 3-5 members, which is the working range the agent-teams docs recommend. Together those three move the bill more than any single one of them in isolation.