/t · guide · unit economics

Per turn token cost: the six line items every Claude Code request pays for

Matthew Diakonov, Written with AI

Published May 18, 20266 min read

Most "why is Claude Code expensive" posts stop at one line: CLAUDE.md fires every turn. That is true, and it is one of six things that fire every turn. The bill on any given API call is six concatenated billables, all priced at the input rate, plus the output. If you want to lower the per-turn number, you need to know which of the six you can actually touch.

This page walks the math line by line, with the analyzer formula from src/lib/analyzer.ts as the spine and the current Anthropic pricing as the rate sheet.

1. The six line items in one turn

A Claude Code turn is one POST to /v1/messages. The Messages API is stateless, so the harness rebuilds the entire context on every call. Six things land in that payload, all charged at the input rate, before the model writes a single output token:

Anthropic's system prompt. About 2,800 tokens. Hidden, not editable, charged at input rate. Includes safety, tool-use scaffolding, and the model's built-in identity.
Your CLAUDE.md. Project CLAUDE.md + every nested CLAUDE.md the harness walked up to + the user-global ~/.claude/CLAUDE.md. All concatenated, all firing every turn for the session.
Tool definitions. Schemas and descriptions for Bash, Edit, Read, Glob, Grep, Write, plus any MCP servers wired in. Bash alone adds 245 tokens; a verbose MCP server can add 1,500. This is usually the easiest line item to cut.
Tool-use system prompt. Fixed cost when tool_choice is auto or none: 346 tokens on every Claude 4.x model. 313 tokens when tool_choice is any or a specific tool.
Conversation history. Every previous user message, assistant message, tool call, and tool result from this session. Grows linearly with turn count until a /compact or a new session.
Your new user message + attached files. The text you typed plus any files, screenshots, or pasted content the harness inlines. Counts toward input tokens.

ccmd · one Opus 4.7 turn

Numbers above are for a mid-session turn after some history has accumulated. The first turn of a session pays for items 1-4 plus a tiny history; turn 30 pays for items 1-4 plus 30 turns of accumulated history. The growth is in line item 5.

2. The analyzer's formula, from the source

Our analyzer at ccmd.dev gives you the CLAUDE.md slice of one turn (line item 2). It does not know about lines 1, 3, 4, 5, or 6 because those depend on your harness configuration and your session state. The CLAUDE.md slice is the part you can score from a static file, and it is the part that fires unconditionally on every turn:

src/lib/analyzer.ts

The hardcoded OPUS_IN_PER_M = 15 was Opus 4.1's input rate. Anthropic dropped Opus 4.5, 4.6, and 4.7 to $5/M, so the analyzer overstates input cost by 3x on the current default Claude Code model. Output rate also dropped: $75/M on 4.1 down to $25/M on 4.7. The ratio is preserved; the absolute is not. Divide the analyzer's reported cost by 3 if you're on Opus 4.7.

Working example

A 6,042-token CLAUDE.md, 30-turn Opus 4.7 session, no cache:

6,042 × 30 × $5 / 1,000,000 = $0.91 of CLAUDE.md input across the session.
Per-turn CLAUDE.md slice: $0.030.
With a hot prompt cache (rate × 0.1): $0.003 per turn. 10x cheaper.
The analyzer reports $2.72 for the same file (the old $15/M rate). Divide by 3 for the real Opus 4.7 number.

3. Rates by model (current vs. deprecated)

Same shape of turn, four different bills depending on the model. Numbers are from the Anthropic pricing docs, verified 2026-05-18. The deprecated column is what you still pay if your harness is pinned to Opus 4.1 or earlier.

Feature	Deprecated rates	Current models
Base input ($/M tokens)	Opus 4.7: $5 \| Sonnet 4.6: $3 \| Haiku 4.5: $1	Opus 4.1 (deprecated): $15 \| Opus 4 (deprecated): $15
Output ($/M tokens)	Opus 4.7: $25 \| Sonnet 4.6: $15 \| Haiku 4.5: $5	Opus 4.1: $75 \| Opus 4: $75
5-minute cache write ($/M)	Opus 4.7: $6.25 \| Sonnet 4.6: $3.75 \| Haiku 4.5: $1.25	1.25x base input price (same for all current models)
Cache read / hit ($/M)	Opus 4.7: $0.50 \| Sonnet 4.6: $0.30 \| Haiku 4.5: $0.10	0.10x base input price (a 10x discount when the cache hits)
Per-turn cost on a 24,568-token input + 1,200-token output	Opus 4.7: $0.153 \| Sonnet 4.6: $0.092 \| Haiku 4.5: $0.031	Opus 4.1: $0.459 (3x more expensive than the same job on 4.7)
Per-turn cost with 5-minute cache hit on most of the input	Opus 4.7: $0.098 \| Sonnet 4.6: $0.058 \| Haiku 4.5: $0.020	Cache write costs 1.25x the first turn it's set; pays off on turn 2.

The 3x gap between Opus 4.7 and Opus 4.1 is the single biggest per-turn lever most teams have not pulled. If your CLI is still pointing at claude-opus-4-1, every turn pays the old rate, your bill is 3x higher than it should be, and the tokenizer change in 4.7 (which can use up to 35% more tokens for the same text) does not even kick in to offset it. The default model in current Claude Code releases is 4.7 unless you override.

every turn

“every single API call to Claude sends the whole context, including prompts, meaning that all this extra text in CLAUDE.md is sent over and over”

caymanjim, Hacker News thread 47581701

4. Where prompt cache changes the per-turn bill

Anthropic's prompt cache reads cost 10% of the base input rate. On Opus 4.7 that is $0.50/M instead of $5/M. The catch: the cached prefix has to be byte-identical between requests, within a 5-minute (or 1-hour) window. Line items 1, 2, 3, and 4 are usually stable enough to cache; line items 5 and 6 are fresh on every turn.

The single biggest cache killer is a volatile string in the first 20 lines of CLAUDE.md. The analyzer flags this as a cache_bust finding:

src/lib/analyzer.ts

One ISO date, one today is, one this session in the first 20 lines, and the entire prefix re-pays full input cost on every turn for the rest of the session. The bytes flagged are 30 or 40; the cost is the whole file.

5. Three levers to cut the per-turn number

Of the six line items, four are touchable. The other two (Anthropic's system prompt and Anthropic's rate sheet) are fixed. In order of impact per minute of work:

Confirm you're on Opus 4.7, not 4.1. Check your harness config or run claude --version. If the default model id is claude-opus-4-1 or earlier, switching to claude-opus-4-7 cuts input cost by 3x and output by 3x. This is the single highest-leverage change on this page.
Disable MCP servers you don't use. Each wired MCP server adds 200 to 1,500 tokens of tool definitions to line item 3. A typical user-global MCP config has 5-10 servers loaded, of which 1-2 actually get invoked in any given session. Comment out the rest in ~/.claude.json or your project config. Saves more per turn than rewriting CLAUDE.md.
Score and trim CLAUDE.md. Paste your file into ccmd.dev. Fix any cache_bust findings first (one line, 10x bill recovery). Then attack bloat, duplicate, and vague findings in order of tokenSavings. For each cut, the analyzer recommends an installable skill from skillhu.bz that replaces the lines without firing every turn.
Run /compact before line item 5 gets out of hand. Conversation history is the only line item that grows linearly without an upper bound. A 100-turn session can have 80,000 tokens of history alone. The /compact command rewrites history into a shorter summary; the next turn pays for the summary instead of the full transcript. Sometimes saves more per turn than anything else if you forgot to compact for an hour.

6. Related reads

Do CLAUDE.md rules fire on every turn? — proof from the analyzer source plus the four firing surfaces (CLAUDE.md, skills, hooks, MCP) and which kinds of rules belong where.
CLAUDE.md token waste per turn — the seven categories of bytes that fire every turn and produce zero behavioral signal, ranked by dollar impact.
CLAUDE.md token cost audit — what a per-line audit actually looks like on a real 6,000-token file.
Layered CLAUDE.md token cost — what happens when the harness walks up and concatenates three CLAUDE.md files before turn one.

Want us to score your CLAUDE.md and tell you what's wasting tokens?

15 minutes, free, we'll walk through your file line by line and rank the per-turn waste.

Frequently asked questions

What does one turn in Claude Code actually cost?

At the current Opus 4.7 input rate of $5 per million tokens, a typical Claude Code turn with a 6,000-token CLAUDE.md, three or four tool definitions, and 12,000 tokens of conversation history runs about $0.15 with no cache, or about $0.10 with a 5-minute cache hit on the stable prefix. The math is: input_tokens × input_rate / 1,000,000 + output_tokens × output_rate / 1,000,000. Opus output is $25 per million, so a 1,200-token reply adds $0.03 on its own. Sonnet 4.6 runs roughly 60% of Opus per turn at $3 input and $15 output; Haiku 4.5 runs about 20% at $1 input and $5 output.

Why does the ccmd analyzer say $15 per million when Opus 4.7 is $5?

The analyzer at src/lib/analyzer.ts line 267 hardcodes OPUS_IN_PER_M = 15, which was the Opus 4.1 input rate before Anthropic dropped Opus 4.5, 4.6, and 4.7 to $5 per million. The cost numbers in the analyzer output are still useful as a relative ranking of bytes-by-impact, but the absolute dollars are 3x too high if you're on a current Opus model. We're keeping the constant pinned to the documented rate at time of write, with a note in the source; the paid tier pulls the live rate from the pricing page.

Why is the per-turn cost not just my user message?

Because every turn sends the whole context window forward. Anthropic's Messages API is stateless: the SDK or harness rebuilds the entire conversation on every call, prepended by the system prompt (which contains your CLAUDE.md), the tool definitions, and the tool-use system prompt. Turn N pays for turns 1 through N-1 of history plus turn N's user message and the new output. The bill grows roughly linearly with turn count until a compaction event or a new session.

What are the six line items?

(1) Anthropic's hidden system prompt, around 2,800 tokens, charged at input rate, not your control. (2) Your CLAUDE.md, the global ~/.claude/CLAUDE.md, plus any nested per-directory CLAUDE.md the harness walks up to. (3) Tool definitions: schemas and descriptions for Bash, Edit, Read, Glob, Grep, plus any MCP tools you have enabled. (4) The fixed tool-use system prompt, 346 tokens when tool_choice is auto. (5) The conversation history from previous turns of this session. (6) Your new user message and any attached files or images. All six are billed at the input rate; the output is billed separately at the output rate.

Where does prompt caching reduce the per-turn cost?

Anthropic's prompt cache reads cost 10% of the base input rate, so a cache hit on a stable prefix cuts that part of the bill by 10x. The cache requires a byte-identical prefix, which means the first few thousand tokens (Anthropic system prompt, your CLAUDE.md, tool definitions) need to be unchanged across requests within the 5-minute cache window. The conversation history past the cache breakpoint is fresh-priced. The cache write costs 1.25x base input rate, so the first turn of a session pays a small premium, then turn 2 onward saves until either the cache expires or your prefix mutates. A single ISO date or 'today is' line near the top of CLAUDE.md busts the prefix.

What's a realistic per-session cost across all turns?

For a 30-turn Opus 4.7 coding session with a 6,000-token CLAUDE.md and growing history, input cost lands roughly at $4 to $7 with no cache, $1 to $2 with a hot cache on the prefix. Output is usually 5-15% of that on top. The analyzer reports the no-cache, 30-turn estimate as estimatedCostPerLongRunSession. The 30-turn assumption is in src/lib/analyzer.ts line 266 (TURNS = 30). Long-running agentic sessions on Opus run higher because of subagent fan-out: each Agent invocation starts a fresh context that re-pays the prefix unless you've explicitly set up cross-agent cache breakpoints.

Does the same per-turn math apply to Codex, Cursor, and Grok Build?

Same shape, different rates and different sources of bloat. Codex (AGENTS.md) and Cursor (.cursorrules) both inject their rule files into the host model's system prompt the same way Claude Code injects CLAUDE.md, and both fire on every turn for the session. Codex is currently routed through GPT-5 by default, so the input rate is whatever OpenAI is charging for GPT-5 at request time (not $5). Cursor uses whichever model you've selected. Grok Build (.grokrules) routes through xAI's grok-code rates. The analyzer scores all four formats with the same seven checks because the firing model is identical even when the price per token differs.

Which line item is usually the easiest to cut?

Tool definitions and CLAUDE.md, in that order. A default Claude Code session loads Bash (245 tokens), Edit (~120 tokens), Read, Glob, Grep, Write, plus whatever MCP servers you have wired up. Each MCP tool description runs 200 to 1,500 tokens depending on how verbose the server author was. Disabling one chatty MCP server you don't use saves more per turn than rewriting half of CLAUDE.md. After that, CLAUDE.md cuts target lines flagged by the analyzer as bloat (over 28 words), aspirational (always/never with no escape), duplicate, or cache_bust. Conversation history is mostly cumulative and hard to cut except via /compact.

Does Fast mode change the per-turn math?

Yes, 6x. Fast mode for Opus 4.7 is $30 per million input and $150 per million output. The same 24,568-token input + 1,200-token output turn that costs $0.153 on standard Opus 4.7 costs $0.917 on Fast mode. Cache multipliers still apply on top, so a cache hit drops the input portion by 10x but the output is still 6x. Worth it for latency-sensitive turns, expensive for long sessions.

How do I see the per-turn cost for my own file?

Paste your CLAUDE.md, AGENTS.md, .cursorrules, or .grokrules into the textarea on ccmd.dev. The analyzer reports totalTokens, estimatedTokensFireEveryTurn (always equal to totalTokens), estimatedCostPerLongRunSession, and a per-line findings array with token-savings estimates. It runs entirely in your browser, no upload. Divide estimatedCostPerLongRunSession by 30 to get the analyzer's estimate of the CLAUDE.md slice of one turn. Multiply by 1/3 if you're on Opus 4.7 (the analyzer is still pinned to the deprecated $15 rate at time of writing).