/t · guide · cost mechanics

Claude Code context cost scales quadratically, and the analyzer line that hides it

M
Matthew Diakonov
8 min read

Most cost guides for Claude Code stop at one of two punchlines: "Opus is more expensive than Haiku" or "context costs more tokens." Both true. Neither tells you why the bill on a Wednesday-afternoon session that started cheap and felt cheap is the one that hits the weekly cap. The mechanic everyone glosses over is the shape of the cost curve across a single session. It is not flat. It is not linear. Until prompt caching is doing its job, it is closer to a parabola.

This page walks the math, names the analyzer line that hides it, and shows what changes once caching is on.

0xper-turn cost growth, turn 1 baseline
0xper-turn cost at turn 200, no caching
0xsession multiplier vs. single turn at N=200
0xcache-read rate vs. fresh input

Every turn pays for every prior turn

The model is stateless. Each request is one self-contained input. Claude Code rebuilds that input on every turn by concatenating: the system prompt, your CLAUDE.md and its imports, the tool schemas, the full conversation so far, and your new message. Then it ships the whole thing.

So turn 1's input is small. Turn 2's input contains turn 1. Turn 200's input contains turns 1 through 199. Summed across the session, ignoring the static prefix for a moment, the conversation-history piece alone is the sum of an arithmetic series:

cost math, plain

That last line is the punchline. The × 30 you see in cost calculators is the value of N. The number that actually drives the bill is N(N+1)/2. The two are equal at N = 1 and diverge from there.

A 200-turn Sonnet session, turn by turn

Pick numbers that look like a real coding session. A 6,000-token CLAUDE.md (typical for a project that has been around a few months). Roughly 600 tokens of fresh content added per turn (one user message plus one assistant reply plus a tool round-trip). Sonnet 4.6 at $3 per million input, where the 1M window makes a 200-turn run plausible. No caching, no /clear. The cumulative bill:

ccmd cost --simulate --turns=200 --model=sonnet-4.6 --no-cache

Turn 1 costs two cents. Turn 200 costs nearly forty. The cumulative isn't 200 × turn 1, it's closer to 1,900 × turn 1. The mismatch is the quadratic part. This is the shape that turns "I'll just keep going" into "wait, I hit the cap."

The analyzer line that hides the curve

ccmd's own analyzer has a cost estimate at the bottom of its score. It looks reasonable. It is a useful per-CLAUDE.md benchmark for comparing two files. It is not a session forecast, even though the variable name says estimatedCostPerLongRunSession. Here is the line:

src/lib/analyzer.ts

Two things to notice. First, the rate constant is $15 / M, which was Opus 4.1's input price. Opus 4.7 input is $5 / M as of 2026, so the analyzer's dollar figure is about 3x high on the most common default. Second, and more importantly, the math is linear: tokens × TURNS where TURNS is a constant 30. That treats every turn as identical in size to turn 1. It is the right shape for "what does my CLAUDE.md cost on a single turn," multiplied for legibility. It is the wrong shape for "what will this session cost me by the time I /compact."

Side by side, here is the gap:

The analyzer's number vs. what the bill does

// what the analyzer prints today
// (src/lib/analyzer.ts:263–269)

tokens     = 6,000
TURNS      = 30          // constant
rate       = $15 / M     // analyzer's number
cost       = tokens × TURNS × rate / 1M
           = 6000 × 30 × 15 / 1_000_000
           = $2.70

// reads like: "30 turns ≈ a coffee."
-27% fewer lines

The analyzer's $2.70 and the simulated $37.62 are measuring different things. One is the CLAUDE.md's per-turn weight × an arbitrary multiplier. The other is the summed bill across a real session that uses the 1M window you're paying for. Both are correct for what they claim. Most pages quote the first and let you assume the second.

Prompt caching bends the curve, doesn't flatten it

Anthropic's prompt cache reads cost 0.1x of the base input rate on prefix tokens that match a prior turn within the cache TTL. Claude Code has caching on by default. So the static prefix (system prompt, CLAUDE.md, tool schemas) gets paid for at a small premium on turn 1 and then ridden cheaply on every subsequent turn. That collapses the dominant term of the quadratic, the one that grows with accumulated prefix-like content. The same simulated session with caching working:

ccmd cost --simulate --turns=200 --model=sonnet-4.6 --cache=stable

$37.62 becomes ~$11 if the cache stays warm for the whole session. That is the real win, and it is what most engineers are actually paying, because Claude Code's default behavior is caching-on. The catch is that the cache is fragile in a specific way: it breaks on any change to the prefix. A line like Today is 2026-05-20 at the top of CLAUDE.md re-writes the prefix every day. The conversation tail is still fresh on every turn, so even with caching the bill grows linearly in N. The shape changes from parabola to line, not to flat.

0.1x

Cache reads are billed at one-tenth of the base input rate on prefix matches within the TTL window. Cache writes cost roughly 1.25x of base on the first send. This is the lever that decides whether your long session costs $11 or $40.

Anthropic, Prompt caching docs

What 1M context does to the curve

Opus 4.6 and Sonnet 4.6 went GA at a 1M-token context window with no long-context surcharge. "Standard pricing" is the phrase Anthropic uses. That phrase moves a lot of mental weight off of "I should cut context" and onto "I can keep going." The cost shape does not agree.

1M is a ceiling on the window, not on the session. The quadratic curve is the same. What changes is that the curve can run for longer before auto-compaction caps it. Pre-1M, you'd hit compaction at around turn 60-80 on a 200K window. Post-1M, you can ride to turn 200 or beyond. The bill is whatever the integral is from turn 1 to whichever turn you stop at. Bigger window, larger integral, same shape.

The practical read: 1M makes context engineering more important, not less. The window where you could be lazy and rely on auto-compaction to clean up after you used to end at 200K. Now it's 1M. The cost of being lazy scaled with the window.

What to actually change

Three habits in order of impact:

  1. Stabilize the prefix. Pull every date stamp, "last updated"marker, and "we are mid-sprint" line out of the top of CLAUDE.md. Move mutable info to the bottom or out of the file. The cache hit on a stable prefix is the difference between $11 and $35+ on a long session. ccmd flags these with a cache_bust finding.
  2. /compact early, not at 95%. Auto-compaction fires near full. By then the late-curve turns have already been paid. /compact manually around 40-50% context use, or set a habit of doing it whenever you finish a sub-task and start the next one. Caps the worst case at a fraction of what the integral would otherwise be.
  3. /clear on topic change, not on token count. /clear kills the tail entirely. That's a win when the next thing you do is unrelated to the last thing. It's a loss if you were going to keep working on the same task and re-pay cache writes. The decision rule is topic, not token count.

Three things that don't help as much as they sound like they should: switching to Haiku (caps the ceiling at 5x cheaper but the curve shape doesn't change), buying a bigger plan (raises the cap, doesn't bend the curve), and adding more parallel agents (multiplies the per-turn baseline; see CLAUDE.md cost with parallel agents).

The closest sibling pages on this site: per-turn token cost breaks down the six line items on a single turn, and cost is context, not model argues the same thesis from the model-pricing side.

Want us to look at your actual session bill?

Paste your CLAUDE.md, hand us a /usage screenshot, we'll show you which turns the curve is steepest on and where the prefix is breaking.

Frequently asked questions

Is Claude Code cost actually quadratic, or am I missing something?

The raw input cost is quadratic in turns. Each turn re-sends every prior turn, so turn N includes turns 1 through N-1 plus a fresh tail. Summed across the session, that's payload_per_turn × N(N+1)/2, which is the closed form of 1+2+...+N. At N=30 the multiplier is 465. At N=200 it's 20,100. Prompt caching bends this back toward linear because the prefix gets billed at 0.1x of the input rate. But it never flattens completely: the conversation tail is fresh on every turn and the cache write at session start still pays full price plus a small premium.

Doesn't /compact and auto-compaction make this a non-issue?

It caps the worst case, not the curve underneath it. Claude Code auto-compacts at around 95% of the window. /compact triggers manually. After compaction, the history gets summarized into a smaller block and the context starts again. That resets the quadratic curve, but if you wait until 95% you've already paid the late-curve turns at full price. The cheaper move is to /compact earlier, around 40-50%, or to /clear between unrelated tasks so the prefix doesn't drag along baggage from topic A while you work on topic B.

If Sonnet 4.6 ships with 1M context at standard pricing, why does context cost more?

Standard pricing means there's no long-context surcharge per token. It does not mean a 200-turn session costs the same as a 20-turn session. The token count itself still grows turn by turn. 1M makes the curve possible to run for longer before compaction, which is good if you're using it deliberately and bad if you're treating the larger window as 'now I can ignore session length.' The lever shifted; the cost shape didn't.

Where exactly does ccmd's analyzer get this wrong?

The cost line at src/lib/analyzer.ts:263-269 multiplies your CLAUDE.md token count by a flat TURNS = 30 and a hardcoded $15 per million. Two things drift: the rate (Opus 4.7 input is $5/M now, not $15, so the dollar figure is 3x high) and the assumption that every turn costs the same as turn 1 (which understates long sessions by an order of magnitude). The analyzer's number is a useful per-turn ceiling for the CLAUDE.md piece. Read it as 'this much fires every turn on top of conversation history,' not 'this much for the whole session.'

How does prompt caching change the math?

Anthropic prices cache reads at 0.1x of the base input rate (a 90% discount) on prefix tokens that match a prior turn within the TTL window. Cache writes cost roughly 1.25x of base on the first send. So a stable prefix gets paid for once at a small premium and then ridden for free-ish on every subsequent turn. The fresh tail (your new message plus the model's response from the prior turn) is full price. Net effect: the linear growth in tail is still there, but the quadratic part collapses onto a near-flat baseline. The 200-turn session that costs $37.62 raw lands around $11-13 with caching working, and around $34 if the prefix is unstable.

What makes the prefix unstable and how do I keep it cacheable?

Any line that changes between sessions breaks the cache and forces a re-write. Common culprits in CLAUDE.md: a date stamp at the top ('Today is 2026-05-20'), a 'last updated' marker, a session-specific 'we are mid-sprint on X' line. Also: shuffling rule order, adding skills via reorder rather than append, and editing files Anthropic's prompt assembler reads above your CLAUDE.md. Keep mutable info near the bottom or out entirely. ccmd's analyzer flags timestamps and session-specific text in the first 20 lines as a `cache_bust` finding for exactly this reason.

Does /clear save money?

It depends on what you do next. /clear drops the conversation history, which kills the quadratic tail. That's a win if you were about to start an unrelated task. It's a loss if you were going to keep working on the same task, because you'll re-pay the cache writes when you reload context. The decision rule is topic, not token count: when the next thing you do is unrelated to the last thing you did, /clear. When it's a continuation, ride the cache.

Why does the analyzer use 30 turns at all if it's wrong?

Thirty turns is a reasonable default for a single-focus working session before /compact would normally fire. The analyzer's job is to give you a comparable, file-by-file score for the CLAUDE.md you paste in, not to predict your exact bill. The number it prints is best read as 'per-turn cost × a session of typical length,' useful for ranking files against each other. The mistake is to read it as 'this is what a 200-turn 1M-context run will cost,' which it never claims to be.

What's the one change tonight that bends the curve?

Two things, in this order. First, remove every date stamp and session-specific line from the top 20 lines of CLAUDE.md so the prefix is stable enough to cache for the whole session. That alone is roughly a 10x cost difference on a long run. Second, set a habit of /compact around 40-50% context use instead of letting auto-compaction fire at 95%. That caps the worst-case quadratic tail before it gets expensive. Then paste your CLAUDE.md into ccmd.dev to see what else fires every turn and could just leave.

How did this page land for you?

React to reveal totals

Comments ()

Leave a comment to see what others are saying.

Public and anonymous. No signup.