Agentic search cost is triangular, not linear.

Matthew Diakonov
7 min read

The pages that show up under this topic today fall into two camps. Anthropic's context-engineering writeup mentions cumulative cost in one sentence and moves on. LangChain and CrewAI guides treat "agentic" as a synonym for "multi-agent" and never show the formula. Cost CLIs like ccusage decompose your bill after the fact by model, not by tool call. None of them give you the actual math for why a long Claude Code session costs what it costs.

We can. The math is short, the formula is one line, and your CLAUDE.md is the dial that bends the curve.

1. The cost formula the analyzer uses, and what it misses

ccmd’s analyzer ships a per-turn cost estimate at lines 263-269 of src/lib/analyzer.ts. It is deliberately static: it assumes the CLAUDE.md fires every turn at full input cost across a 30-turn session. That gives you the floor. It does not model agentic search because the analyzer is a pure function with no session state.

src/lib/analyzer.ts

The line that matters in the comment is "tool outputs from turn N also live in input on every turn after N." That is the part the formula skips and the part this page exists to derive.
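As a rough illustration of that static floor, here is a minimal sketch. It is not the code in src/lib/analyzer.ts; the function name, option names, and defaults are assumptions based only on the description above (CLAUDE.md tokens, 30 turns, $15 per million input tokens, no session state).

```typescript
// Hypothetical sketch of a static per-session floor estimate.
// NOT the code in src/lib/analyzer.ts: names and defaults are assumptions
// taken from the prose (30 turns, $15 per million input tokens).

interface StaticCostOptions {
  turns?: number;             // assumed session length
  inputPricePerMTok?: number; // assumed USD per million input tokens
}

function estimateStaticCost(
  claudeMdTokens: number,
  { turns = 30, inputPricePerMTok = 15 }: StaticCostOptions = {},
): number {
  // The config file is sent as input on every turn at the full, uncached price.
  const billedTokens = claudeMdTokens * turns;
  return (billedTokens / 1e6) * inputPricePerMTok;
}

// Example: a 2,000-token CLAUDE.md over the default 30 turns.
const floorUsd = estimateStaticCost(2000);
console.log(floorUsd.toFixed(2)); // "0.90"
```

Being a pure function of the file's token count is what makes it a floor: nothing in the inputs describes what the agent read during the session.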

2. The triangular sum

If turn 1 emits a 5,000-token Read, turn 2’s input includes that Read. So does turn 3. So does turn 30. The cost of that single tool call, billed across the rest of the session, is the slice size times the number of turns that follow it. Stack 30 such calls and the cumulative input is a triangular number.
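Written out, under the simplifying assumptions that every turn adds one tool output of the same size S, that each output is counted from the request in which it first appears as input, and that nothing is cached or compacted, the sum over T turns is:

```latex
\[
\text{billed input tokens} \;=\; \sum_{k=1}^{T} k\,S \;=\; S\,\frac{T(T+1)}{2}
\]
\[
S = 5{,}000,\quad T = 30:\qquad
5{,}000 \times \frac{30 \times 31}{2} \;=\; 2{,}325{,}000 \text{ tokens}
\;\approx\; \$34.88 \ \text{at } \$15 \text{ per million input tokens}
\]
```

Here k is the number of tool outputs sitting in the input at turn k, so the per-turn input grows linearly while the bill grows quadratically in T.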

cost-math.ts
[Stat panel: tokens billed for 30 unbounded Reads; dollar cost at Opus 4.7 input rates; cost of the same 30 Reads with slice rules; share of session cost that is re-billed tool output: $24.94 of a $27.71 session, about 90%.]

The bill people show ccusage isn't the model. It's tool outputs they re-bought 28 times in one session.

ccmd analyzer findings, May 2026

3. What this looks like in a real session

The cumulative input column is the number that gets re-multiplied on every turn. Watch it grow.

agentic-search-cost log

By turn 30 the session has been billed for 1.8M cumulative input tokens, nearly all of it accumulated tool output. Each new turn re-bills the entire stack. The model decision is the cheap part. The tool-use discipline is the expensive part.
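The growth of that column can be reproduced with a small simulation. This is an illustrative sketch, not the original session log: it assumes each turn appends exactly one tool output and re-sends everything accumulated so far, with no caching or compaction. The names `TurnRow` and `simulateSession` are made up for the example.

```typescript
// Illustrative simulation, NOT the original session log.
// Assumption: each turn appends one tool output and re-sends the whole window.

interface TurnRow {
  turn: number;
  inputTokens: number;      // tool-output tokens sent as input on this turn
  cumulativeBilled: number; // running total of billed input tokens
}

function simulateSession(toolOutputTokens: number[]): TurnRow[] {
  const rows: TurnRow[] = [];
  let windowTokens = 0;
  let cumulativeBilled = 0;
  toolOutputTokens.forEach((size, i) => {
    windowTokens += size;             // the new output joins the window
    cumulativeBilled += windowTokens; // the whole window is billed again
    rows.push({ turn: i + 1, inputTokens: windowTokens, cumulativeBilled });
  });
  return rows;
}

// 30 turns, one 5,000-token Read per turn (the numbers used in this guide).
const sessionRows = simulateSession(Array.from({ length: 30 }, () => 5000));
for (const r of sessionRows.filter((row) => row.turn === 1 || row.turn % 10 === 0)) {
  console.log(`turn ${r.turn}: input ${r.inputTokens}, cumulative ${r.cumulativeBilled}`);
}
```

Under these assumptions the per-turn input at turn 30 is 150,000 tokens while the cumulative billed input is 2,325,000, which is the gap between "what is in the window" and "what you paid for". Passing uneven sizes (a few 20k Reads among small Greps) shows how early large outputs dominate the total.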

4. The four rules in CLAUDE.md that bend the curve

Each one is a single short directive. Together they take the same 30-turn session from $27.71 down to $6.10, same Opus 4.7, same answer quality. The agent only follows them if you make them load-bearing: concrete thresholds, a Why, an example.

| Feature | Default behavior | With these rules |
| --- | --- | --- |
| Read tool: bounded vs unbounded | Default full-file Read (5k-20k tokens per call) | Read with offset+limit on files >500 lines (~500-token slice) |
| Search step: Grep before Read | Read the file 'to scan it' (full file enters context) | Grep narrows to N matching lines, then Read those slices |
| Subagent for broad search | Main loop reads 8 files itself, all 8 stay in input forever | Spawn an Explore/Plan agent; main loop sees a 1-2k summary |
| Re-reading files you already read | Read again 'to be safe' (every re-read triple-bills) | Reuse what is in context unless edited (no second Read call) |

A useful template:

CLAUDE.md (excerpt)
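One way such an excerpt might look is sketched here. The wording is hypothetical and not the author's template; the thresholds come from the table above, and each rule carries a threshold, a Why, and an example, as this section recommends. The file path and search term in the examples are placeholders.

```markdown
<!-- Hypothetical excerpt: wording and examples are illustrative assumptions. -->
## Tool use

- Read files over 500 lines with offset+limit; do not read them whole.
  Why: a full-file Read stays in input and is re-billed on every later turn.
  Example: to inspect one function, Grep for its name, then Read the lines around the match.
- Grep before Read when scanning for something.
  Why: Grep returns matching lines, not the whole file.
  Example: Grep for "cache_bust" in src/, then Read only the slices around the hits.
- Use a subagent for broad, multi-file search.
  Why: the subagent's tool outputs stay out of the main context; only a 1-2k summary comes back.
  Example: "where is auth handled?" goes to an Explore agent, not 8 Reads in the main loop.
- Do not re-read a file already read this session unless it has been edited since.
  Why: the earlier Read is still in context, and a second copy is billed again on every later turn.
  Example: after editing a file, re-read only the edited region.
```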

Paste your current CLAUDE.md into the textarea on ccmd.dev and the analyzer flags missing Why lines, vague tool-use language, and absolute-without-exception phrasing that the model silently ignores. The lines that survive that pass are the ones that actually change agentic search behavior at runtime.

5. What the formula does and does not capture

The triangular sum is a worst-case approximation. Three real-world adjustments push it down, sometimes by a lot.

  • Prompt caching. Anthropic and xAI both discount cached input by roughly 10x. If your prefix is byte-stable, the multiplier on already-seen tokens drops from $15/M to about $1.50/M. The triangular shape stays the same; the constant shrinks. ccmd’s cache_bust finding is the single highest-impact check for keeping that constant low.
  • Context window pressure. Long sessions eventually trigger summarization or compaction. Once the conversation is compressed, the triangular sum resets at the compacted prefix. The cost up to the compaction event is already paid, but new turns rebuild on a smaller base.
  • Subagent isolation. Tool outputs inside a subagent never enter the main loop’s context. Spawning an Explore agent for a broad codebase question converts a 30-turn triangular bill into a single 1-2k summary in the parent loop, plus the subagent’s own (smaller, isolated) triangular bill.
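The first adjustment, prompt caching, can be put into numbers with a short sketch. It assumes every turn adds one tool output of the same size, that each output is billed at the full rate the first time it appears in input and at the cached rate on every later turn, and that the cache never breaks. Cache-write surcharges and output tokens are ignored, and `triangularCostUsd` is a name invented for this example.

```typescript
// Sketch only, under stated assumptions: one equal-sized tool output per turn,
// full price on first appearance, cached price on every later turn, and a
// cache that never breaks. Cache-write surcharges and output tokens are ignored.

function triangularCostUsd(
  sliceTokens: number,
  turns: number,
  fullPricePerMTok: number,
  cachedPricePerMTok: number = fullPricePerMTok, // default: no caching
): number {
  const firstSeen = sliceTokens * turns;                 // each output billed once at full price
  const total = (sliceTokens * turns * (turns + 1)) / 2; // triangular sum
  const reBilled = total - firstSeen;                    // tokens already seen on an earlier turn
  return (firstSeen * fullPricePerMTok + reBilled * cachedPricePerMTok) / 1e6;
}

const uncachedUsd = triangularCostUsd(5000, 30, 15);
const cachedUsd = triangularCostUsd(5000, 30, 15, 1.5);
console.log(uncachedUsd.toFixed(2), cachedUsd.toFixed(2)); // "34.88" "5.51"
```

The cached figure is a best case. Any volatile text early in the prefix pushes re-billed tokens back toward the full rate, which is why the constant, not the shape, is what caching changes.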

Want us to grade your CLAUDE.md against the agentic-search rules?

Paste it on ccmd.dev for the static analyzer, or book a call to walk through the per-turn cost on your actual session logs.

Frequently asked questions

What is agentic search, exactly?

Multi-step search inside a single agent loop: the agent issues a tool call (Grep, Glob, Read, WebSearch), receives the output, decides the next step, calls another tool, and repeats until it has enough context to answer or edit. The Claude Code CLI, Cursor's Composer agent, and Codex are all agentic searchers. The distinction matters because each tool result joins the input window for every subsequent turn in the same conversation, so search cost is cumulative within a session, not per-call.

Why is agentic search context cost triangular instead of linear?

Because every turn re-sends the whole conversation as input, and you are billed on that full input each time. A 5,000-token Read at turn 1 enters the input window and stays there for turns 2 through T (until the conversation ends or summarization fires). At Opus 4.7 input pricing of $15 per million tokens, that single Read is paid for 30 times across a 30-turn session, not once. The cumulative token count for T turns, each adding one fresh tool output of size S, is S * T * (T+1) / 2, the triangular number times S. For T=30 and S=5,000 that is 2.3 million tokens, $34.88.

Does prompt caching erase this cost?

Cached input is 90% cheaper at Opus 4.7 rates, so yes, caching reduces the multiplier from 30 to roughly 3 once cache hits land. But cache hits require a byte-identical prefix from the previous request. Any volatile text (a dynamic date, a tool output that mutates between turns, a session-specific identifier) breaks the cache for everything that comes after it. The ccmd analyzer's cache_bust check at analyzer.ts:194 catches the worst offender (ISO date in the first 20 lines of CLAUDE.md). It does not catch every cache break inside a long agentic search session; that is on the agent's tool-use discipline.

How do I cut my agentic search context cost in practice?

Four moves, in order of impact. (1) Read large files in slices with offset+limit. A 400-line Read pulled as a single chunk is ~5k tokens; the 20-line slice you actually need is 250. (2) Grep before Read. A Grep call returns matching lines, not whole files. Read only the slices the grep surfaced. (3) Use a subagent for broad search. The main loop sees a 1-2k summary; the subagent's internal context never lands in your main session. (4) Stop re-reading files you already read this session unless they were edited. Each of these is a one-line rule in CLAUDE.md, but the agent only follows them if you make them load-bearing (concrete thresholds, a Why, an example).
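The slice arithmetic in move (1) can be checked with a few lines. This sketch uses only the figures quoted above (a 400-line file at roughly 5,000 tokens, so about 12.5 tokens per line, which is an average and not a tokenizer measurement) and assumes one Read per turn with no caching. The helper names are hypothetical.

```typescript
// Sketch using the figures quoted above: 400 lines at ~5,000 tokens.
// TOKENS_PER_LINE is an average derived from those figures, not a measurement.

const TOKENS_PER_LINE = 5000 / 400; // 12.5

function readTokens(lines: number): number {
  return Math.round(lines * TOKENS_PER_LINE);
}

// Billed input over a session where every turn adds one Read of `lines` lines
// (assumption: no caching, no compaction).
function sessionBilledTokens(lines: number, turns: number): number {
  return (readTokens(lines) * turns * (turns + 1)) / 2;
}

const fullFileTokens = sessionBilledTokens(400, 30); // 2,325,000
const slicedTokens = sessionBilledTokens(20, 30);    // 116,250
console.log(fullFileTokens, slicedTokens, fullFileTokens / slicedTokens);
```

Because the triangular factor is the same on both sides, the session-level saving equals the per-call saving: a slice 20 times smaller gives a bill 20 times smaller.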

Does ccmd's analyzer actually model agentic search cost?

The static formula at src/lib/analyzer.ts:263-269 models CLAUDE.md cost (totalTokens * 30 turns * $15/M Opus 4.7). It does not model the cumulative re-billing of tool outputs because the analyzer is a pure function with no session state. What it does is grade your CLAUDE.md on the four rules above: it flags missing Grep-before-Read patterns, missing slice-read directives, and vague tool-use language that the agent silently ignores. The bend in the agentic search cost curve happens upstream of the analyzer, in your config file.

Why does no other tool show this math?

Cost CLIs like ccusage and claude-meter report the bill after the fact and decompose by model, not by tool. They will tell you that this session cost $27.71; they will not tell you that $24.94 of it came from re-billed Read outputs. Anthropic's own context-engineering post mentions cumulative cost in one sentence but does not derive the triangular formula. LangChain and CrewAI explainers treat 'agentic' as a synonym for multi-agent, which is a different cost shape. ccmd is the only analyzer that points at the CLAUDE.md lines that control the per-turn search behavior, and this guide is the math behind that.

Does this also apply to AGENTS.md and .cursorrules?

Yes. The same triangular cost applies to any agent loop where tool outputs accumulate in the input window across turns. Codex (AGENTS.md), Cursor's Composer (.cursorrules), and xAI Grok Build (.grokrules) all use the same tool-output-stays-in-input model. The ccmd analyzer detects type by content and runs the same rubric against all four file formats.

Is this an argument against using agentic search?

No. Agentic search is the reason these tools are useful at all. The argument is that the cost shape is non-obvious and that a small amount of tool-use discipline in your CLAUDE.md (or AGENTS.md, or .cursorrules) gives you 4-6x cost savings without changing how you work. The rules you want are short and specific. Paste your config into ccmd.dev and you get the same checks the analyzer would flag at code review.
