Audit the silent token cost of your CLAUDE.md.

Matthew Diakonov, Written with AI

Published May 21, 2026 · 5 min read

1. The bill is the wrong place to look

You opened your provider's usage page because the weekly limit slapped you on a Wednesday. Total tokens, daily breakdown, model split. Helpful for capacity planning, useless for the question you actually have: which file in my repo cost me that?

The usage page does not know your CLAUDE.md exists. ccusage (14.3K stars on GitHub, the most popular community cost CLI) reads the same total. claude-meter shows the running tally. None of them attribute a single token back to a specific line. That is what we mean by silent. The cost is real, the cause is specific, the surface you are looking at does not show the link.

A CLAUDE.md gets re-sent in full on every turn. A 6,000-token file across a 30-turn session is 180,000 input tokens, every session, whether you touched the file that day or not. The audit is the tool that breaks that number down per line.
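The arithmetic is small enough to do in your head, but a sketch makes it easy to plug in your own file. The function name is illustrative and not part of ccmd; it assumes the whole file ships on every turn, as described above.

```typescript
// Hypothetical helper: input tokens spent re-sending one instruction file
// across a session, assuming the full file is included on every turn.
function resentTokensPerSession(fileTokens: number, turns: number): number {
  return fileTokens * turns;
}

// The example from the text: a 6,000-token file over 30 turns.
const perSession = resentTokensPerSession(6_000, 30); // 180000
```

Swap in your own token count and typical session length to get the number the usage page never shows you.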

2. The four silent surcharges

ccmd's analyzer surfaces seven finding kinds in total. Four of them are cost findings: bytes you pay for that produce no proportional behavior. The other three (aspirational, conflict, and missing_why in its behavioral role) are behavior findings, covered in the sibling silent failure audit.

Surcharge kind	What it costs	What the detector matches
cache_bust	roughly 10x input cost per session (cold vs cached reads)	ISO date or "today" in first 20 lines
duplicate	tokens paid twice, one signal delivered	same rule on two lines
bloat	second half ignored, full token cost	rule line over 28 words
dead-rule	agent ignores or guesses; line still ships every turn	vague, aspirational, or missing_why

The columns: the middle is what the surcharge actually does to your token bill, the right is what the detector matches on. None of these appear in any log on any usage page.

3. cache_bust: the 10x silent surcharge

One finding kind is marked severity: "high" in the analyzer. It is the only one. The detector lives at line 194 of src/lib/analyzer.ts:

[Listing: src/lib/analyzer.ts, line 194]

The regex matches an ISO date, the word "today", the phrase "this session", or "right now" in the first 20 lines. Why those four shapes? Because they are the shapes a senior engineer drops into a CLAUDE.md without thinking, assuming the agent needs orientation: "Today is 2026-05-21" at the top, or "We are mid-sprint on the subscriptions rewrite" in the lede.
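A minimal sketch of a detector with that shape, for readers who want to see the mechanics. This is not the code at line 194: the pattern, the interface, and the function name are assumptions reconstructed from the description above.

```typescript
// Sketch only: the actual pattern in src/lib/analyzer.ts may differ.
// Matches an ISO date, "today", "this session", or "right now".
const VOLATILE = /\b\d{4}-\d{2}-\d{2}\b|\b(?:today|this session|right now)\b/i;
const HEAD_LINES = 20; // only the first 20 lines are inspected

interface CacheBustFinding {
  kind: "cache_bust";
  severity: "high";
  line: number; // 1-indexed
  text: string;
}

function findCacheBusters(source: string): CacheBustFinding[] {
  const findings: CacheBustFinding[] = [];
  const head = source.split("\n").slice(0, HEAD_LINES);
  head.forEach((text, i) => {
    if (VOLATILE.test(text)) {
      findings.push({ kind: "cache_bust", severity: "high", line: i + 1, text });
    }
  });
  return findings;
}

const sampleFile = ["# Project notes", "Today is 2026-05-21", "- Use pnpm."].join("\n");
const busters = findCacheBusters(sampleFile); // one finding, on line 2
```

The regex carries no `g` flag on purpose, so `test` stays stateless from line to line.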

Anthropic's prompt cache bills cached input reads at roughly a tenth of the cold input rate (the model pricing page lists the exact read and write prices). A cache hit needs a byte-identical prefix. One line at the top of your file that changes from session to session breaks the prefix and forces a full cold read on every turn. Same file, two bills, an order of magnitude apart. There is no warning on the second bill that says "your cache hit rate dropped to zero today."

4. duplicate: paying twice for one signal

The duplicate detector at line 207 normalises each line (trim, lowercase) and flags any line that matches an earlier one of at least 10 characters. The usual cause is paste from a sibling project's CLAUDE.md, or one rule that crept into both an ## Always block and a ## Style block as the file grew.

The token cost of a duplicate is exact: every byte of the duplicate ships every turn, costs full input rate, conveys no new signal. The analyzer attributes the per-line bytes to tokenSavings, so a one-line duplicate that's 14 tokens long shows up as 14 tokens per turn, or 420 tokens across a 30-turn session, recoverable by deleting the second copy.
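The normalise-and-compare step can be sketched as follows. The names are hypothetical, and the token estimate is a rough four-characters-per-token assumption standing in for whatever the analyzer actually uses.

```typescript
interface DuplicateFinding {
  kind: "duplicate";
  line: number;         // 1-indexed line of the repeated copy
  firstSeen: number;    // 1-indexed line of the original
  tokenSavings: number; // tokens recovered per turn by deleting the copy
}

// Assumption: ~4 characters per token, a rule of thumb rather than a real tokenizer.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function findDuplicates(source: string): DuplicateFinding[] {
  const seen = new Map<string, number>(); // normalised line -> first line number
  const findings: DuplicateFinding[] = [];
  source.split("\n").forEach((raw, i) => {
    const key = raw.trim().toLowerCase();
    if (key.length < 10) return; // short lines are skipped, per the 10-character floor
    const first = seen.get(key);
    if (first !== undefined) {
      findings.push({ kind: "duplicate", line: i + 1, firstSeen: first, tokenSavings: approxTokens(key) });
    } else {
      seen.set(key, i + 1);
    }
  });
  return findings;
}
```

Multiply a finding's `tokenSavings` by your turn count to get the per-session figure, the same way 14 tokens becomes 420 over 30 turns.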

5. bloat: tokens shipped, words ignored

The bloat detector at line 150 flags any rule line over 28 words. The threshold is empirical: rule lines past roughly 25 words consistently get treated as one signal by the model, and the back half tends to be ignored. Sentences like "we deploy to Vercel from main with preview deploys on every PR which means we have to be careful about migrations and we use a separate staging DB for that purpose so do not run destructive migrations without confirming first" fire the detector. You pay for every word, the agent reads the first half.

The analyzer estimates recoverable bytes at 35 percent of the line's token count, conservatively. Split the line into 2 or 3 short directives, each one actionable, and you recover the bytes and get more reliable behavior. Two silent surcharges cancelled by one edit.
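A sketch of the word-count check and the 35 percent estimate, under stated assumptions: every non-empty line is treated as a rule line, tokens are approximated at four characters each, and all names are illustrative rather than taken from the analyzer.

```typescript
const MAX_WORDS = 28;           // threshold quoted above
const RECOVERABLE_SHARE = 0.35; // share of the line's tokens counted as recoverable

interface BloatFinding {
  kind: "bloat";
  line: number; // 1-indexed
  words: number;
  tokenSavings: number;
}

// Assumption: ~4 characters per token, not a real tokenizer.
function roughTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function findBloat(source: string): BloatFinding[] {
  const findings: BloatFinding[] = [];
  source.split("\n").forEach((text, i) => {
    const trimmed = text.trim();
    const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
    if (words > MAX_WORDS) {
      findings.push({
        kind: "bloat",
        line: i + 1,
        words,
        tokenSavings: Math.round(roughTokens(trimmed) * RECOVERABLE_SHARE),
      });
    }
  });
  return findings;
}
```

The Vercel sentence quoted above is 40 words, so it clears the threshold with room to spare.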

6. dead-rule waste: bytes that buy no behavior

The vague detector at line 163 holds 14 untestable terms: appropriate, properly, carefully, as needed, where applicable, when possible, and eight more. A line that says "handle edge cases appropriately" ships its tokens every turn and produces no testable behavior change. The agent did something. You cannot tell whether it followed the rule.

missing_why is the same cost pattern from a different shape: a NEVER or DO NOT with no "because" in the next four lines. The agent follows until an edge case shows up, then guesses, and the rule was load-bearing on the case it was guessing about. Tokens spent, guardrail not delivered.

Neither of these has a per-line dollar cost the way cache_bust and duplicate do, but they add up. A 40-line ## Always block where half the rules are vague is half the tokens of that block, every turn, that buy nothing.
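Both dead-rule checks can be sketched in a few lines. The term list below holds only the six terms named above, so it is deliberately partial. Counting a "because" on the rule line itself as a valid reason is an assumption, and the names are illustrative.

```typescript
// Partial list: only the terms this article names; the analyzer's list is longer.
const VAGUE_TERMS = ["appropriate", "properly", "carefully", "as needed", "where applicable", "when possible"];
const WHY_WINDOW = 4; // lines after the rule searched for a "because"

interface DeadRuleFinding {
  kind: "vague" | "missing_why";
  line: number; // 1-indexed
  detail: string;
}

function findDeadRules(source: string): DeadRuleFinding[] {
  const lines = source.split("\n");
  const findings: DeadRuleFinding[] = [];
  lines.forEach((text, i) => {
    const lower = text.toLowerCase();
    const term = VAGUE_TERMS.find((t) => lower.includes(t));
    if (term !== undefined) {
      findings.push({ kind: "vague", line: i + 1, detail: term });
    }
    // Uppercase NEVER / DO NOT only, matching how the rule is described above.
    if (/\b(?:NEVER|DO NOT)\b/.test(text)) {
      // Assumption: the rule line itself counts, plus the next four lines.
      const context = lines.slice(i, i + 1 + WHY_WINDOW).join("\n").toLowerCase();
      if (!context.includes("because")) {
        findings.push({ kind: "missing_why", line: i + 1, detail: "no because within 4 lines" });
      }
    }
  });
  return findings;
}
```

Substring matching means "appropriately" trips the "appropriate" term, which is the behavior the example line above relies on.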

7. The audit, in one paste

Paste your file into the textarea on ccmd.dev. The detector walks the bytes, returns a line-numbered list of findings, attributes per-line savings, and prints the three cost states (cold, cached, cache-busted). The numbers come from the constants at lines 266 to 267, here verbatim from the file:

[Listing: src/lib/analyzer.ts, lines 263-269]

A representative output on a 6,000-token file with one dated line near the top, one duplicate rule, one bloat line, and one missing-why:

[Sample output: ccmd audit (silent-cost pass)]

The four red lines at the bottom are the surcharges. None of them show up on the bill. All of them ship every turn.

8. What the audit does not catch (yet)

Subagent inheritance. A subagent launched from your session loads its own context; the parent CLAUDE.md does not always carry. The cost shows up multiplied across subagents in long-running pipelines. See /t/subagent-claude-md-inheritance.
Layered files. Project CLAUDE.md plus user CLAUDE.md plus an AGENTS.md plus a nested CLAUDE.md in a subdir all ship together on a given turn. Per-file cost without per-file attribution is the same silent problem one layer up. See /t/layered-claude-md-token-cost.
settings.json contradictions. A "never run destructive shell commands" rule plus a permissive Bash allowlist in .claude/settings.json is the silent rule that ships and silently does not work. Not a cost surcharge directly; a behavioral surcharge. On the analyzer roadmap.

Frequently asked questions

Why is CLAUDE.md token cost called silent?

Because every usage surface in the stack stops one layer short of attribution. Anthropic's usage page shows one number per day. ccusage shows total spend across sessions. claude-meter shows the running tally. None of them know that 38 percent of today's input cost was the same 240 lines of CLAUDE.md re-sent on every turn, or that one dated line near the top of that file flipped the prompt cache off and 10x'd the per-turn read. The cost was real, the line that caused it was specific, the bill said neither.

What does ccmd's analyzer actually surface that the bill does not?

Four surcharges, all per line, all from one file at src/lib/analyzer.ts. cache_bust (line 194, severity high) flags any ISO date, 'today', 'this session', or 'right now' in the first 20 lines, any of which voids the prompt cache. duplicate (line 207) flags the same rule on two lines. bloat (line 150) flags rule lines over 28 words. missing_why (line 227) plus vague (line 163) flag rules the agent will treat as dead, so the tokens ship but produce no behavior. Each finding includes a tokenSavings estimate so you can sum the recoverable bytes.

What is the math the analyzer uses to estimate session cost?

Two constants at src/lib/analyzer.ts lines 266 to 267: TURNS = 30, OPUS_IN_PER_M = 15. The formula is (totalTokens * TURNS * OPUS_IN_PER_M) / 1_000_000. A 6,000-token CLAUDE.md works out to about $2.70 across a 30-turn session at the cache-busted rate, or roughly a tenth of that with a clean cache. Your real number is in your usage page; plug your file's token count into the same formula and compare.
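The formula, written out so you can run it against your own token count. TURNS and OPUS_IN_PER_M carry the values quoted in this answer. CACHE_READ_FACTOR and the function name are assumptions added for illustration, and the cached figure ignores the one-time cache write on the first turn.

```typescript
const TURNS = 30;              // turns per session, as quoted
const OPUS_IN_PER_M = 15;      // USD per million input tokens, as quoted
const CACHE_READ_FACTOR = 0.1; // assumption: cached reads bill at about a tenth of the input rate

// Session cost when every turn is a full cold read (the cache-busted case).
function sessionCostUsd(totalTokens: number): number {
  return (totalTokens * TURNS * OPUS_IN_PER_M) / 1_000_000;
}

const bustedUsd = sessionCostUsd(6_000);           // 2.7
const cachedUsd = bustedUsd * CACHE_READ_FACTOR;   // about 0.27
```

The gap between the two numbers is what one volatile line near the top of the file costs you per session.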

Why is cache_bust the highest-severity finding in the audit?

Because it is the only single-line edit that changes cost by an order of magnitude. Anthropic's prompt cache returns a ~10x discount on cached input tokens; a CLAUDE.md whose first 20 lines do not mutate session-to-session hits the cache on turn two onward. One line like 'Today is 2026-05-21' near the top breaks byte-identical prefix matching and forces a full input read on every turn. Move the dated line to the bottom of the file, or strip it, and the whole file recovers the cache discount.

What about bloat lines, why are those a silent surcharge?

A rule line over 28 words ships every byte of the line to Claude on every turn, but the model treats the second half as filler. You pay tokens for both halves and get behavior from one half. The analyzer estimates recoverable bytes at roughly 35 percent of the line's token count, which is conservative. Split each long line into 2 to 3 short directives, each one actionable, and the recoverable bytes come off your bill.

Does the analyzer upload my CLAUDE.md?

No. The analyzer is pure client-side TypeScript in src/lib/analyzer.ts, 322 lines, no network calls. Open DevTools, watch the Network tab, paste a file: there is no POST. The free tier of ccmd is one-shot, in-browser. The paid tier ($9 to $19 a month solo, $49 a team) adds continuous monitoring, weekly drift email, and PR diff comments, which do upload the file you opt to monitor.

Does the audit work on AGENTS.md, .cursorrules, .grokrules?

Yes. Detection is by content shape, not filename. The detectType function at src/lib/analyzer.ts line 41 looks at the first 300 characters for AGENTS.md headers, 'You are an' style prompts (cursorrules), or grok markers. The same four silent surcharges fire across all four formats, because the cost model is the same: the whole file ships to the agent on every turn regardless of which orchestrator is running it.

How long does the audit take?

The detector runs in about 220 ms on a 200-line file in the browser. The whole audit (paste, read, fix the high-severity finding, re-paste, confirm) takes roughly 4 minutes. The cache_bust fix is one line moved; duplicate is one line deleted; bloat is one line split. None of it requires a build, a deploy, or a re-run of your agent.