
Your CLAUDE.md gets more expensive every week. Nobody is doing it on purpose.

Matthew Diakonov, Written with AI

Published May 18, 2026 · 7 min read

Every guide on this topic shows you what your CLAUDE.md costs in one moment. That number is a y-coordinate. The bill is a slope. Two files at 6,000 tokens today look identical to an audit; one was 5,800 last week and one was 2,100. The first is stable, the second is on track to be 12,000 by July. The audit cannot tell you which is which.

This is what we mean by drift. It is why a senior engineer's file can look fine in week one and unaffordable in week eight, even though none of the commits in between were wrong.

1. The analyzer reports one instant. Drift hides between instants.

The ccmd analyzer is a pure function. You paste a CLAUDE.md, and it returns one snapshot. The two lines that produce the cost number are these:

src/lib/analyzer.ts

Three things to notice. First, estimatedTokensFireEveryTurn = totalTokens: the analyzer assumes the entire file fires on every turn, which is what Claude Code actually does with a project-level CLAUDE.md. Second, token estimation is Math.ceil(text.length / 4), the same chars-divided-by-four heuristic Anthropic's own tokenizer-counting CLI examples use. Third, the cost figure is for one long-running session of 30 turns at the Opus 4.7 input rate ($15 per million tokens).

That is all you need to measure drift. Run the same function on the CLAUDE.md from two different SHAs; the difference, divided by the number of weeks between the two commits, is your weekly delta. The analyzer never does that because it does not have your git history; you do.
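The arithmetic above fits in a few lines. The sketch below is a hypothetical TypeScript reconstruction, not the code in src/lib/analyzer.ts: estimateTokens, sessionCost, and weeklyDrift are illustrative names, and it assumes the chars-divided-by-four estimator, a 30-turn session, and a $15-per-million input rate.

```typescript
// Hypothetical reconstruction of the snapshot math, not the analyzer's source.
const CHARS_PER_TOKEN = 4; // chars-divided-by-four heuristic
const TURNS_PER_SESSION = 30; // one long-running session
const USD_PER_INPUT_TOKEN = 15 / 1_000_000; // assumed $15 per million input tokens

// Estimated tokens for one CLAUDE.md snapshot (text.length counts UTF-16 code units).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// The whole file fires on every turn, so cost = tokens x turns x rate.
function sessionCost(totalTokens: number, turns: number = TURNS_PER_SESSION): number {
  return totalTokens * turns * USD_PER_INPUT_TOKEN;
}

interface Drift {
  tokensPerWeek: number; // growth of the per-turn floor
  sessionCostPerWeek: number; // added USD per 30-turn session, per week
}

// Compare two snapshots of CLAUDE.md taken `weeks` apart (e.g. from two SHAs).
function weeklyDrift(oldText: string, newText: string, weeks: number): Drift {
  if (weeks <= 0) throw new Error("weeks must be positive");
  const delta = estimateTokens(newText) - estimateTokens(oldText);
  return {
    tokensPerWeek: delta / weeks,
    sessionCostPerWeek: sessionCost(delta) / weeks,
  };
}

// Usage with two synthetic snapshots sized like the week-0 and week-8 files.
const week0 = "x".repeat(2_140 * CHARS_PER_TOKEN);
const week8 = "x".repeat(6_420 * CHARS_PER_TOKEN);
const drift = weeklyDrift(week0, week8, 8);
console.log(drift.tokensPerWeek); // 535
console.log(drift.sessionCostPerWeek.toFixed(2)); // 0.24
```

Feeding it the output of git show $SHA:CLAUDE.md for an old and a new SHA turns the one-instant snapshot into a per-week slope.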

2. Eight weeks, six commits, 3x cost

This is the actual shape we keep seeing in repos: six commits across eight weeks that nobody would block in review. Each entry gives the per-turn token floor at that moment.

Week 0: starter file, 2,140 tokens

A clean CLAUDE.md a senior engineer wrote on day one. Names the stack, sets test discipline, forbids three things by name, leaves room.

Per-turn input floor at Opus 4.7: 2,140 tokens. A 30-turn session costs about $0.96 in CLAUDE.md rehydration alone.

Week 1: production incident, +180 tokens

We got burned by a destructive migration that ran in prod. A new 'Never run irreversible SQL without ...' rule lands in CLAUDE.md the next day.

The rule is correct and load-bearing. Nobody flags it. Per-turn floor is now 2,320 tokens; no audit fires because the file is still small.

Week 3: 'small' style PR, +420 tokens

A teammate pastes the Karpathy 12-rule reminder into a ## Process section. Half of it is already covered by other rules higher up.

Per-turn floor reaches 2,740. The duplicate finding would catch it, but nobody runs the audit on the PR. The PR is one line in CLAUDE.md and four lines of code; the reviewer reads the code.

Week 5: new framework version, +1,300 tokens

Next.js ships a breaking change. We append a long ## Next.js notes section to CLAUDE.md instead of moving it into a scoped ./web/CLAUDE.md.

Per-turn floor is now 4,040. Cost per 30-turn session has crossed $1.80, almost double week 0, and the file still 'looks fine' to a reader.

Week 6: someone adds a date, the cache breaks

A consultant writes 'Today is 2026-05-04, mid-sprint on subscriptions' into line 3 so Claude has temporal context. The analyzer flags it as cache_bust at high severity.

Per-turn floor at 4,200, but the byte-exact cached prefix now resets every single session. The weekly bill jumps about 10x relative to the cached path.

Week 8: 6,420 tokens, 3x cost

31 commits to CLAUDE.md, 312 lines inserted, 7 deleted. Per-turn floor is 6,420 tokens; a 30-turn Opus 4.7 session is $2.89 in pure CLAUDE.md.

No single commit was wrong. Each landed for a reason. Together they tripled the cost of every single Claude Code turn this team runs.

Run the audit in week 0 and the file is fine. Run it in week 8 and the file is bad. Run it in any week in between and the finding count climbs by one or two. None of the snapshots look alarming. The trajectory does.
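As a check on the timeline's arithmetic, this sketch recomputes each entry under the analyzer's assumptions (whole file billed at the full input rate on every turn, 30 turns per session, $15 per million tokens). The names are illustrative, and it deliberately does not model the week-6 cache_bust, which changes the cached path rather than the per-turn floor.

```typescript
// Per-turn floors from the eight-week timeline above.
interface Point {
  week: number;
  tokens: number;
}

const TURNS_PER_SESSION = 30;
const USD_PER_MILLION_INPUT = 15; // Opus 4.7 input rate quoted in the FAQ

const timeline: Point[] = [
  { week: 0, tokens: 2_140 },
  { week: 1, tokens: 2_320 },
  { week: 3, tokens: 2_740 },
  { week: 5, tokens: 4_040 },
  { week: 6, tokens: 4_200 },
  { week: 8, tokens: 6_420 },
];

// Whole file billed at the full input rate on every turn (no cache discount).
function sessionUsd(tokens: number): number {
  return (tokens * TURNS_PER_SESSION * USD_PER_MILLION_INPUT) / 1_000_000;
}

const baseline = timeline[0].tokens;
for (const { week, tokens } of timeline) {
  const multiple = tokens / baseline;
  console.log(
    `week ${week}: ${tokens} tokens, $${sessionUsd(tokens).toFixed(2)} per session, ${multiple.toFixed(2)}x week 0`,
  );
}
```

The printed costs match the figures in the timeline ($0.96 at week 0, $2.89 at week 8), and the week-8 multiple is exactly 3.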

3. The shape of drift, on your repo, in two commands

Save this to scripts/claudemd-drift.sh and run it from your repo root. It is the entire drift detector, written from the ccmd analyzer's own constants. No network calls, no signup.

scripts/claudemd-drift.sh

Here is what it looks like on a real repo we audited last week:

claudemd-drift.sh on a real eight-week-old CLAUDE.md

312 inserted lines, 7 deleted. That 44-to-1 ratio is the fingerprint of drift. A file that grows the way code grows (delete-rewrite-refactor) does not look like that. A file that grows the way a log file grows does. CLAUDE.md is a log file unless somebody is paid to compact it.
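Summing --shortstat output by hand gets tedious across dozens of commits. A minimal sketch, assuming you pipe in the raw stdout of git log --since='8 weeks ago' --shortstat -- CLAUDE.md; sumShortstat, insertDeleteRatio, and looksLikeDrift are hypothetical names, not part of the drift script, and the 10x default comes from the checklist in section 6.

```typescript
// Totals parsed from `git log --shortstat -- CLAUDE.md` output.
interface Churn {
  commits: number; // commits that printed a shortstat line
  insertions: number;
  deletions: number;
}

// Sum insertions and deletions from raw git log --shortstat stdout.
function sumShortstat(gitLogOutput: string): Churn {
  const totals: Churn = { commits: 0, insertions: 0, deletions: 0 };
  for (const line of gitLogOutput.split("\n")) {
    // Shortstat lines look like " 1 file changed, 12 insertions(+), 3 deletions(-)".
    if (!/^\s*\d+ files? changed/.test(line)) continue;
    totals.commits += 1;
    const ins = line.match(/(\d+) insertions?\(\+\)/);
    const del = line.match(/(\d+) deletions?\(-\)/);
    if (ins) totals.insertions += Number(ins[1]);
    if (del) totals.deletions += Number(del[1]);
  }
  return totals;
}

// Insert-to-delete ratio; Infinity when nothing was ever deleted.
function insertDeleteRatio(c: Churn): number {
  return c.deletions === 0 ? Infinity : c.insertions / c.deletions;
}

// Rule of thumb from the checklist: insertions above 10x deletions means drift.
function looksLikeDrift(c: Churn, threshold: number = 10): boolean {
  return c.insertions > threshold * c.deletions;
}

// Usage with a synthetic two-commit log whose totals match the audited repo.
const sampleLog = [
  "commit 1111111",
  "    Add irreversible-SQL rule",
  "",
  " 1 file changed, 180 insertions(+)",
  "commit 2222222",
  "    Add Next.js notes",
  "",
  " 1 file changed, 132 insertions(+), 7 deletions(-)",
].join("\n");

const churn = sumShortstat(sampleLog);
console.log(churn, Math.floor(insertDeleteRatio(churn)), looksLikeDrift(churn));
```

Because the pathspec limits both the commits and the stat lines to CLAUDE.md, the totals describe that one file and nothing else in the repo.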

4. Snapshot vs slope, side by side

Most existing playbooks on this topic stop at the snapshot. The snapshot finds the one bad line, sometimes finds the duplicate, and is silent about the trajectory. Here is what falls into each column:

Feature	Audit (snapshot)	Drift (slope)
What it answers	What the file costs right now, in this paste	How fast the file is growing per week, and what next session's added floor will be
Inputs	One CLAUDE.md state	Two CLAUDE.md states from different SHAs
Catches add-only commit habits	No	Yes
Catches a single bad line that doubles cost	Yes	Yes
Catches a slow 200-tokens-a-week creep	No	Yes
Tells you what to delete today	Yes	Yes
Surfaces during PR review automatically	No	Paid tier (PR diff comment)

Both columns are real and both have a place. The snapshot tells you what to delete today. The slope tells you whether the file will be back at this size in six weeks regardless of what you delete.

5. The one-way function: why almost nobody deletes a CLAUDE.md line

The structural reason drift only moves upward is human, not technical. When something breaks, the cost of the missing rule is felt (we got burned), and the benefit of adding one is immediate and named (the rule will stop this from happening again). When a rule becomes obsolete, neither the cost of keeping it nor the benefit of deleting it is felt: nothing notifies you that a rule is stale, no test fails when a rule is duplicated, no PR is blocked when a section becomes irrelevant. So the add path is high-signal and the delete path is silent. Over weeks, that asymmetry is enough to explain the slope.

The fix is to make the delete path loud. The ccmd analyzer already does this on the snapshot side: the duplicate finding flags a rule that already exists earlier in the file, the vague finding flags adjectives like "appropriate" that have no testable meaning, and the bloat finding flags any line over 28 words. Those are deletion prompts. The paid drift monitor sends them weekly, so the prompt arrives within a week of the rule landing, not eight weeks later.
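To make the three deletion prompts concrete, here is a hypothetical sketch of bloat, duplicate, and vague checks over one snapshot. Only the 28-word threshold and the word "appropriate" come from the article; the duplicate rule (exact match after lowercasing and stripping punctuation) and the rest of the vague-word list are assumptions, and the real analyzer's rules may differ.

```typescript
type FindingKind = "bloat" | "duplicate" | "vague";

interface Finding {
  kind: FindingKind;
  line: number; // 1-based line number in the snapshot
  text: string;
}

const MAX_WORDS = 28; // bloat: any line over 28 words (from the article)
// Hypothetical vague-word list; only "appropriate" is named in the article.
const VAGUE_WORDS = ["appropriate", "reasonable", "properly"];

// Assumed duplicate rule: same words after lowercasing and stripping punctuation.
function normalize(line: string): string {
  return line.toLowerCase().replace(/[^a-z0-9 ]+/g, " ").replace(/\s+/g, " ").trim();
}

// Return deletion prompts for one CLAUDE.md snapshot.
function deletionPrompts(claudeMd: string): Finding[] {
  const findings: Finding[] = [];
  const seen = new Set<string>();
  claudeMd.split("\n").forEach((text, i) => {
    const line = i + 1;
    const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
    if (words.length > MAX_WORDS) findings.push({ kind: "bloat", line, text });

    // The first occurrence is kept; later copies are flagged as duplicates.
    const key = normalize(text);
    if (key.length > 0) {
      if (seen.has(key)) findings.push({ kind: "duplicate", line, text });
      else seen.add(key);
    }

    const lower = text.toLowerCase();
    if (VAGUE_WORDS.some((w) => new RegExp(`\\b${w}\\b`).test(lower))) {
      findings.push({ kind: "vague", line, text });
    }
  });
  return findings;
}

// Usage on a tiny synthetic snapshot.
const snapshot = [
  "- Run the tests before every commit.",
  "- Use appropriate error handling.",
  "- run the tests before every commit",
  "- " + Array(30).fill("word").join(" "),
].join("\n");
console.log(deletionPrompts(snapshot).map((f) => `${f.kind}:${f.line}`));
```

Each finding points at a line that is a candidate for deletion, which is exactly the signal the add-only habit never produces on its own.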

6. What to actually do this afternoon

Paste your current CLAUDE.md into ccmd.dev. Record the totalTokens number. That is your y-coordinate today.
Run git log --since='8 weeks ago' --shortstat -- CLAUDE.md and sum the insertions and deletions. If insertions are more than 10x deletions, drift is happening at a structural rate.
Find the commit from eight weeks ago with git rev-list -1 --before='8 weeks ago' HEAD, recover that version of the file with git show $SHA:CLAUDE.md, paste it into the analyzer, and record its totalTokens. Subtract it from today's number and divide by eight. That is your tokens-per-week growth rate.
Set a per-PR token budget for CLAUDE.md. We use 4,000 tokens on most projects. Any PR that pushes the file over budget needs to name the existing rule it replaces, or move the new content into a scoped subdirectory CLAUDE.md (see the multi-file token budget guide) or into a Claude Skill that loads on demand.
If any line near the top contains a date, the word "today", "this session", or "right now", delete it first. That is the cache_bust finding, and because it defeats the prompt cache it multiplies the cost of every drift point by roughly 10x. See the related weekly quota burn walkthrough.
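The budget and cache_bust checks in the last two steps can run automatically on every PR. A minimal sketch, assuming you can read the base and head versions of CLAUDE.md (for example with git show); checkClaudeMdPr and the regex list are hypothetical, the patterns only approximate the article's description of cache_bust lines, and the 20-line window is borrowed from the FAQ's "first 20 lines".

```typescript
const BUDGET_TOKENS = 4_000; // the per-PR budget suggested above
const TOP_LINES = 20; // "first 20 lines" window, from the FAQ

// Same chars/4 estimate the analyzer is described as using.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical patterns approximating the cache_bust description:
// a date, "today", "this session", or "right now".
const CACHE_BUST_PATTERNS: RegExp[] = [
  /\b\d{4}-\d{2}-\d{2}\b/, // ISO date such as 2026-05-04
  /\btoday\b/i,
  /\bthis session\b/i,
  /\bright now\b/i,
];

interface GateResult {
  ok: boolean;
  reasons: string[];
}

// Compare the base and head versions of CLAUDE.md in a PR.
function checkClaudeMdPr(baseText: string, headText: string): GateResult {
  const reasons: string[] = [];
  const base = estimateTokens(baseText);
  const head = estimateTokens(headText);
  // Flag only growth that ends over budget; shrinking an oversized file passes.
  if (head > BUDGET_TOKENS && head > base) {
    reasons.push(`CLAUDE.md grows from ${base} to ${head} tokens (budget ${BUDGET_TOKENS})`);
  }
  headText
    .split("\n")
    .slice(0, TOP_LINES)
    .forEach((line, i) => {
      if (CACHE_BUST_PATTERNS.some((re) => re.test(line))) {
        reasons.push(`possible cache_bust on line ${i + 1}: ${line.trim()}`);
      }
    });
  return { ok: reasons.length === 0, reasons };
}

// Usage: a PR that adds a dated status line at the top.
const result = checkClaudeMdPr(
  "- Run tests before every commit.",
  "Today is 2026-05-04, mid-sprint on subscriptions\n- Run tests before every commit.",
);
console.log(result.ok, result.reasons);
```

The gate only flags; deciding whether a new rule names the one it replaces is still a reviewer's call. Failing only when the head version is both over budget and larger than the base is a design choice, so a PR that trims an already oversized file is not blocked.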

Want the drift number on your actual CLAUDE.md, before next week?

15 minutes. We walk your repo, run the analyzer on the current file and the eight-week-old file, and you leave with a tokens-per-week growth rate and the three lines producing most of it. Free.

Frequently asked questions

What is CLAUDE.md token budget drift, in one sentence?

It is the silent, monotonic upward creep of a CLAUDE.md's per-turn token cost over time. The free analyzer at ccmd.dev gives you the instant value (totalTokens, see src/lib/analyzer.ts line 139). Drift is the delta between two analyzer runs across commits; its cost per session is that delta multiplied by your turns-per-session and the Opus 4.7 input rate of $15 per million tokens. Verified against the analyzer source on 2026-05-18.

Why does drift happen even when each individual commit is reasonable?

CLAUDE.md is a write-mostly artifact. People add a rule when something breaks; people almost never delete a rule because nothing ever 'tells them' the rule is now obsolete or duplicated. In real repos we counted, the insert-to-delete ratio on CLAUDE.md across eight weeks was 44 to 1. The file grows the way log files grow, not the way code grows. There is no compactor running, so the drift is a one-way function: tokens added stay added.

How does drift differ from the per-session token cost a normal audit shows?

A per-session audit reports one number for one snapshot. Drift is the slope between snapshots. The audit answers 'what does this file cost right now'; drift answers 'how fast is what-it-costs growing'. Two CLAUDE.mds with the same totalTokens today can have very different futures: one is a fresh greenfield file, the other is the surviving end-state of six weeks of add-only commits and will keep growing at the same rate next week. The analyzer's totalTokens is a y-coordinate; drift is dy/dt.
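When you have more than two snapshots, for example one analyzer run per week, an ordinary least-squares fit is one way to estimate dy/dt that leans less on any single week than a two-point difference. This is our suggested extension, not something the analyzer or the drift script is described as doing; Snapshot and tokensPerWeek are hypothetical names, and the data are the per-turn floors from the eight-week timeline.

```typescript
interface Snapshot {
  week: number;
  tokens: number; // totalTokens from one analyzer run
}

// Ordinary least-squares slope of tokens against week, in tokens per week.
function tokensPerWeek(points: Snapshot[]): number {
  if (points.length < 2) throw new Error("need at least two snapshots");
  const n = points.length;
  const meanX = points.reduce((s, p) => s + p.week, 0) / n;
  const meanY = points.reduce((s, p) => s + p.tokens, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const p of points) {
    sxy += (p.week - meanX) * (p.tokens - meanY);
    sxx += (p.week - meanX) ** 2;
  }
  if (sxx === 0) throw new Error("snapshots must cover more than one week");
  return sxy / sxx;
}

// Per-turn floors from the eight-week timeline.
const timeline: Snapshot[] = [
  { week: 0, tokens: 2_140 },
  { week: 1, tokens: 2_320 },
  { week: 3, tokens: 2_740 },
  { week: 5, tokens: 4_040 },
  { week: 6, tokens: 4_200 },
  { week: 8, tokens: 6_420 },
];
console.log(`${tokensPerWeek(timeline).toFixed(0)} tokens/week`);
```

On those six points the fitted slope is about 502 tokens per week, a little under the two-point estimate of 535 ((6,420 - 2,140) / 8), because the fit weighs every snapshot rather than only the endpoints.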

What is the single git command that turns the snapshot into a slope?

git log --since='8 weeks ago' --shortstat -- CLAUDE.md, then sum the insertion and deletion columns. If insertions are an order of magnitude larger than deletions, you have drift. No fancier tool is needed: pipe git show $SHA:CLAUDE.md into wc -c for two SHAs and divide the delta by four to get a token estimate. That is the same estimator the ccmd analyzer uses at src/lib/analyzer.ts line 38 (wc -c counts bytes while text.length counts UTF-16 code units, so the two agree exactly only for ASCII text).

Does a cache_bust line make drift worse or just shift it?

It dramatically changes what the drift costs. A static 6,000-token CLAUDE.md hits the prompt cache after the first turn of a session, so the per-week input cost is roughly one full-price rehydration per session times 25 sessions. A 6,000-token CLAUDE.md with a date string in the first 20 lines busts the cache on every turn, because the cached prefix has to match byte-for-byte. That moves the weekly cost up about 10x. The drift is the same; the bill the drift produces is much larger. See the cache_bust finding at src/lib/analyzer.ts line 194.

What does the paid tier of ccmd add for drift specifically?

Continuous monitoring: a weekly diff email that says 'CLAUDE.md grew 420 tokens this week, three of those rules duplicate older ones, here are the lines'. A PR diff comment that runs the analyzer on the new CLAUDE.md and the base CLAUDE.md and surfaces the token delta in the PR before merge. Per-engineer cost attribution for teams sharing one CLAUDE.md. None of that is in the free homepage analyzer, which is a one-shot. The drift detection uses exactly the same math as the homepage analyzer; the difference is whether you run it on every commit or once, when the file already feels heavy.

Does drift apply to AGENTS.md, .cursorrules, and .grokrules the same way?

Yes. The analyzer detects all four formats (src/lib/analyzer.ts line 41) and runs the same totalTokens estimate against each, so the slope-vs-snapshot framing works identically. The bill the slope produces differs by platform: Codex bills per-token with a daily soft cap, Cursor bills per-request, Grok Build bills per-token. The add-only commit habit is platform-independent; we have not seen a config-file format where humans systematically delete rules in proportion to how often they add them.

If drift is one-way, what is the actual remediation?

Three things. First, rehome by scope: move sections that only apply to ./api or ./web into ./api/CLAUDE.md and ./web/CLAUDE.md so they fire only when those paths are touched. Second, move runbook-style content out of CLAUDE.md and into a Claude Skill that loads on demand. Third, set a per-PR token budget for CLAUDE.md (we use 4,000 tokens) and reject PRs that grow the file past it without justifying which existing rule the new one replaces. The first two reduce the current floor; the third stops drift from re-accumulating.