The CLAUDE.md instruction token threshold is 28 words, not 2,500.

Matthew Diakonov
6 min read

1. Most guides quote the wrong ceiling

Search any variant of this question and the answer comes back the same: keep CLAUDE.md under 200 lines, or under 2,500 tokens, or under some other file-level cap. That advice is fine. It is also the wrong number to be quoting first, because a CLAUDE.md can pass the file ceiling and still fail every individual rule inside it.

The failure mode looks like this. The file is 1,500 tokens, well under budget. Five of the rules are 35 to 40 words each. The model reads the file every turn (the file-level guidance is right about that part), but for each of those five rules, the back half of the sentence falls off the attention curve. The rule appears to fire because the first half lands. Then an edge case shows up, the clause that handles it lives in the second half, and the agent does the thing the rule was meant to prevent.

You audit the file. The token count is fine. The rule is in the file. You assume the rule is working. The only signal that it is not is the regression you ship.

28 / 25

Trigger fires above 28 words. The message tells you to aim under 25. 25 to 28 is the degraded zone.

src/lib/analyzer.ts line 150

2. The exact code path

Here is the per-line check, copied unmodified from the analyzer that runs in your browser when you paste a file on ccmd.dev. No upload, no signup; the function is pure TypeScript and the file path is verifiable in the repo.

src/lib/analyzer.ts

Three numbers in that block do load-bearing work and are worth calling out individually.

  • 28: the strict word-count trigger. Lines with 28 words or fewer pass; 29 or more fire the finding.
  • 25: the human-readable target in the message. This is what you set your editor soft-wrap at if you want zero findings without thinking about it.
  • 0.35: the fraction of the line's tokens credited as recoverable savings when the rule gets split. Floor of est-tokens times 0.35. That is the empirical drop from rewriting a run-on as 2 to 3 short bullets.
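To make the three numbers concrete, here is a minimal sketch of what a per-line check of this kind could look like. It is not the analyzer's source: the names `findBloat`, `BloatFinding`, and the message wording are hypothetical, and `estTokens` is assumed to be a simple characters-divided-by-four estimate, which may not match the real implementation. Only the constants 28, 25, 0.35 and the suggestion string come from the text above.

```typescript
// Hypothetical reconstruction for illustration; not copied from src/lib/analyzer.ts.
interface BloatFinding {
  line: number; // 1-based line number in the pasted file
  wordCount: number;
  message: string;
  suggestion: string;
  tokenSavings: number;
}

const WORD_TRIGGER = 28; // strict trigger: fires only when wordCount > 28
const WORD_TARGET = 25; // target quoted in the human-readable message
const SAVINGS_RATE = 0.35; // fraction of the line credited as recoverable

// Assumption: roughly 4 characters per token. The real estTokens may differ.
function estTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function findBloat(source: string): BloatFinding[] {
  const findings: BloatFinding[] = [];
  source.split("\n").forEach((raw, idx) => {
    const trimmed = raw.trim();
    if (trimmed === "") return; // blank lines are never counted
    const wordCount = trimmed.split(/\s+/).length;
    if (wordCount > WORD_TRIGGER) {
      findings.push({
        line: idx + 1,
        wordCount,
        message: `${wordCount} words; aim under ${WORD_TARGET} per instruction`,
        suggestion: "split into 2-3 shorter directives, each one actionable.",
        tokenSavings: Math.floor(estTokens(trimmed) * SAVINGS_RATE),
      });
    }
  });
  return findings;
}
```

Because the comparison is a strict greater-than, a 28-word line produces no finding and a 29-word line produces exactly one.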

3. Three thresholds, not one

The analyzer enforces three different thresholds on a CLAUDE.md: one counted in words per line, one in tokens per file, and one by line position. Conflating them is the single most common mistake we see.

| Threshold | Trigger | What goes wrong above it |
| --- | --- | --- |
| Per instruction | wordCount > 28 | Back half of the rule stops shaping behavior. The file passes but the rule does not fire reliably. |
| Per file | ~2,500 tokens | Every turn pays full input cost on the whole file. Rate-limit hits accelerate; per-engineer billing variance climbs. |
| Position in file | idx < 20 | Any ISO date or "today" in the first 20 lines busts the prompt cache on every session. Fresh input cost on every turn, regardless of file size. |

The first is the threshold this page is about. The second is the one every other guide quotes. The third is the highest-leverage fix in the audit (one line move, full cache hit rate restored). All three live in the same analyzeConfig function and run on the same paste-and-score.
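A rough sketch of how three such checks can share one pass over the file follows. Everything here is an assumption made for illustration: the function name `checkThresholds`, the 4-characters-per-token estimate, and the exact patterns used to spot an ISO date or the word "today". The real `analyzeConfig` may be structured quite differently.

```typescript
// Illustrative only; the names, regexes, and token estimate are hypothetical.
type ThresholdKind = "per-instruction" | "per-file" | "position";

interface ThresholdHit {
  kind: ThresholdKind;
  line?: number; // 1-based; omitted for the file-level hit
  detail: string;
}

const FILE_TOKEN_CEILING = 2500; // approximate file-level budget
const CACHE_SENSITIVE_LINES = 20; // zero-based idx < 20 means the first 20 lines
const ISO_DATE = /\b\d{4}-\d{2}-\d{2}\b/; // assumed pattern for an ISO date
const TODAY = /\btoday\b/i; // assumed pattern for "today"

// Assumption: about 4 characters per token.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function checkThresholds(source: string): ThresholdHit[] {
  const hits: ThresholdHit[] = [];
  source.split("\n").forEach((raw, idx) => {
    const trimmed = raw.trim();
    if (trimmed === "") return;
    const wordCount = trimmed.split(/\s+/).length;
    if (wordCount > 28) {
      hits.push({ kind: "per-instruction", line: idx + 1, detail: `${wordCount} words` });
    }
    if (idx < CACHE_SENSITIVE_LINES && (ISO_DATE.test(trimmed) || TODAY.test(trimmed))) {
      hits.push({ kind: "position", line: idx + 1, detail: "date-like text near the top of the file" });
    }
  });
  const total = approxTokens(source);
  if (total > FILE_TOKEN_CEILING) {
    hits.push({ kind: "per-file", detail: `~${total} tokens` });
  }
  return hits;
}
```

The three checks are independent: a file can trip any one of them without tripping the others, which is why a passing file-level score says nothing about individual rules.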

4. A worked example

Here is a single rule from a real CLAUDE.md that crossed the per-instruction threshold. The first version is one bullet, 47 words, one signal as far as the model is concerned. The fixed version is four bullets, each under 12 words, four signals. The token count of the fixed version is almost the same (the connectors and qualifiers drop, the body stays).

Before · 47 words · 1 finding · 16 tokens recoverable
CLAUDE.md (before)
After · 4 lines · 0 findings · zero bloat
CLAUDE.md (after)

What changed in the model's view: four discrete signals instead of one. Each bullet has its own attention budget. "Run tests AND check snapshots" cannot get truncated because there is nothing else to the rule. The single-bullet version did get truncated in the field; the bug we wrote it to prevent (broken snapshots merged because tests alone passed) shipped twice before the rule got split.
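The same kind of split can be checked mechanically. The rule text below is a made-up stand-in, not the rule from the CLAUDE.md discussed above, and `countWords` is a hypothetical helper that mirrors the split-on-whitespace counting described earlier (so the leading bullet dash counts as a word).

```typescript
// Hypothetical rule text for illustration; not the original before/after content.
const countWords = (line: string): number => {
  const trimmed = line.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
};

const overTrigger = (line: string): boolean => countWords(line) > 28;

// One run-on bullet carrying four separate constraints.
const beforeRule =
  "- Before opening a pull request make sure that you run the full test suite " +
  "and also check that the snapshot files are up to date, and if any snapshot " +
  "changed then please explain why in the description, and never merge while " +
  "the continuous integration job is still running.";

// The same constraints as four short, independently actionable bullets.
const afterRules: string[] = [
  "- Run the full test suite before opening a pull request.",
  "- Check that snapshot files are up to date.",
  "- Explain any snapshot change in the pull request description.",
  "- Never merge while the CI job is still running.",
];

const beforeFlagged = overTrigger(beforeRule);
const afterFlagged = afterRules.filter(overTrigger).length;
const longestAfter = Math.max(...afterRules.map(countWords));
```

Here `beforeFlagged` is true, `afterFlagged` is zero, and every bullet in `afterRules` stays under 12 words. The connectors ("make sure that", "and also", "then please") are what disappear; the constraints themselves survive intact.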

5. What the report looks like

Paste any agent-config file into the analyzer and the per-instruction findings come back sorted by line. In the sample run, the report shows five bloat lines on a 194-line file, each one named with its word count and the suggested fix. The savings number at the bottom is the sum of Math.floor(estTokens(line) * 0.35) across every bloat finding.
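As a sketch of that arithmetic, assuming `estTokens` is a plain characters-divided-by-four estimate (an assumption; the analyzer's estimator is not shown here) and using hypothetical names `bloatSavings` and `sumSavings`:

```typescript
// Illustrative sketch; estTokens and the function names are assumptions.
interface SavingsFinding {
  line: number;
  tokenSavings: number;
}

// Assumption: about 4 characters per token.
function estTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Per-finding credit: 35% of the line's estimated tokens, floored.
function bloatSavings(line: string): number {
  return Math.floor(estTokens(line) * 0.35);
}

// Report total: each finding is floored first, then the integers are summed.
function sumSavings(findings: SavingsFinding[]): number {
  return findings.reduce((sum, finding) => sum + finding.tokenSavings, 0);
}
```

Flooring each line before summing means the total can come out slightly lower than 35% of the combined token count, since every finding rounds down on its own.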

ccmd score ./CLAUDE.md

The $5.80 per engineer per month number is small on purpose. The per-instruction threshold is not where the dollars hide; the file-level threshold is. The per-instruction threshold is where the reliability hides. A rule that fires half the time costs you regressions, not tokens.

6. How to fix every line over the threshold

1. Find every line over the threshold

Paste your CLAUDE.md into the textarea on ccmd.dev. The analyzer iterates with lines.forEach and reports every line that yields more than 28 words when split on whitespace. Findings are sorted by line number, not severity, so you read the file top to bottom.

2. Split, do not shorten

The hard-coded suggestion field is always 'split into 2-3 shorter directives, each one actionable.' Rewriting a 38-word rule into a 22-word rule keeps the same shape, and the rule tends to grow its second half back over time. Splitting into three actionable bullets avoids that. The model treats each bullet as a separate signal.

3. Re-paste and watch the score climb

Each split line removes one finding and credits floor(estTokens(line) * 0.35) tokens back to potentialSavingsTokens. The 35% is empirical: it accounts for the connector words, the qualifier clauses, and the 'in order to' constructions that disappear once a sentence becomes a bullet.

4. Recheck before commit

Add a one-line CI step that re-runs the analyzer on every PR that touches CLAUDE.md. The paid tier of ccmd does this automatically and posts the diff as a PR comment. The free tier is one paste at a time; either way the goal is zero bloat findings before the merge.
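If you want a stand-in gate without depending on any particular tool, the check is small enough to script. The sketch below is not the ccmd CLI or its paid-tier integration; `gateClaudeMd` is a hypothetical helper that applies only the per-instruction word count and returns an exit code for a CI step to use.

```typescript
// Hypothetical CI gate; covers only the per-instruction word-count check.
interface GateResult {
  exitCode: 0 | 1;
  report: string[];
}

function gateClaudeMd(source: string, maxWords: number = 28): GateResult {
  const report: string[] = [];
  source.split("\n").forEach((raw, idx) => {
    const trimmed = raw.trim();
    if (trimmed === "") return; // skip blank lines
    const words = trimmed.split(/\s+/).length;
    if (words > maxWords) {
      report.push(`CLAUDE.md:${idx + 1}: ${words} words (limit ${maxWords})`);
    }
  });
  return { exitCode: report.length === 0 ? 0 : 1, report };
}

// In a Node-based CI step this could be wired up roughly as:
//   const result = gateClaudeMd(fs.readFileSync("CLAUDE.md", "utf8"));
//   result.report.forEach((entry) => console.error(entry));
//   process.exit(result.exitCode);
```

Keeping the function pure and leaving file reading and `process.exit` to the caller makes the gate easy to unit-test alongside the rest of the repo.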

7. Why splitting beats shortening

The suggestion field is hard-coded to "split into 2-3 shorter directives, each one actionable." That is deliberate. The alternative (rewriting a 38-word rule into a 22-word rule) is what most people try first, and it underperforms.

The reason is shape. A 22-word rule still asks the model to track multiple constraints in one breath, just with fewer connectors. The back-half attrition drops, but the rule is still a single attention unit. Splitting the same content across three 8-word bullets gives the model three separate hooks. Bullets read as a list; the model treats list items as discrete signals. Tight prose reads as one sentence; the model treats it as one signal.

The same principle is why "## Always" lists outperform their paragraph equivalents in every CLAUDE.md we have scored. The per-instruction threshold is not really a word-count check. It is a shape check, expressed as a word count because words are easy to count.

Frequently asked questions

How many tokens per CLAUDE.md instruction?

About 37 tokens, or 28 words. That is the per-line ceiling baked into the ccmd analyzer at src/lib/analyzer.ts line 150. The trigger is wordCount > 28 (a strict greater-than), so 28 words exactly is the highest count that does not fire. The human-readable message uses 25 because that is the practical target: aim for 25 or fewer per line; the analyzer flags you at 29 or more.

Why two numbers (28 vs 25)?

The detector fires above 28 words. The message tells you to aim under 25. The gap is intentional. Around 25 to 28 words a rule still works, but it is in a degraded zone where the second half is at risk if anything else competes for attention in the same turn. The analyzer is permissive on the boundary and strict on the target. Set the target in your editor (a 25-word soft limit) and the analyzer becomes the safety net, not the constant noise machine.
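The two numbers carve a line's word count into three zones, which a few lines of code make explicit. `zoneFor` and the zone names are hypothetical, and the sketch assumes the boundary reading "under 25 is on target, 25 to 28 is degraded", so a line of exactly 25 words lands in the degraded zone.

```typescript
// Hypothetical helper; zone names and the handling of exactly 25 words are assumptions.
type Zone = "on-target" | "degraded" | "flagged";

function zoneFor(wordCount: number): Zone {
  if (wordCount > 28) return "flagged"; // the analyzer reports a finding
  if (wordCount >= 25) return "degraded"; // passes, but the back half is at risk
  return "on-target"; // under the 25-word target
}
```

An editor-side soft limit would warn on "degraded", while the analyzer only reports "flagged".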

Where does the 28-word number come from?

Two sources stacked. First, Anthropic and the prompt-engineering literature converge on the observation that long directives lose attention from the back half forward; the precise rollover is model-dependent but consistently lands in the 25 to 35 word range for instruction-following tasks. Second, our own analysis of public CLAUDE.md files (sampled from GitHub repos that publish them) showed a sharp uptick in reported 'Claude is ignoring this rule' issues at 28+ words per line. 28 is the boundary where the two converge.

Is this the same as the file-level threshold?

No, and conflating them is the most common mistake. The file-level threshold (Anthropic suggests under 200 lines, roughly 2,000 to 2,500 tokens) controls how much the whole file costs you per turn. The per-instruction threshold controls how much of any one rule the model will follow. A 1,500-token CLAUDE.md with five 40-word rules can fail every one of those five rules while passing the file-level budget. Both matter. Neither replaces the other.

Does the threshold change by model?

The analyzer hard-codes one number because the curve is shallow. Opus 4.7, Sonnet 4.6, and Haiku 4.5 all show degraded instruction-following past roughly 25 to 35 words per directive. The exact rollover point differs by a few words across models, but 28 is well inside the band where every Claude 4 model starts dropping the back half. The same threshold applies to AGENTS.md (Codex), .cursorrules, and .grokrules because the underlying attention behavior is general, not model-specific.

Why does the savings credit cap at 35% of the line?

The reducer is at src/lib/analyzer.ts line 270, summing tokenSavings across findings. For bloat, that field is set to Math.floor(estTokens(line) * 0.35). The 35% accounts for the average length reduction when a 30 to 40 word rule is rewritten as 2 to 3 short directives. Connectors ('in order to', 'and also', 'making sure to', 'please be careful to'), qualifier clauses, and politeness markers all drop. The remaining 65% is the actual signal; it stays.

What if I genuinely need a long directive?

Then it does not belong in CLAUDE.md. The analyzer's working assumption is that CLAUDE.md is a list of short, every-turn constraints. Long-form explanations belong in a skill (loaded on demand, billed only when invoked) or a referenced doc (linked from CLAUDE.md, fetched by the agent when relevant). The per-instruction threshold is what forces that split. A rule that cannot be expressed in 25 words is really more than one rule.

Does the threshold apply to headings and prose, or only to bullet rules?

Every non-blank line. The detector splits on whitespace and counts words; it does not parse Markdown structure. A long paragraph in a 'Stack' section trips the threshold the same way a long bullet trips it. The implicit advice is that paragraphs do not belong in CLAUDE.md either: prose has fewer hooks for the model than bullets do, and uses more tokens to carry the same information.
