A CLAUDE.md silent failure audit, because every other config yells when it breaks.
1. The thing that makes CLAUDE.md weird
Take a swing at this. Of every config file in your repo, which one fails the loudest?
tsconfig.json fails at compile time. Red squiggles in the editor, exit code 1 in CI, a refusal to ship a build. .eslintrc fails at save time. Husky blocks the commit. jest.config fails at PR time. A red check on GitHub, a Slack ping, a blocked merge. .env fails at boot. The app crashes, the deploy log spits a stack.
CLAUDE.md fails like none of those. It ships every byte to Claude on every turn, then says nothing. There is no log entry for "Claude saw line 47 and decided to skip it." There is no telemetry for "the prohibition fired but was overridden by an internal weight." There is no exit code. There is no red squiggle. The rule lives in the file, you assume it works, and the only evidence that it does not is the user-visible regression weeks later, plus the slow dawning thought of "wait, didn't I tell it not to do that?"
That is what makes a silent failure audit a different animal from every other lint pass in your toolchain. The whole point of the audit is to produce a signal where the runtime produces none.
2. CLAUDE.md vs every other config that yells
Sketch out the failure surface of one against the other:
| Feature | Every other config in your repo | CLAUDE.md |
|---|---|---|
| Compile / lint error | red squiggle, exit code 1, blocks commit | none |
| CI signal | PR check fails, slack ping, merge blocked | none |
| Runtime trace | stack trace, Sentry event, error metric | no log entry, no warning, no telemetry |
| Test outcome | red test, named assertion, line number | tests pass; regression ships |
| Detection cost | seconds (next save / next push) | weeks (next time the rule mattered) |
| Discovery shape | build tool reports the violation directly | user reports the bug the rule was meant to prevent |
The right column has six places the failure shows up. The left column has zero. The audit is the missing column.
3. The 7 silent-failure shapes the analyzer catches
The detector is one TypeScript file at src/lib/analyzer.ts, 322 lines, no network calls. It walks the file you paste and looks for these seven shapes. Each one is a discriminated case in the Finding union at line 5:
One sentence on each, with the file line where the detector lives so you can read the regex yourself:
bloat
Lines over 28 words. The model treats the second half as filler. Detector at src/lib/analyzer.ts line 150. Suggestion is always the same: split into 2 to 3 directives.
vague
14 terms with no testable success condition: appropriate, properly, carefully, as needed, where applicable, when possible. Detector at line 163.
aspirational
Absolute words (always, never, must, in all cases, every time) with no 'unless' or 'except' clause. Real codebases have exceptions. Detector at line 177.
cache_bust
An ISO date or 'today' in the first 20 lines mutates the cached prefix every session. The whole file pays fresh input cost every turn. Detector at line 194. Severity: high.
duplicate
Same rule on two lines. Tokens spent twice for one signal. Detector at line 207. Often a paste from a sibling project's CLAUDE.md.
missing_why
A 'NEVER' or 'DO NOT' followed by no 'because', 'reason', 'past', or 'incident' inside 4 lines. The agent follows until an edge case, then guesses. Detector at line 227.
conflict
Two contradicting absolutes in the same file ('never use comments' plus 'add comments'). The model has no resolution path. Detector at line 243.
4. Two detectors worth reading by hand
Most of the seven are obvious from the kind name. Two are not. The vague detector is just a banned-words list, but the contents of the list matter more than the regex. These 14 terms are the ones a senior engineer drops into a CLAUDE.md without noticing they have written nothing testable:
The cache_bust detector is the high-severity one, and it is also the one nobody else catches. A single line like "Today is 2026-05-20" in the first 20 lines invalidates the prompt cache for the entire file on every session:
The cost math: a 6,000-token CLAUDE.md with a cache hit costs about a tenth of a fresh read. One stale date in the first 20 lines, and every turn pays full input cost. Across a 30-turn session at Opus 4.7 input rates, that is the difference between roughly $0.27 and $2.70 spent on a file that has not changed.
5. A real audit on a 24-line CLAUDE.md
A representative file from a Next.js payments repo. 218 tokens, 24 lines, no obvious problems on a first read:
The audit output, top to bottom:
Nine findings, 74 tokens of recoverable waste per turn, and the file scores 5 of 12 on the Karpathy rubric. Every one of those nine lines would have shipped to Claude on every turn forever, producing no error and no warning, until somebody finally got curious. That is the gap the audit closes.
6. Five steps to run the audit yourself
1. Paste the file
Open ccmd.dev. There is a textarea at the top of the homepage. Paste the entire CLAUDE.md (or AGENTS.md, .cursorrules, .grokrules). No signup, no upload; the analyzer is pure client-side TypeScript.
2. Read the silent-failure list
The right column lists every finding with line number, severity, message, suggestion, and per-line token savings. Order is by line number, not severity, so the audit reads top-to-bottom of your file.
3. Fix high-severity first
cache_bust is the only finding kind marked 'high' by default. It is also the one no other CLAUDE.md audit guide flags. Move the dated line to the bottom of the file (or remove it). One edit, recovers full prompt-cache hit rate for the whole file.
4. Fix the cluster of mediums
bloat, missing_why, duplicate, and conflict are mediums. Split bloat lines into 2 to 3 directives. Add a one-line 'Why:' under every NEVER/DO NOT. Delete duplicate lines. Resolve any conflict by scoping the rule to a context.
5. Re-paste and confirm the score
Paste the edited file. The token count drops, the rubric score climbs, and the findings list shrinks. If a finding survives, the analyzer points at the line that still trips it. Iterate until the only remaining findings are the low-severity vague/aspirational ones you intentionally kept.
7. What the audit does not catch (yet)
The 7 detectors are per-line, on the bytes of one file. That misses three other shapes of silent failure that live across files:
- CLAUDE.md vs settings.json contradictions. A "never run destructive shell commands" rule in your file, plus a permissive Bash allowlist in .claude/settings.json, leaves the rule visually present and operationally dead. The current analyzer does not cross-reference allowlists.
- Subagent inheritance. Subagents launched from your session do not inherit the parent CLAUDE.md. Any rule that mattered to the parent silently does not apply in the subagent context. See the sibling guide at /t/subagent-claude-md-inheritance for the per-subagent token math.
- Plugin overwrites. Some installable plugins write their own block into the bottom of CLAUDE.md and quietly clobber the previous content. The file you wrote yesterday is not the file that ships today. See /t/claude-md-plugin-overwrite-risk for the failure mode.
These three are on the analyzer roadmap. The current free tier handles the seven in-file shapes, which are the silent failures you can detect from the bytes alone, and which add up to most of the actual waste in a typical config.
Want a second pair of eyes on your CLAUDE.md?
20 minutes, your file on screen, the silent failures called out by line number. Free.
Frequently asked questions
What is a silent failure in CLAUDE.md?
A line that ships to Claude every turn but never shapes the agent's behavior, and produces no log, error, or warning when it fails. The rule is in the file, you assume it is working, and the only evidence that it failed is the user-visible regression weeks later. Compare to TypeScript (red squiggle), ESLint (CI fail), tests (red bar): CLAUDE.md is the only tool in the stack with no failure surface.
What are the 7 silent-failure shapes?
bloat (lines over 28 words; the model treats the second half as filler), vague (14 untestable terms like 'appropriately' or 'where applicable'), aspirational (absolutes without 'unless'), cache_bust (ISO date or 'today' in the first 20 lines), duplicate (same rule twice), missing_why (NEVER/DO NOT without 'because'), and conflict (two contradicting absolutes). Each kind is a discriminated case in the Finding union at src/lib/analyzer.ts line 5.
How do I run the audit?
Paste your file into the textarea on ccmd.dev. The analyzer runs in your browser in about 220 ms; no signup, no upload. You get total tokens, the Karpathy 12-rule pass rate, and a line-by-line list of findings with token-savings estimates. The same detector works on AGENTS.md, .cursorrules, and .grokrules; detection is by content shape, not filename.
Why does the cache_bust finding matter more than the others?
Anthropic and xAI both cache long system prompts. A cache hit on a 6,000-token CLAUDE.md costs roughly a tenth of a fresh read. One line like 'Today is 2026-05-20' in the first 20 lines mutates the cached prefix every session, so every turn pays full input cost for the entire file. The one-line fix (move the date to the bottom, or delete it) recovers full cache hit rate for every turn. That is the highest-leverage silent-failure fix in the audit.
Will the audit catch rules that fail because of settings.json?
Not in the per-line analyzer pass. ccmd treats CLAUDE.md as a contract; a future cross-check walks settings.json hooks and MCP allowlists for prohibitions the file states but the runtime contradicts (a 'never run rm -rf' rule with a Bash allowlist that permits it, for example). The per-line analyzer at src/lib/analyzer.ts is the part that ships today and runs in your browser; the settings.json cross-check is on the analyzer roadmap.
What is the threshold for bloat?
28 words per non-blank line, set at src/lib/analyzer.ts line 150. The threshold is empirical: rule lines over 25 words consistently get treated as one signal, and the second half tends to be ignored. The fix is always the same shape, which is why the analyzer's suggestion field is hard-coded to 'split into 2-3 shorter directives, each one actionable.'
Does the analyzer upload my CLAUDE.md?
No. The analyzer is pure client-side TypeScript in src/lib/analyzer.ts. Open DevTools, watch the Network tab, paste a file: there is no POST. The free tier of ccmd is one-shot and entirely in-browser. The paid tier ($9 to $19 per month solo, $49 per team) adds continuous monitoring, weekly drift email, and PR diff comments, which do upload the file you opt to monitor.
How long does a full audit take?
About 4 minutes start to finish on a 200-line CLAUDE.md. The detector itself runs in 220 ms in the browser. The rest is reading the findings list, opening your editor, and making the edits. The highest-impact fix (cache_bust) is one line move; the next four fixes are 30 seconds each. The vague/aspirational ones are taste, not bugs, and you can leave them.
Comments (••)
Leave a comment to see what others are saying.Public and anonymous. No signup.