CLAUDE.md rule frequency audit: every rule fires 100%. What to measure instead.
The thesis: "frequency" is a category error here.
The phrase "rule frequency audit" sounds reasonable. It maps onto how engineers usually think about config: rules in a hot path get hit a lot, rules in a cold path get hit rarely, and you want to know which is which before you prune. The intuition works for log lines, feature flags, A/B variants, sometimes even for prompt skills. It does not work for CLAUDE.md.
The reason is in the analyzer, at src/lib/analyzer.ts line 264:
`const estimatedTokensFireEveryTurn = totalTokens`
That hardcode (sitting directly under the comment `// Token math: assume CLAUDE.md fires every turn`) is not a simplification. It is the API behavior. CLAUDE.md gets concatenated into the system prompt at session start and re-sent on every API call until the session ends. There is no per-rule gating, no line-level dispatch, no "skip section if irrelevant". Every non-blank line you wrote pays its token cost on every turn.
So the literal "how often does each rule fire" question has a boring answer: 100%. Sorting your file by firing count gives you the same number 214 times. Cutting everything below the 10th percentile gives you an empty file.
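A minimal sketch of why the literal count is useless, assuming the file is re-sent whole on every turn as described above. The helper `firingCounts` and the sample file are hypothetical and are not part of ccmd.

```typescript
// Hypothetical helper, not part of ccmd: how many turns each non-blank
// line of a CLAUDE.md is sent to the model during one session.
function firingCounts(fileText: string, turns: number): number[] {
  return fileText
    .split("\n")
    .filter((line) => line.trim().length > 0)
    // The whole file is re-sent every turn, so every line gets the same count.
    .map(() => turns);
}

// Illustrative file content, made up for this example.
const sampleFile = [
  "# Stack",
  "",
  "Prefer pnpm over npm.",
  "Always write clean code.",
].join("\n");

const counts = firingCounts(sampleFile, 30);
// Every entry equals the turn count, so sorting by it cannot rank the rules.
const allEqual = counts.every((count) => count === counts[0]);
```

A specific rule and a vague rule come out with the same number, which is why the count carries no pruning signal.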
What an honest audit looks like.
The metric people actually want when they ask for a frequency audit is effective frequency: how often the agent actually acts on a rule when it fires. That number varies wildly across rules. A specific named-tool preference like 'prefer pnpm over npm' gets followed close to 100% of the time it's relevant. A vague aspirational line like 'always write clean code' gets followed somewhere around 0% of the time it's relevant, because nothing about it tells the model when it's satisfied.
You cannot read effective frequency directly. Anthropic does not expose a "this rule was applied" signal in the response. What you can do is read the structural shape of each line and predict which quadrant it lands in:
Fires 100% · Followed often
Concrete, specific, with a 'why' or an exception clause. Stack lines, tool preferences, named forbidden patterns. These are the rules you want to keep. The analyzer passes them silently; nothing in Finding[] gets emitted.
Fires 100% · Followed rarely
Every byte still gets billed every turn. The model heeds the rule on the first task, then drifts when the next task has any wrinkle. This is the deadweight zone the analyzer is built to surface. Six of the seven Finding.kind values live here: vague, aspirational, missing_why, duplicate, bloat, conflict (conflict is the extreme case and gets its own quadrant next).
Fires 100% · Followed never
Conflict-class rules. ccmd emits a high-severity Finding when the file contains both 'never use comments' and 'add comments' (analyzer.ts line 244). The model has no resolution path so it picks one at random per turn. Effective frequency is undefined.
Fires 100% · Costs 10x
Cache_bust class. An ISO date or 'today' in the first 20 lines invalidates prompt cache. Frequency stays 100%, but the unit cost moves from a cached read (~$0.09 over 30 turns) to a fresh read (~$0.91). Source: analyzer.ts line 194, idx < 20.
The 7 shapes the analyzer flags, as effective-frequency predictors.
The Finding.kind union at analyzer.ts line 7 names them. Each one predicts a specific failure mode for effective frequency, not a quality opinion. The static frequency column reads "100%" for every row because that's the literal firing rate. What changes is the shape of the failure when the rule fires.
| Finding kind | Static firing frequency | Effective-frequency failure mode |
|---|---|---|
| vague (low severity) | 100% (same as every other rule) | Rule has no testable success condition. Model satisfies it on easy turns, ignores it the moment scope gets ambiguous. |
| aspirational (low severity) | 100% | Absolute (always/never/must) with no exception clause. Followed until the first edge case, then guessed. |
| missing_why (medium severity) | 100% | DO NOT / NEVER with no 'because' / 'incident' / 'got burned' in the next 4 lines (analyzer.ts line 230). Followed by rote, dropped under pressure. |
| duplicate (medium severity) | 100% × 2 (billed twice) | Same trimmed lowercased line appears twice (analyzer.ts line 211). Token cost doubles, follow-rate does not. |
| bloat (medium severity) | 100% for the bytes | >28 words on one line (analyzer.ts line 150). Model treats the line as one signal; the back half gets dropped. Effective frequency is whatever fits in the front half. |
| conflict (high severity) | 100% for both contradicting rules | Contradicting absolutes (line 244). Effective frequency is undefined; the model picks one per turn at random. |
| cache_bust (high severity) | 100% firing, 1000% cost | Timestamp or session text in first 20 lines (line 194). Frequency is fine; what changes is the per-firing dollar cost. |
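The table above can be read as a small data structure. This is a sketch only: the seven kind names and their severities come from the table, while the type names `FindingKind` and `Severity`, the map `SEVERITY_BY_KIND`, and the helper `kindsWithSeverity` are assumptions made for illustration, not the declarations in analyzer.ts.

```typescript
// Kind names and severities are taken from the table; all identifiers
// below are hypothetical and may not match analyzer.ts.
type FindingKind =
  | "vague"
  | "aspirational"
  | "missing_why"
  | "duplicate"
  | "bloat"
  | "conflict"
  | "cache_bust";

type Severity = "low" | "medium" | "high";

const SEVERITY_BY_KIND: Record<FindingKind, Severity> = {
  vague: "low",
  aspirational: "low",
  missing_why: "medium",
  duplicate: "medium",
  bloat: "medium",
  conflict: "high",
  cache_bust: "high",
};

// List the kinds at a given severity, in table order.
function kindsWithSeverity(level: Severity): FindingKind[] {
  return (Object.keys(SEVERITY_BY_KIND) as FindingKind[]).filter(
    (kind) => SEVERITY_BY_KIND[kind] === level
  );
}

const cutFirst = kindsWithSeverity("high"); // conflict and cache_bust
```

Note that nothing in the map varies by firing rate; severity tracks how the rule fails when it fires.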
The audit, rephrased for the file you already wrote.
Two ways to read the request. One is the way the question is usually posed (count firings). The other is the way it can actually be answered (score shapes).
What people ask for vs. what the audit actually is
What people ask for: "Show me how many times each rule in my CLAUDE.md actually fired in the last 50 sessions. Sort by frequency. Anything under 10% gets cut."
- Treats CLAUDE.md like logs or metrics, which it isn't
- Implies a per-rule firing counter that doesn't exist in the API
- Cannot tell a vague rule from a specific one with the same firing count (both are 100%)
- Cannot be answered without instrumenting the agent's runtime, which Anthropic does not expose
One counterargument, with a real answer.
Pushback we get on this framing: "Fine, every line fires every turn at the prompt level. But the model still attends to some parts of the prompt more than others. Surely that's a kind of per-rule frequency." The answer is: yes, attention is non-uniform, but you still cannot measure it per-rule without instrumenting the model internals. What you can measure deterministically, from outside the model, is whether the line you wrote has any chance of getting attended to in the first place.
A 35-word line with three subclauses won't. A line with the word 'appropriately' in it can't, because the model has no test for "appropriate" in the current task context. A line that duplicates a previous line costs tokens twice but adds no signal. These are the things the analyzer reads. The remaining variance (attention weight inside the rules that pass the shape check) is the part you cannot fix in CLAUDE.md; that lives in hooks, skills, and tool descriptions.
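The duplicate case is the easiest of the three to check from outside the model. Here is a rough sketch, assuming the rule stated elsewhere on this page (same trimmed, lowercased line, minimum 10 characters). The function name `findDuplicateLines` is hypothetical and the real implementation may differ.

```typescript
// Hypothetical sketch of a duplicate scan: returns the 1-based line numbers
// of lines that repeat an earlier line after trimming and lowercasing.
function findDuplicateLines(fileText: string): number[] {
  const seen: string[] = [];
  const duplicates: number[] = [];
  fileText.split("\n").forEach((raw, idx) => {
    const key = raw.trim().toLowerCase();
    // Short lines (blank lines, "---", tiny headings) are ignored.
    if (key.length < 10) return;
    if (seen.indexOf(key) !== -1) {
      duplicates.push(idx + 1);
    } else {
      seen.push(key);
    }
  });
  return duplicates;
}

// Illustrative input: line 3 repeats line 1 with different case and padding.
const repeated = findDuplicateLines(
  ["Prefer pnpm over npm.", "short", "  prefer PNPM over npm.  ", "short"].join("\n")
);
```

A linear array lookup is enough here because a CLAUDE.md is a few hundred lines at most.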
Where the analyzer lives, line by line.
- analyzer.ts:5 - Finding type, the 7 kinds.
- analyzer.ts:124 - VAGUE_TERMS list (appropriate, properly, carefully, well, as needed, where applicable, ...).
- analyzer.ts:130 - ASPIRATIONAL list (always, never, must, should always, in all cases, every time).
- analyzer.ts:150 - bloat threshold: more than 28 words on a single line.
- analyzer.ts:194 - cache_bust scan: only fires for lines with idx less than 20.
- analyzer.ts:211 - duplicate scan: same trimmed lowercased line, minimum 10 chars.
- analyzer.ts:230 - missing_why scan: looks for because / why / reason / past / got burned / incident / happened / caused in the next 4 lines.
- analyzer.ts:244 - conflict scan: contradicting absolutes (the shipping check looks for both "never use comments" and "add comments").
- analyzer.ts:264 - the hardcode this entire page is about.
The whole analyzer is 322 lines, no network calls, no upload. The input goes nowhere; the Finding[] array lands in the same browser tab.
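Two of the scans listed above are simple enough to sketch in a few lines. This is an approximation under stated assumptions: the `idx < 20` cutoff and the two conflict phrases come from this page, but the regular expression for dates and 'today' and the function names `findCacheBustLines` and `hasCommentConflict` are guesses, not the shipping code.

```typescript
// Assumed pattern: an ISO date (YYYY-MM-DD) or the word "today".
const CACHE_BUST_PATTERN = /\b\d{4}-\d{2}-\d{2}\b|\btoday\b/i;

// Hypothetical cache_bust scan: only the first 20 lines (idx < 20) count.
function findCacheBustLines(fileText: string): number[] {
  const hits: number[] = [];
  fileText.split("\n").forEach((line, idx) => {
    if (idx < 20 && CACHE_BUST_PATTERN.test(line)) hits.push(idx + 1);
  });
  return hits;
}

// Hypothetical conflict scan: flags a file that contains both phrases.
function hasCommentConflict(fileText: string): boolean {
  const text = fileText.toLowerCase();
  return (
    text.indexOf("never use comments") !== -1 &&
    text.indexOf("add comments") !== -1
  );
}

const conflicted = hasCommentConflict(
  "Never use comments.\nAdd comments to public APIs."
);
```

Both functions are pure string checks, which fits the no-network, same-tab design described above.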
Related guides on this site.
- CLAUDE.md line firing audit - the same 7 checks, with the exact thresholds and the file:line where each implementation lives.
- Do CLAUDE.md rules fire on every turn? - the firing surfaces breakdown: CLAUDE.md (unconditional) vs skills / hooks / MCP descriptions (conditional).
- Prune dead CLAUDE.md rules - the action page once you have the audit output.
- Karpathy 12 rule scorecard - the file-level pass that runs alongside the per-line audit.
Want a second pair of eyes on the audit output?
Paste your file, run the analyzer, and book 15 minutes if the Finding[] is dense or the rubric pass count is under 6. I'll walk the report with you and point at the lines to cut first.
Frequently asked questions
Can I see how often each rule in my CLAUDE.md actually fires?
No, and not because the tooling is missing. Per-rule firing is not a thing in the Claude API. The CLAUDE.md file is concatenated into the system prompt at session start and re-sent on every turn. ccmd's analyzer encodes this directly: src/lib/analyzer.ts line 264 reads `const estimatedTokensFireEveryTurn = totalTokens`. There is no per-rule counter to query. Every non-blank line fires on 100% of turns. The thing people are reaching for when they search 'rule frequency audit' is effective frequency, which is how often the model actually acts on the fired rule. That can only be inferred from the rule's structural shape.
Then what is ccmd auditing if not frequency?
Structural predictors of low effective frequency. The Finding.kind union at analyzer.ts line 7 lists seven of them: bloat, vague, aspirational, conflict, duplicate, missing_why, cache_bust. Each one is a deterministic regex or word-count check. A rule that matches bloat (over 28 words on one line) consistently gets treated by the model as one signal with the back half discarded. A rule that matches vague (words like 'appropriately', 'carefully', 'where applicable') has no testable success condition so the model satisfies it on easy turns and abandons it under pressure. The 7-check rubric is a deadweight detector that doubles as an effective-frequency proxy.
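As a rough illustration of the bloat and vague checks described in this answer: the 28-word threshold and the vague terms come from this page, the term list is incomplete because the page elides the rest of it, and the names `shapeFlags`, `wordCount`, `VAGUE_PATTERN`, and `BLOAT_WORD_LIMIT` are hypothetical.

```typescript
// Only the vague terms named on this page; the real list is longer.
const VAGUE_PATTERN =
  /\b(?:appropriate(?:ly)?|properly|carefully|well|as needed|where applicable)\b/i;
const BLOAT_WORD_LIMIT = 28;

// Count whitespace-separated words on one line.
function wordCount(line: string): number {
  const trimmed = line.trim();
  return trimmed.length === 0 ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical per-line check returning the shape problems it finds.
function shapeFlags(line: string): string[] {
  const flags: string[] = [];
  if (wordCount(line) > BLOAT_WORD_LIMIT) flags.push("bloat");
  if (VAGUE_PATTERN.test(line)) flags.push("vague");
  return flags;
}

const specificRule = shapeFlags("Prefer pnpm over npm."); // no flags
const vagueRule = shapeFlags("Handle errors appropriately."); // flagged as vague
```

Because both checks are deterministic, the same file always produces the same flags, with no model call involved.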
Why is missing_why a frequency signal and not a quality signal?
Because a 'DO NOT' or 'NEVER' line with no follow-up explanation gets followed by rote until the agent hits an edge case the absolute did not anticipate. At that point the agent has no rationale to fall back on, so it guesses, and the rule's effective frequency collapses on exactly the turns where it mattered most. The analyzer scans the next 4 lines for one of: because, why, reason, past, got burned, incident, happened, caused (analyzer.ts line 230). If none of those are present, the line is flagged medium severity.
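The scan described here could look roughly like the sketch below. The marker words and the 4-line window are from this answer. Matching the prohibition case-insensitively and also checking the prohibition line itself for a marker are assumptions, and `findMissingWhy`, `PROHIBITION`, and `WHY_MARKERS` are hypothetical names.

```typescript
// Assumption: prohibitions are matched case-insensitively.
const PROHIBITION = /\b(?:do not|never)\b/i;
const WHY_MARKERS =
  /\b(?:because|why|reason|past|got burned|incident|happened|caused)\b/i;

// Hypothetical missing_why scan: returns 1-based line numbers of
// prohibitions with no rationale on that line or the next 4 lines.
function findMissingWhy(fileText: string): number[] {
  const lines = fileText.split("\n");
  const flagged: number[] = [];
  lines.forEach((line, idx) => {
    if (!PROHIBITION.test(line)) return;
    // The prohibition line plus the next 4 lines.
    const context = lines.slice(idx, idx + 5).join("\n");
    if (!WHY_MARKERS.test(context)) flagged.push(idx + 1);
  });
  return flagged;
}

// Illustrative file: line 1 has no rationale nearby, line 7 is explained on line 8.
const unexplained = findMissingWhy(
  [
    "NEVER force-push to main.",
    "Use feature branches.",
    "Keep commits small.",
    "Run the linter.",
    "Run the tests.",
    "Tag releases.",
    "DO NOT edit generated files.",
    "They are rebuilt because the schema changes often.",
  ].join("\n")
);
```

The fix for a flagged line is one sentence of rationale, which gives the agent something to reason from at the edge case.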
Are these checks the same as the Karpathy 12 rubric?
No. The 7 finding kinds are per-line scans for shape problems. The Karpathy 12 rubric is a separate pass that scores the file as a whole against 12 categories (Think Before Coding, Simplicity First, Surgical Changes, etc., listed in analyzer.ts lines 49-122). Both layers run on every paste. The frequency audit lives in the per-line scan because frequency-of-effect is determined by the shape of individual lines, not by whether the file as a whole covers one of the 12 categories.
If every rule fires 100% of turns, why does cutting low-effective-frequency rules save money?
Two reasons. First, every byte still gets billed on the input side of every turn, so cutting 1,800 tokens of deadweight at 30 turns and Opus 4.7 rates saves about $0.81 per long session (input price $15 per million, math at analyzer.ts line 268). Second, cache_bust-class lines move the entire file from a cached read (roughly $0.09 per 30 turns) to a fresh read (roughly $0.91). The same firing frequency now costs roughly 10x. Removing those lines makes the file cacheable again.
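The arithmetic behind those figures, as a hedged sketch. The $15 per million input rate and the 30-turn session are from this answer. The 0.1 cached-read multiplier is an assumption inferred from the roughly 10x gap, the 2,000-token file is an illustrative size, and cache-write charges are ignored. `sessionInputCostUsd` is a hypothetical name, not the function at analyzer.ts line 268.

```typescript
const INPUT_USD_PER_MILLION_TOKENS = 15; // rate quoted in the answer above
const CACHED_READ_MULTIPLIER = 0.1; // assumption, inferred from the ~10x gap

// Input-side cost of re-sending `tokens` on every one of `turns` turns.
function sessionInputCostUsd(
  tokens: number,
  turns: number,
  cached: boolean = false
): number {
  const fresh = (tokens * turns * INPUT_USD_PER_MILLION_TOKENS) / 1e6;
  return cached ? fresh * CACHED_READ_MULTIPLIER : fresh;
}

// 1,800 deadweight tokens over 30 turns: about $0.81.
const deadweightSavingsUsd = sessionInputCostUsd(1800, 30);
// Illustrative 2,000-token file: about $0.90 fresh, about $0.09 cached.
const freshReadUsd = sessionInputCostUsd(2000, 30);
const cachedReadUsd = sessionInputCostUsd(2000, 30, true);
```

The firing count (30 of 30 turns) is identical in all three calls; only the tokens sent and the price per token change.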
Does this audit apply to AGENTS.md, .cursorrules, and .grokrules?
Yes, identically. detectType() at analyzer.ts line 41 returns one of four inputType labels but the per-line scan does not branch on it. The firing model is the same on Codex, Cursor, and Grok Build: every non-blank line is in the system prompt on every turn. So the same 7 finding kinds and the same effective-frequency predictors apply. The host changes, the audit does not.
What about hooks, skills, and MCP tool descriptions?
Those have real per-rule frequency. A skill at skills/my-skill.md fires only when the model decides the name matches the task. A hook fires only on the tool event you registered for (PreToolUse, PostToolUse). An MCP tool description is loaded only when that tool is selected. If you want a rule with non-100% frequency, that is what you reach for. CLAUDE.md is the unconditional layer; the other three are the conditional layers. The /t/claude-md-rules-fire-every-turn guide on this site has the full split, with a terminal trace.
Where can I run the audit?
Paste the file into the textarea on ccmd.dev. The analyzer is pure client-side TypeScript; nothing is uploaded. You get the Finding[] array sorted by line number, the Karpathy 12 rubric pass count, totalTokens, estimatedCostPerLongRunSession at Opus 4.7 rates over 30 turns, and potentialSavingsTokens (the sum of tokenSavings across findings). The analyzer ships with the site; the path on disk is src/lib/analyzer.ts.
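A sketch of how the report fields relate, assuming a simplified `Finding` shape. The field names `line` and `tokenSavings`, and the helper `summarize`, are assumptions for illustration; only `potentialSavingsTokens` and the sort-by-line-number behavior are described on this page.

```typescript
// Simplified, assumed shape; the real Finding type may carry more fields.
interface Finding {
  kind: string;
  line: number;
  tokenSavings: number;
}

// Sort findings by line number and total the per-finding token savings.
function summarize(findings: Finding[]): {
  sorted: Finding[];
  potentialSavingsTokens: number;
} {
  const sorted = findings.slice().sort((a, b) => a.line - b.line);
  const potentialSavingsTokens = sorted.reduce(
    (sum, finding) => sum + finding.tokenSavings,
    0
  );
  return { sorted, potentialSavingsTokens };
}

// Made-up findings for illustration.
const report = summarize([
  { kind: "bloat", line: 42, tokenSavings: 30 },
  { kind: "vague", line: 7, tokenSavings: 12 },
  { kind: "duplicate", line: 19, tokenSavings: 18 },
]);
```

Copying the array before sorting leaves the caller's findings untouched.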