
CLAUDE.md layer audit for token bloat: rank the layers, cut the worst one first.

Matthew Diakonov, Written with AI

Published May 19, 2026 · 6 min read

You ran a CLAUDE.md audit, got a list of findings, fixed a few lines, and your bill did not move. Most likely the file you audited was the one you most recently edited, which is also the freshest. The file that needed auditing was somewhere else in the stack: probably the user file you set up six months ago, possibly an @import target you forgot is loaded inline on every turn. This page is how to stop guessing and pick the right layer first.

1. The one number you need from each layer

The analyzer returns a long object per file. For the layer audit you only need two fields, both visible in the result panel on ccmd.dev: totalTokens and potentialSavingsTokens.

analyzer.ts

A finding without a tokenSavings field is still a real flag. It just is not counted in the savings total, because the analyzer will not pretend to know how many tokens you cut by rewriting an aspirational rule into a concrete one. That conservatism is what makes the savings number comparable across files of very different shapes: two files with the same ratio of savings to size carry the same proportional tax, whether one is mostly bloated lines and the other mostly duplicates.
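A minimal sketch of how such a total could be assembled, assuming each finding carries an optional tokenSavings field as described above. The Finding type, its check field, the sumSavings helper, and the sample values are hypothetical and only for illustration; this is not the analyzer's actual code.

```typescript
// Hypothetical shape: only tokenSavings and potentialSavingsTokens are
// names taken from the article; everything else is illustrative.
interface Finding {
  check: string;         // e.g. "bloat" or "aspirational"
  tokenSavings?: number; // absent when the check flags without estimating
}

// Sum only the findings that carry an estimate. Findings without one
// stay flagged but add nothing to the savings total.
function sumSavings(findings: Finding[]): number {
  return findings.reduce((sum, f) => sum + (f.tokenSavings ?? 0), 0);
}

// Made-up sample findings for one file.
const findings: Finding[] = [
  { check: "bloat", tokenSavings: 14 },
  { check: "duplicate", tokenSavings: 22 },
  { check: "aspirational" }, // a real flag, but no estimate
];

const potentialSavingsTokens = sumSavings(findings);
console.log(potentialSavingsTokens); // 36
```

The aspirational finding still appears in the list; it simply contributes zero to the total.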

2. Bloat density, the per-layer rate

Define it as a fraction:

density = potentialSavingsTokens / totalTokens

This is a rate, not a total. A 200-token layer with 60 tokens of flagged savings is 30% bloated. A 5,000-token layer with 600 tokens of flagged savings is 12% bloated. The big file saves more tokens in absolute terms when you cut it, but it is the smaller one that is rotting faster. The smaller one's author lost the thread of what the file was for. Cut the rate first; the totals follow on the next pass.
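The two examples above can be checked with a few lines. This is a sketch only: the density function name and the choice to return 0 for an empty layer are assumptions made here, not part of the analyzer.

```typescript
// Bloat density as defined above: flagged savings divided by layer size.
function density(potentialSavingsTokens: number, totalTokens: number): number {
  // Assumption of this sketch: an empty layer has nothing to cut, so score it 0.
  if (totalTokens <= 0) return 0;
  return potentialSavingsTokens / totalTokens;
}

const small = density(60, 200);   // the 200-token layer
const large = density(600, 5000); // the 5,000-token layer

console.log((small * 100).toFixed(1) + "%"); // 30.0%
console.log((large * 100).toFixed(1) + "%"); // 12.0%
```

The larger layer offers ten times the absolute savings, yet the smaller one ranks first because its rate is higher.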

3. The five-step procedure

Step 1. List your layers
From Anthropic's memory docs: managed policy, user, project, project-local, every @import target, and every .claude/rules/*.md file without a paths: field.
Step 2. Score each one
Paste each layer into the analyzer at ccmd.dev. Record totalTokens and potentialSavingsTokens for each.
Step 3. Compute density
density = potentialSavingsTokens / totalTokens. The flagged cuts are the numerator; the layer's size is the denominator.
Step 4. Sort descending
The highest-density layer is your worst offender. Size alone is misleading: a 200-token layer can outrank a 5,000-token one.
Step 5. Cut the top, re-audit
Fix the worst layer, then re-score it. If its density drops below that of the next layer, move on. Otherwise stay on it.

Run as a terminal session, it looks like this:

layer-audit.sh

The whole loop is paste, write down two numbers, divide, sort. There is no tooling beyond the textarea, no signup, no upload of your files; the analyzer runs in the browser. The friction is the manual paste of each layer, which is also what keeps the audit honest: you have to look at every file in the stack to score it.

4. Worked example: three layers on a real machine

Same machine that ships this site. The project file is the canonical one-line @AGENTS.md import recommended by Anthropic's memory docs, so all the project rules live in AGENTS.md (which is expanded inline at launch). The user file is the one I have not touched in months. Ranked by density:

~/.claude/CLAUDE.md · 4,897 tok · 1,420 cuts · 29.0%

Triage first. Machine-wide, every repo on this box. Every percentage point cut here is paid back in every project for the rest of the week.

./AGENTS.md · 82 tok · 12 cuts · 14.6%

Second pass. Per-repo, expanded by @AGENTS.md. A small file with a high ratio still costs tokens on every turn while this repo is open.

./CLAUDE.md · 3 tok · 0 cuts · 0.0%

Skip. One import line, no findings. Cleaning it returns nothing.

The pasted project file scored 0% density. Cleaning it would return nothing. The user file scored 29% and is the layer to cut first; it also happens to be the largest layer by token count, but that is not why I picked it. I picked it because its density, the share of its tokens flagged as cuttable, is higher than that of the other two, and because it loads in every repo on this box, so the savings multiply across the week.
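The ranking above can be reproduced with a short script. This is a sketch under two assumptions: the "cuts" figure for each layer is read as its potentialSavingsTokens, and the LayerScore type and rankByDensity helper are names invented for this example.

```typescript
// Hypothetical record for one scored layer.
interface LayerScore {
  path: string;
  totalTokens: number;
  potentialSavingsTokens: number;
}

// The three layers from the worked example, deliberately out of order.
const layers: LayerScore[] = [
  { path: "./CLAUDE.md", totalTokens: 3, potentialSavingsTokens: 0 },
  { path: "~/.claude/CLAUDE.md", totalTokens: 4897, potentialSavingsTokens: 1420 },
  { path: "./AGENTS.md", totalTokens: 82, potentialSavingsTokens: 12 },
];

function density(l: LayerScore): number {
  return l.totalTokens > 0 ? l.potentialSavingsTokens / l.totalTokens : 0;
}

// Sort a copy, highest density first, so the input order is left untouched.
function rankByDensity(input: LayerScore[]): LayerScore[] {
  return [...input].sort((a, b) => density(b) - density(a));
}

for (const l of rankByDensity(layers)) {
  console.log(`${l.path}  ${(density(l) * 100).toFixed(1)}%`);
}
// ~/.claude/CLAUDE.md  29.0%
// ./AGENTS.md  14.6%
// ./CLAUDE.md  0.0%
```

Sorting on the ratio rather than on totalTokens is the whole point: here the largest file also ranks first, but only because its rate is highest.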

5. The order of operations after ranking

Once you have the layers sorted by density, the per-file audit takes over. The per-layer audit picked the file; the per-file audit picks the lines. The findings list on ccmd.dev for the top layer is the input to that next pass. Three checks contribute to the savings number (bloat, duplicate, cache_bust), so those are the cuts you can make today. The other four findings (vague, aspirational, missing_why, conflict) point at lines that need rewriting, not deletion, and the savings from those are not in the ratio.

Re-score the layer after the cut. If its density drops below the next layer in the ranking, switch to that one. Otherwise stay on it. The order of operations is dynamic, not fixed: a layer that scored 29% before a pass and 11% after is no longer the worst offender, even if it still has more raw bloat than the next layer down.
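The stay-or-switch rule is a single comparison. A minimal sketch, with shouldSwitch as a hypothetical helper name; the 29% and 11% figures come from the paragraph above, and 14.6% is the runner-up from the worked example.

```typescript
// Switch layers once the re-scored density falls below the density of the
// next layer in the ranking; otherwise keep cutting the current one.
function shouldSwitch(rescoredDensity: number, nextLayerDensity: number): boolean {
  return rescoredDensity < nextLayerDensity;
}

const runnerUp = 0.146; // ./AGENTS.md in the worked example

console.log(shouldSwitch(0.29, runnerUp)); // false: still the worst layer, stay on it
console.log(shouldSwitch(0.11, runnerUp)); // true: move to the next layer
```

Raw token counts never enter the comparison, which is why a layer can stop being the worst offender while it still holds more flagged tokens than the one below it.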

For the line-level work on the file you pick, the three cost states audit covers the dollar math and the cache-bust line specifically. For the arithmetic of summing every layer's cost (the bill total, not the bloat rate), the layered token cost page is the matching procedure. Density picks the layer to fix; those two cover what to fix inside it and how much it saves.


Frequently asked questions

Why density instead of raw token savings?

Because the two numbers answer different questions. Raw savings tells you how many tokens this layer could lose. Density tells you how rotten this layer is per token you spend on it. A 4,000-token user file that could lose 400 tokens (10%) is a maintained file; a 600-token AGENTS.md that could lose 240 tokens (40%) is four times as rotten per token and will stay that way until someone cuts it. Density is the rate. Cut the rate, not the total. The total falls out.

What counts as a layer?

Per Anthropic's memory docs, six layers load in full at launch and sit in context on every turn: a managed policy CLAUDE.md if your org ships one, ~/.claude/CLAUDE.md, ./CLAUDE.md or ./.claude/CLAUDE.md, ./CLAUDE.local.md, every file pulled in by an @path import (recursive up to five hops), and .claude/rules/*.md files that have no paths: field in their frontmatter. A seventh kind, subdirectory CLAUDE.md, loads on demand only when Claude reads a file in that subtree and is worth scoring separately because its density bites only inside that subtree.

Which checks in the analyzer feed the savings number?

Three of the seven. The bloat check (lines over 28 words) estimates a 35% cut. The duplicate check estimates the full token cost of the second copy. The cache_bust check (ISO date or 'today' / 'this session' / 'right now' in the first 20 lines) estimates the full token cost of the volatile line. The other four (vague, aspirational, missing_why, conflict) flag without estimating. The savings number is conservative on purpose: it is what you can defensibly cut today, not a maximum.
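To make one of these checks concrete, here is a rough sketch of a cache_bust detector built only from the description above. The regular expressions, the case-insensitive matching, and the findCacheBustLines name are guesses made for illustration; the analyzer's real matching rules are not shown on this page and may differ.

```typescript
// Patterns reconstructed from the description: an ISO date, or one of
// three volatile phrases. Case-insensitivity is an assumption.
const ISO_DATE = /\b\d{4}-\d{2}-\d{2}\b/;
const VOLATILE_PHRASES = /\b(today|this session|right now)\b/i;

// Return the 1-based numbers of lines within the first 20 that look volatile.
function findCacheBustLines(text: string): number[] {
  const flagged: number[] = [];
  const head = text.split("\n").slice(0, 20);
  head.forEach((line, i) => {
    if (ISO_DATE.test(line) || VOLATILE_PHRASES.test(line)) {
      flagged.push(i + 1);
    }
  });
  return flagged;
}

// Made-up sample file.
const sample = [
  "# Project rules",
  "Last updated 2026-05-19",
  "Use pnpm, not npm.",
  "Focus on the billing module right now.",
].join("\n");

console.log(findCacheBustLines(sample)); // lines 2 and 4
```

Turning a flagged line into a token estimate would need a tokenizer, which this sketch leaves out.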

Why is the machine-wide user file usually the worst layer?

Two reasons. It is the file you have not opened since you set up your machine, so it accumulates aspirational lines, dated context, and copied-in advice you stopped following. And every byte of it loads in every repo on the machine, so a 30% density there pays out across your whole week, not one project. The file you would naturally audit is the one you just edited, which is also the freshest. The file that needs auditing is the one you last touched a quarter ago.

What density is acceptable?

Under 10% is fine for a well-maintained layer; 10 to 20% is normal drift after a month or two of edits; over 20% means the layer needs a pass before you do anything else; over 30% means the layer is mostly dead lines around a small set of live ones, and rewriting is faster than editing. None of those thresholds are sacred. They are the bands we see when we audit our own files. The number you should care about is the relative density between your layers, because that is the order to cut them in.
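Those bands translate into a small lookup. A sketch only: the Band labels and the band function are names made up here, and how the exact boundary values (10%, 20%, 30%) are binned is a choice of this example, since the text does not say.

```typescript
type Band = "fine" | "normal drift" | "needs a pass" | "rewrite";

// Thresholds are the bands quoted above; boundary handling is an assumption.
function band(density: number): Band {
  if (density > 0.3) return "rewrite";
  if (density > 0.2) return "needs a pass";
  if (density >= 0.1) return "normal drift";
  return "fine";
}

console.log(band(0.29));  // needs a pass
console.log(band(0.146)); // normal drift
console.log(band(0));     // fine
```

The labels are a reading aid; the cut order still comes from comparing densities between layers, not from the band a layer lands in.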

Does the ratio account for prompt caching?

No, and it should not. Caching divides the per-turn bill on turns where the prefix is byte-identical. It does not change which layer is bloated. A cached layer with 30% density still spends 30% of its cost on lines you do not read; you just pay the cost at a discount. The density is a property of the file, not the request. Fix the density first, and the caching discount then applies to a smaller file.

How does this differ from the per-file audit on this site?

The per-file audit answers 'what is wrong with this file'. The layer audit answers 'which file is worst'. The first is a finding list, the second is a sort order. You need both: the layer audit picks the file, the per-file audit tells you which lines to edit. The pages link to each other for exactly that reason.