Score the CLAUDE.md and AGENTS.md files you never audited.
Most config files are unaudited. You wrote one when you set up the repo, it grew a section every time the agent did something annoying, and nobody ever ran a number on it. The advice you find for fixing that is all qualitative: keep it short, be specific, delete dead rules. True, and none of it tells you where you are starting from. An audit gives you a starting number. This page is about that first number, what it almost always is, and why.
1. The unaudited baseline is 2 out of 12
We collected five config files that were written by hand and had never been scored against a rubric. Different repos, different authors, different stacks: Next.js, Go, Python FastAPI, React. One of the five is ccmd's own bundled sample. We pasted each one, unedited, into the analyzer.
Every file landed on the same number. Not close to the same number. The same number. A 56-token .cursorrules and a 280-token CLAUDE.md both scored 2 out of 12. And it is not only the score that matched. All five files missed the identical set of ten rules. The two rules they passed were the same two rules across the board.
The one rule a hand-written file reliably earns is R9, stack awareness. Its regex looks for a named language or framework, and every file lists one, because describing the stack is the obvious thing to do when you write a config file. So the floor is not zero. It is the score of someone who did the obvious thing and stopped there. That is what unaudited means in practice: not empty, just never measured.
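To make the idea of a presence test concrete, here is a minimal sketch of what a stack-awareness check could look like. The pattern, the word list, and the function name are assumptions for illustration; ccmd's actual R9 regex is not reproduced here and may differ.

```typescript
// Hypothetical stand-in for R9 (stack awareness). The word list and the
// pattern are illustrative assumptions, not ccmd's actual rule.
const STACK_NAMES = /\b(typescript|javascript|python|go|rust|react|next\.js|fastapi)\b/i;

// A presence test: true as soon as any listed stack name appears anywhere.
// It does not judge how well the stack is described, only that it is named.
function passesStackAwareness(configText: string): boolean {
  return STACK_NAMES.test(configText);
}

console.log(passesStackAwareness("You are an expert TypeScript engineer.")); // true
console.log(passesStackAwareness("Always write clean code.")); // false
```

A test this blunt also shows why the rule is easy to earn: a short word such as "go" would match in ordinary prose too, so almost any file that mentions its stack at all passes.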
2. Why every unaudited file lands in the same place
The score is identical across five files because the rubric tests for the presence of specific rules, and the rules a hand-written file skips are skipped for the same reason every time. They do not feel like things you put in a config file. Here are the ten that all five files missed, grouped by what they are actually asking for.
Restraint before edits
R2 simplicity first, R3 surgical changes, R6 no orthogonal damage. Three rules that tell the agent to make the smallest change and touch nothing unrelated. A hand-written file almost never says this, so the agent reaches for the large edit by default.
When to stop, when to ask
R4 goal-driven execution and R5 avoid silent assumptions. A completion condition and an instruction to ask instead of guess. Without them the agent does not know when it is done, or when it should interrupt you.
Tests as the truth
R7. A line that says tests must pass before the work counts as done. This is the rule teams notice missing first, because its absence is the agent declaring success without having run anything.
Output discipline
R8 concise output. One line asking for short answers. Without it, every action ends in a paragraph of recap that costs output tokens and buries the result you actually wanted.
Tool and library preference
R10. Which library to reach for, which to avoid. Unaudited files describe the stack but rarely state a preference, so the agent quietly picks for you.
Memory of failure
R11 failure-mode coverage and R12 self-improvement loop. A record of what went wrong before, and an instruction to write new lessons back into the file. Almost no hand-written file has either. This is the highest-leverage gap on the board.
The rubric is one of two layers. The other is a per-line scan, which flags problems in the lines you already wrote rather than rules you left out. The bundled sample returned 18 of these findings: 10 aspirational (an absolute like "always" or "never" with no escape clause), 5 vague (a soft word like "properly" or "appropriately" that the agent cannot test itself against), and 3 missing-why (a prohibition with no reason attached). An unaudited file is not just missing the ten rules. The lines it does have are mostly the wrong shape: "always write clean code" is aspirational and vague in a single four-word line. The scan names every one with a line number.
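The three finding types described above can be sketched as simple per-line checks. This is a rough illustration under stated assumptions: the word lists, the escape-clause and reason patterns, and every name below are hypothetical, and "clean" is treated as a vague word only because the example line in the text is flagged that way.

```typescript
type FindingKind = "aspirational" | "vague" | "missing-why";

interface Finding {
  line: number; // 1-based line number in the pasted file
  kind: FindingKind;
  text: string;
}

// Hypothetical word lists; the real scan's lists are not shown in the article.
const ABSOLUTE = /\b(always|never)\b/i;
const ESCAPE_CLAUSE = /\b(unless|except|if)\b/i;
const VAGUE = /\b(properly|appropriately|clean)\b/i;
const PROHIBITION = /\b(never|do not|don't)\b/i;
const REASON = /\b(because|since|so that)\b/i;

function scanLines(configText: string): Finding[] {
  const findings: Finding[] = [];
  configText.split(/\r?\n/).forEach((text, index) => {
    const line = index + 1;
    // An absolute with no escape clause.
    if (ABSOLUTE.test(text) && !ESCAPE_CLAUSE.test(text)) {
      findings.push({ line, kind: "aspirational", text });
    }
    // A soft word the agent cannot test itself against.
    if (VAGUE.test(text)) {
      findings.push({ line, kind: "vague", text });
    }
    // A prohibition with no reason attached.
    if (PROHIBITION.test(text) && !REASON.test(text)) {
      findings.push({ line, kind: "missing-why", text });
    }
  });
  return findings;
}
```

Under these assumed lists, a line can collect more than one finding, which is how a short file can return more findings than it has lines.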
3. The analyzer never reads your filename
The scan report holds one more surprise: the file named SAMPLE_CLAUDE_MD was detected as a .cursorrules file. That is not a mistake. ccmd's analyzer never sees a filename. It reads the first 300 characters of whatever you paste and routes by content. The detector flips to .cursorrules when the input opens with the phrase "You are an expert", because that opening is the .cursorrules idiom. The bundled sample, a constant literally named SAMPLE_CLAUDE_MD in src/lib/analyzer.ts, opens with "You are an expert TypeScript engineer", so it classifies as .cursorrules.
This matters when you have several files to score. You do not have to tell the analyzer what each one is. Paste a file, named or unnamed, and it picks the right header on its own. The score does not change with the label; the same 12 rules run on all four formats. Detection by content is also why scoring a directory full of files is just paste, read, paste the next one. Every config file you have ever written is unaudited until a number says otherwise, and the number is the same kind of number regardless of which tool the file was written for.
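Content-based detection of this kind can be sketched in a few lines. The version below models only the one routing rule the text describes (the "You are an expert" opening within the first 300 characters). The function name, the tolerance for leading whitespace, and the case-insensitive match are assumptions, and how the real detector tells the other three formats apart is not covered.

```typescript
// Only the head of the input is inspected, never a filename.
const HEAD_LENGTH = 300;

// Assumption: leading blank lines or spaces are ignored and case does not matter.
const CURSORRULES_OPENING = /^\s*you are an expert\b/i;

function looksLikeCursorRules(pasted: string): boolean {
  const head = pasted.slice(0, HEAD_LENGTH);
  return CURSORRULES_OPENING.test(head);
}

console.log(looksLikeCursorRules("You are an expert TypeScript engineer.")); // true
console.log(looksLikeCursorRules("# Project notes\nUse Go for services.")); // false
```

Because the check never takes a name as input, a file saved as CLAUDE.md and the same text pasted with no name at all are indistinguishable to it.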
4. Why the score holds still when you run it twice
The other common way to audit a config file is to ask an LLM to grade it: paste the file into a chat with a rubric prompt and read the score it gives back. That works once. It does not work as a baseline, because the baseline has to hold still so a later version can be compared against it. ccmd's score is a set of regex tests, so the same file always returns the same number.
| Feature | Asking an LLM to grade it | ccmd score |
|---|---|---|
| Same file, same score | no, the number drifts run to run | yes, regex tests, identical every run |
| Token cost to run | a full prompt, every time | zero, runs in your browser |
| Names the exact lines | depends on how you wrote the prompt | per-line findings with line numbers |
| Why you can trust the result | you trust the grader you grade with | all 12 rules are readable regex |
| Re-measure after a fix | the score may move on its own | moves only if the file moved |
| AGENTS.md and .cursorrules | a different conversation each time | same rubric, detected by content |
For a file you are auditing for the first time, determinism is the whole point. The 2/12 is not an opinion about your file. It is a fixed coordinate. Make four edits, re-paste, and any movement you see is movement you caused. An LLM-graded audit cannot give you that, because it cannot tell you whether a 2-point jump came from your edits or from the grader having a different day.
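The fixed-coordinate property follows from scoring being a pure function of the text. A minimal sketch, assuming a hypothetical two-rule excerpt of the rubric (the patterns and names below are illustrative, not ccmd's):

```typescript
interface Rule {
  id: string;
  pattern: RegExp;
}

// Hypothetical excerpt. The patterns carry no `g` or `y` flag on purpose:
// RegExp.prototype.test with those flags advances lastIndex, so repeated
// calls on the same regex object could disagree.
const RUBRIC_EXCERPT: Rule[] = [
  { id: "R7", pattern: /\btests?\b.*\b(pass|passing|green)\b/i },
  { id: "R9", pattern: /\b(typescript|python|go|react|next\.js|fastapi)\b/i },
];

// Same text in, same list of rule IDs out.
function passedRules(configText: string, rubric: Rule[] = RUBRIC_EXCERPT): string[] {
  return rubric.filter(rule => rule.pattern.test(configText)).map(rule => rule.id);
}

// Rules that pass in the revised file but did not pass in the baseline.
function newlyPassed(baseline: string, revised: string): string[] {
  const before = passedRules(baseline);
  return passedRules(revised).filter(id => before.indexOf(id) === -1);
}

const baselineText = "You are an expert TypeScript engineer.";
const revisedText = baselineText + "\nAll tests must pass before a task counts as done.";

console.log(passedRules(baselineText)); // [ 'R9' ]
console.log(newlyPassed(baselineText, revisedText)); // [ 'R7' ]
```

Since nothing here depends on time, randomness, or a remote model, the only way `newlyPassed` returns a rule is if the text gained a line that matches it.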
5. Score your own files
The point of a baseline is that you take it before you change anything. Do not tidy the file first. The unedited score is the one worth knowing.
- Step 1: Open the analyzer
  Go to ccmd.dev. The homepage is the analyzer: a single textarea, no signup, no upload.
- Step 2: Paste one file, exactly as it is now
  Drop in your CLAUDE.md or AGENTS.md unedited. The first scan is the baseline, so resist cleaning it up first.
- Step 3: Read the rubric score
  The number out of 12 is your verdict. If it is around 2, you have a normal unaudited file. The flagged-lines list below it is today's work.
- Step 4: Score the next file the same way
  Repeat for every config file you have: AGENTS.md, .cursorrules, nested CLAUDE.md files. Detection is by content, so you never have to label them.
Once you have the baseline, the route up is short. The single highest-leverage move is the pair the "memory of failure" card describes: one line recording a real past incident, and one line telling the agent to append future lessons to the file itself. After that, a tests-must-pass line, a completion condition, and a one-line plan step are each a single sentence that flips a rule. Five or six sentences is usually the difference between 2/12 and the high single digits. The config audit walkthrough covers the full rubric and the fix order line by line.
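As a rough illustration of how small these additions are, the sketch below maps a few rule IDs to one example sentence each and appends the ones a file is missing. The sentences, the map, and the helper name are all hypothetical, and whether a given wording actually flips a rule depends on ccmd's real patterns. The R11 entry is deliberately a placeholder, because that line should record an incident that really happened in your repo.

```typescript
// Hypothetical example sentences keyed by rule ID; the wording is illustrative only.
const EXAMPLE_FIXES: { [ruleId: string]: string } = {
  R11: "Past failure: <describe a real incident from this repo>.",
  R12: "When a task goes wrong, append a one-line lesson to this file.",
  R7: "A task is not done until the test suite passes.",
  R4: "Stop when the stated goal is met.",
};

// Appends one bullet per missing rule that has an example sentence.
function appendFixes(configText: string, missing: string[]): string {
  const additions = missing
    .filter(id => EXAMPLE_FIXES[id] !== undefined)
    .map(id => "- " + EXAMPLE_FIXES[id]);
  if (additions.length === 0) {
    return configText;
  }
  // Strip trailing whitespace so the added block is separated by one blank line.
  return configText.replace(/\s+$/, "") + "\n\n" + additions.join("\n") + "\n";
}

console.log(appendFixes("Use Go for services.", ["R11", "R12", "R7", "R4"]));
```

After an edit like this, re-paste the file and read the new score rather than assuming each sentence landed.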
Want us to score your config files with you?
15 minutes, free. Paste your CLAUDE.md and AGENTS.md, we read the baseline together and leave you with a ranked fix list.
Frequently asked questions
What score does an unaudited config file usually get?
Around 2 out of 12. We took five config files that were written by hand and never scored against a rubric: three CLAUDE.md files from different repos, one AGENTS.md, and one .cursorrules. The stacks were Next.js, Go, Python FastAPI, and React. Every one of the five came back at exactly 2 out of 12 on the Karpathy rubric. 2/12 is the working baseline for a file nobody has audited. It is not a bad score because the author is careless; it is the score you get when you write a config file the way it feels natural to write one.
Why do completely different files score the same number?
Because the rubric tests for the presence of specific rules, not for effort or length. A 56-token .cursorrules and a 280-token CLAUDE.md both scored 2/12 because both authors wrote down the same instincts: name the stack, say write clean code, say never commit secrets. The ten rules a hand-written file skips are skipped for the same reason every time. Nobody thinks to write a completion condition, a tests-must-pass gate, or an instruction telling the agent to record its own mistakes, because those do not feel like things you put in a README.
Does it matter whether the file is CLAUDE.md, AGENTS.md, or .cursorrules?
No. The same 12-rule rubric and the same seven per-line checks run against all four formats ccmd supports (CLAUDE.md, AGENTS.md, .cursorrules, .grokrules). If you have several config files across tools, paste each one separately. The score is comparable across formats, so a 7/12 CLAUDE.md sitting next to a 3/12 AGENTS.md is a real, measurable gap, not a vibe.
Why did my CLAUDE.md get classified as a .cursorrules file?
The analyzer never sees the filename. It reads the first 300 characters and routes by content. The detector flips to .cursorrules when the input opens with the phrase 'You are an expert', because that opening is the .cursorrules idiom. ccmd's own bundled sample, a constant literally named SAMPLE_CLAUDE_MD in src/lib/analyzer.ts, classifies as .cursorrules for exactly this reason: its first real line is 'You are an expert TypeScript engineer.' The classification does not change your score. The same rubric runs either way. It is just the label on the report.
Is the score deterministic, or does it change each run?
Deterministic. Each of the 12 rules is a regex test against your file, and the seven per-line checks are plain string and word-count rules. Paste the identical file twice and you get the identical 2/12 twice. That matters for an unaudited file specifically: the first scan is a baseline you will measure future versions against, so the baseline has to be stable. An audit done by asking an LLM to grade your file is not stable; the same file can come back 6/10 one run and 8/10 the next.
How long does scoring take, and does anything get uploaded?
The analyzer is pure client-side TypeScript in src/lib/analyzer.ts. It runs in roughly a quarter of a second on a normal-sized file, entirely in your browser. There is no upload, no signup, no backend call. Open your network tab and watch: pasting a file produces no POST. That is deliberate, because the file you are scoring is often the most sensitive prose in your repo.
What is the fastest way to move off 2/12?
Add the rules the baseline scan says are missing, starting with the highest-leverage pair: R11 (a record of a past failure) and R12 (an instruction to write new lessons back into the file). Those two are what make a config file compound instead of going stale. After that, a tests-must-pass line (R7), a completion condition (R4), and a one-line plan step are each a single sentence that flips a rule. Five or six sentences typically take a hand-written file from 2/12 to the high single digits.