02 / karpathy 12-rule scorecard

Twelve rules, measured against agent failure modes.

Andrej Karpathy posted four CLAUDE.md principles on 2026-01-26. Forrest Chang turned them into a 65-line template; the repo hit 120,000 stars by mid-year. A community extension added eight more rules and reported a measured mistake-rate drop from 41% to 11% across a controlled task set.

These twelve rules are the default rubric ccmd grades against. Below: each rule, the agent failure it's meant to catch, and the source.

  1. 01

    Think Before Coding

    Agent dives into edits without forming a plan, then thrashes when it hits the first surprise.

    source: Karpathy thread (Jan 2026): silent wrong assumptions.

  2. 02

    Simplicity First

    Agent over-engineers because it pattern-matches what 'professional code' looks like, not what the task needs.

    source: Karpathy: over-complication. Forrest Chang template.

  3. 03

    Surgical Changes

    Agent rewrites a working file while fixing one line. Drive-by refactor breaks tests in unrelated modules.

    source: Karpathy: orthogonal damage.

  4. 04

    Goal-Driven Execution

    Agent stops at the first compile; success is 'merge-able feature with passing tests', not 'no error message'.

    source: Karpathy four principles. /goal command primitives in Claude Code 2.1.139.

  5. 05

    Avoid Silent Assumptions

    Agent guesses an API shape, a column name, a config flag, and ships a confident-looking lie.

    source: Karpathy: silent wrong assumptions are the most expensive failure mode.

  6. 06

    No Orthogonal Damage

    Agent 'cleans up' an unrelated file because it noticed a style issue. PR diff exceeds the task.

    source: Karpathy. Boris Cherny's add-to-CLAUDE.md-when-Claude-is-wrong loop.

  7. 07

    Tests as Truth

    Agent claims 'done' before running the test suite. Or worse, modifies tests to pass.

    source: Karpathy-12 community extension (Apr 2026).

  8. 08

    Concise Output

    Agent ends every response with a 200-word recap of what it just did. You can read the diff.

    source: Boris Cherny vanilla-setup thread (Mar 2026).

  9. 09

    Stack Awareness

    Agent picks the wrong package manager, the wrong test runner, the wrong import style. CLAUDE.md never named the stack.

    source: Boris Cherny: name your stack explicitly.

  10. 10

    Tool Preference

    Agent uses fetch when you want axios; uses rg when you want grep; uses pytest when your repo is unittest.

    source: Karpathy-12 community extension. Implicit in Boris Cherny's example file.

  11. 11

    Failure Mode Coverage

    Same bug ships every two weeks because nobody added the past-incident rule that would have caught it.

    source: Boris Cherny: 'Anytime we see Claude do something incorrectly we add it to the CLAUDE.md.'

  12. 12

    Self-Improvement Loop

    CLAUDE.md is treated as immutable. The file should be edited every time Claude is wrong.

    source: Boris Cherny. Karpathy-12 extension closing rule.