Claude Code auto memory tokens, measured on one real machine.

Matthew Diakonov, Written with AI

Published May 18, 2026 · 7 min read

Direct answer · verified 2026-05-18

MEMORY.md fires on every turn, the same way CLAUDE.md does, but only the first 200 lines load. On a real index of 1,135 lines that we measured (~/.claude/projects/-Users-matthewdi/memory/MEMORY.md), the loaded slice is 21,677 characters, about 5,420 tokens per turn by the chars/4 heuristic Claude's CLI uses. Per-topic memory files (the bodies the index points to) do not auto-load; Claude opens them on demand via the [[name]] links and bills their full body once per session, per file opened. The 935 lines past the 200-line cut on the same machine are invisible to Claude but still occupy disk and keep growing on every save-memory call. Authoritative reference: code.claude.com/docs/en/memory.

Most of the writing on this so far repeats the line that auto memory's load cost is negligible, a few hundred tokens. That is true on day one. It is not true in month nine, when Claude has been saving entries to your global MEMORY.md every time you correct it and the file has crossed the 200-line cap without you noticing.

What follows is one measurement on one real machine, in the order we did it. The point is the shape, not the specific numbers: your MEMORY.md is smaller or larger, your truncation tail is shorter or deeper, but the cost structure is the same and the questions worth asking are the same.

1. Where the auto memory lives on disk

Claude Code stores per-project auto memory under ~/.claude/projects/<slugified-absolute-path>/memory/. The slug is just the absolute path with slashes replaced by dashes. There is one global directory whose slug is just the user home (-Users-matthewdi on this machine); every other project gets its own.

real shell · 2026-05-18

On this machine the index is 1,135 lines, with 83 topic files in the same directory. The index is what gets injected into the system prompt at session start. The topic files are bodies Claude reads on demand when the index entry looks relevant to the task.

2. Shape of one memory entry

Two files. The index entry (one line, in MEMORY.md) and the body (a separate .md with frontmatter). The body lives next to the index but only loads if Claude decides to follow the link.

MEMORY.md (index — top of file shown)

The body file referenced from the first index line:

memory/posthog-projects.md

On every turn the index loads, capped at 200 lines. The body loads on the turn Claude judges the entry relevant, costs its full file size in tokens (the frontmatter, the prose, the [[name]] back-links), and stays in context for the rest of the session.

3. The 200-line truncation, in numbers

1,135 lines in MEMORY.md

200 lines that actually load

935 lines past the cut (dead on disk)

5,420 tokens per turn from the index

The 200-line cap is documented; the implicit contract is that MEMORY.md is an index, not the memory itself. At one line per entry and roughly 150 characters per line, even a few hundred entries fit in a few thousand tokens. The contract breaks the moment the index crosses 200 lines, because new save-memory calls keep appending to the file and old entries below line 200 stop being visible to Claude on session start.

The 935 lines below the cut on this machine are not a hypothetical tail. They are entries Claude wrote months ago, that Claude can no longer see, and that the next save-memory call will sit beneath. From an outcomes standpoint the index has silently rotted: the model is making decisions based on the top 200 lines, and the rotation order is purely chronological (newest at top, in this case), so the lines past the cut are the oldest reference material rather than the least relevant.
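The split between the loaded head and the dead tail can be sketched as a small function. This is a minimal illustration, not Claude Code's loader: the 200-line cap comes from this article, while `splitIndex`, `IndexSplit`, and `LOADED_LINE_CAP` are hypothetical names, and characters are counted as UTF-16 code units plus one newline per line.

```typescript
// Sketch only: the 200-line cap comes from the article; all names are illustrative.
const LOADED_LINE_CAP = 200;

interface IndexSplit {
  totalLines: number;
  loadedLines: number;
  deadLines: number;
  loadedChars: number;
  deadChars: number;
}

function splitIndex(text: string, cap: number = LOADED_LINE_CAP): IndexSplit {
  // Ignore a single trailing newline so it is not counted as an extra empty line.
  const body = text.endsWith("\n") ? text.slice(0, -1) : text;
  const lines = body.length === 0 ? [] : body.split("\n");

  // Count each line plus its newline, in UTF-16 code units.
  const charCount = (ls: string[]): number =>
    ls.reduce((sum, line) => sum + line.length + 1, 0);

  const loaded = lines.slice(0, cap); // what the session sees
  const dead = lines.slice(cap);      // what stays on disk, unseen
  return {
    totalLines: lines.length,
    loadedLines: loaded.length,
    deadLines: dead.length,
    loadedChars: charCount(loaded),
    deadChars: charCount(dead),
  };
}

// Synthetic 1,135-line index, the same length as the one measured here.
const sampleIndex =
  Array.from({ length: 1135 }, (_, i) => `- entry ${i}`).join("\n") + "\n";
const split = splitIndex(sampleIndex);
console.log(split.loadedLines, split.deadLines); // 200 935
```

To run it against a real index, pass the file contents (for example from Node's `readFileSync(path, "utf8")`) instead of the synthetic string.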

4. Per-turn token math

The same chars/4 heuristic Claude's CLI uses internally, applied to the head-200 byte count we measured. The formula is the same one our CLAUDE.md analyzer uses; the inputs are the only thing different.

cost-math.ts

back-of-envelope · auto memory bill

Read the second-to-last line first. The same prompt-cache logic that flattens CLAUDE.md cost applies here: MEMORY.md sits in the same cached prefix, so a stable index drops the recurring per-turn cost by roughly 10x. The fastest way to bust the cache for both files at once is an ISO date or a "today is" string in the first 20 lines of either, which is the highest-impact finding our CLAUDE.md analyzer catches and the one the context-burn audit page walks through end-to-end.
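The arithmetic can be reproduced with a short sketch. The inputs are the article's own figures (21,677 loaded characters, a 30-turn session, the $15 per million input rate quoted in the FAQ, and the rough 10x cache discount). The function names and the round-up rule are assumptions made for illustration; the CLI's exact rounding is not documented here.

```typescript
// Inputs from the article; function names and rounding are illustrative assumptions.
const LOADED_CHARS = 21677;     // first 200 lines of the measured MEMORY.md
const TURNS = 30;               // session length used in the FAQ example
const INPUT_USD_PER_MTOK = 15;  // input rate quoted in the FAQ, as of 2026-05-18
const CACHE_HIT_FACTOR = 0.1;   // "roughly 10x" cheaper on a cache hit

// chars/4 heuristic, rounded up (the rounding direction is an assumption).
function estimateTokens(chars: number): number {
  return Math.ceil(chars / 4);
}

// Input cost of re-sending the same tokens on every turn of a session.
function sessionCostUsd(tokensPerTurn: number, turns: number, usdPerMTok: number): number {
  return (tokensPerTurn * turns * usdPerMTok) / 1_000_000;
}

const tokensPerTurn = estimateTokens(LOADED_CHARS);                        // 5420
const uncachedUsd = sessionCostUsd(tokensPerTurn, TURNS, INPUT_USD_PER_MTOK);
const cachedUsd = uncachedUsd * CACHE_HIT_FACTOR;

console.log(tokensPerTurn);          // 5420
console.log(uncachedUsd.toFixed(2)); // 2.44
console.log(cachedUsd.toFixed(2));   // 0.24
```

The uncached figure is the worst-case ceiling; the cached figure is closer to what a stable index costs in steady state.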

5. How this differs from CLAUDE.md

The two files share a session, a cache prefix, and a turn count, but they are not equivalent. Here is the shape, side by side:

Feature	CLAUDE.md	MEMORY.md
What fires every turn	CLAUDE.md, every byte, in the system prompt	MEMORY.md first 200 lines, in the system prompt
What truncates silently	nothing; CLAUDE.md ships in full	lines 201+ never load, but stay on disk and keep growing on every save
Follow-read behavior	none; CLAUDE.md is one file	Claude reads per-topic files it judges relevant via the [[name]] links
Owner	you write it	Claude writes it; you can edit or delete entries with /memory
Growth pattern	human edits, slow drift	Claude appends new lines on its own when it thinks something is worth keeping
Default state	off until you create CLAUDE.md	on by default since the feature shipped

The practical upshot: if you have been auditing only CLAUDE.md you have been counting roughly half of your recurring per-turn config cost. The half you missed is the one Claude wrote for itself, and the one that grows without you doing anything.

6. A two-rule pruning pass

What we do on our own machines, manually. Neither rule is automated yet.

MEMORY.md never goes past 150 lines. That leaves a 50-line buffer below the 200-line cut, so the next save-memory call does not silently drop entries beneath it. When the index crosses 150, the lowest-signal entries (anything not touched in 60 days, anything Claude has not followed via [[name]] in the same window) get demoted from the index, or deleted outright if the body file is also unreferenced.
Global memory stays small; project memory carries the load. Anything that only matters in one repo lives in the per-project memory directory (~/.claude/projects/<slug>/memory/), where it only fires on turns inside that repo. The global index (~/.claude/projects/-Users-<you>/memory/) holds only entries that are genuinely cross-project: keychain patterns, default cards, account identities, things you actually want re-injected on every turn of every session.

Both rules are pruning rules, not size targets. The point is not to shrink the index for its own sake; it is to keep every line in the loaded 200 doing work, and to keep stale entries from sitting beneath the cut where the file keeps growing but Claude cannot see what is there.
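The first rule and its 60-day staleness test can be expressed as two small checks. This is a sketch of the manual pass described above, not a shipped pruner: the thresholds come from this section, while `indexStatus`, `isStale`, and the idea of a per-entry `lastTouched` date are hypothetical, since how you track when an entry was last touched is up to you.

```typescript
// Thresholds from the article; names and the lastTouched input are hypothetical.
const HARD_CAP = 200;      // lines that load at session start
const SOFT_BUDGET = 150;   // rule 1: start pruning once the index crosses this
const STALE_AFTER_DAYS = 60;

type IndexStatus = "ok" | "prune" | "truncated";

// Classify an index by its line count (for example, the output of wc -l).
function indexStatus(lineCount: number): IndexStatus {
  if (lineCount > HARD_CAP) return "truncated"; // entries are already invisible
  if (lineCount > SOFT_BUDGET) return "prune";  // buffer is being eaten
  return "ok";
}

// An entry is a demotion candidate if it has not been touched within the window.
function isStale(lastTouched: Date, now: Date, maxAgeDays: number = STALE_AFTER_DAYS): boolean {
  const ageMs = now.getTime() - lastTouched.getTime();
  return ageMs > maxAgeDays * 24 * 60 * 60 * 1000;
}

console.log(indexStatus(1135)); // "truncated"
console.log(indexStatus(170));  // "prune"
console.log(indexStatus(120));  // "ok"
```

Keeping the two checks separate matches the intent above: the line count says when to prune, and staleness says which entries to look at first.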

7. Caveats on the measurement

A few things this analysis is not.

It is not a tokenizer-accurate count. The chars/4 heuristic is the same one the CLI uses for the on-screen estimate; the tokenizer-exact count can differ by a few percent in either direction. For decisions about a 5,420-token line item that is fine.
It is not a cache-aware count. The per-turn 5,420 figure assumes a clean cache. In steady-state usage the cached prefix drops the effective per-turn cost by roughly 10x. The figure is the worst-case ceiling, not the typical bill.
It is not a measurement of the per-topic follow-read cost. That is a session-level cost that depends on which entries the model judges relevant, which is task-specific. We measured 83 files; in practice Claude reads two to five of them on a typical task and zero on many.
It is one machine. Your global MEMORY.md may be 40 lines or 2,000. The shape of the truncation behavior is the same; the specific dollar number is not.

Frequently asked questions

How many tokens does auto memory cost per turn?

It depends on the size of your MEMORY.md, but only the first 200 lines load. On a long-lived global memory directory we measured live, those 200 lines are 21,677 characters, which the chars/4 heuristic Claude's CLI uses puts at about 5,420 tokens. Those 5,420 tokens fire on every turn for the whole session, the same way CLAUDE.md does. A 30-turn session at the Opus 4.7 input rate ($15 per million as of 2026-05-18) costs about $2.44 in input from MEMORY.md alone, on top of CLAUDE.md. With a clean cache prefix, the cache-hit cost is closer to a tenth of that.

What is the 200-line truncation and why does it exist?

Anthropic's auto-memory loader only injects the first 200 lines of MEMORY.md into the session context. Lines past that are not read on session start. The implicit contract is: MEMORY.md is an index, not the memory itself. The actual memory bodies live in the sibling .md files; MEMORY.md is one line per memory at roughly 150 characters. If your index has grown past 200 lines, the tail is dead weight on disk that future save-memory calls will keep appending to, but Claude will never see those entries again unless you manually re-prune the index.

Do the per-topic memory files (the ones MEMORY.md points to) also fire every turn?

No. The bodies under ~/.claude/projects/<slug>/memory/<name>.md do not auto-load. Claude reads them on demand: when it judges an index entry relevant to the current task, it opens the linked file as a tool call. That spends tool tokens (the file body, in full) but only once for that session and only for files Claude actually opens. The per-turn cost is the index. The per-session cost is whatever subset of files the model judged relevant.

What does a real, year-old MEMORY.md look like on disk?

On the machine we measured, ~/.claude/projects/-Users-matthewdi/memory/MEMORY.md is 1,135 lines, 147,146 characters total. The first 200 lines (the only part that loads) are 21,677 chars. The 935 lines below that are 125,469 chars sitting unread. There are 83 sibling topic files in the same directory. The same machine has a smaller per-project memory at ~/.claude/projects/-Users-matthewdi-social-autoposter-website/memory/, 375 lines, with three topic files totalling 2,540 chars. Most consumer machines are somewhere between the two.

How is this different from CLAUDE.md token cost?

CLAUDE.md is a contract you wrote and ships in full. Every byte fires on every turn. MEMORY.md is an index Claude wrote and gets capped at 200 lines, with the per-topic bodies pulled on demand. Both bill against the same prompt-cache prefix, so cache-busting one busts the cache for both. From a token-budget standpoint, MEMORY.md is usually the smaller of the two on a clean machine and roughly equal on a long-lived machine where it has been growing for months without pruning.

Can I disable auto memory if the token cost bothers me?

Yes. Run /memory in the CLI and use the auto-memory toggle, or delete the per-project memory directory under ~/.claude/projects/<slug>/memory/. The toggle stops new memories from being written and stops the index from loading; deleting the directory clears what has been saved so far, but on its own it should not be expected to stop new entries from being written. We do not recommend deleting it blind: read the index, keep the entries that still match how you work, and either prune the rest or move them to project-specific MEMORY.md files where they only load when you cd into that repo.

How do I see the per-project memory directory for the repo I am in right now?

The path is ~/.claude/projects/<slugified-absolute-path>/memory/, where the slug replaces every / in the absolute path with -. For /Users/you/ccmd-website the directory is ~/.claude/projects/-Users-you-ccmd-website/memory/. Run ls -la on that directory to see MEMORY.md and the per-topic .md files, head -200 MEMORY.md to see the lines that fire on every turn, and wc -l MEMORY.md to know whether you are past the 200-line cut.
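The slug rule is simple enough to sketch in a few lines. This assumes the rule is exactly as described above (every / becomes -); how Claude Code treats other characters, such as dots or spaces, is not covered here, and `projectSlug` and `memoryDir` are hypothetical helper names.

```typescript
// Sketch of the slug rule described above; helper names are illustrative.
// Assumption: only "/" is rewritten. Other characters are left untouched.
function projectSlug(absolutePath: string): string {
  return absolutePath.replace(/\//g, "-");
}

// Build the memory directory for a project, relative to a home prefix.
function memoryDir(absolutePath: string, home: string = "~"): string {
  return `${home}/.claude/projects/${projectSlug(absolutePath)}/memory/`;
}

console.log(projectSlug("/Users/you/ccmd-website"));
// -Users-you-ccmd-website
console.log(memoryDir("/Users/you/ccmd-website"));
// ~/.claude/projects/-Users-you-ccmd-website/memory/
```

Pass the absolute path of the repo you are in (the output of pwd) without a trailing slash, since a trailing slash would become a trailing dash under this rule.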

Does ccmd's analyzer score MEMORY.md the same way it scores CLAUDE.md?

Today the analyzer is purpose-built for CLAUDE.md, AGENTS.md, .cursorrules, and .grokrules: the four files Claude or its peers concatenate into the system prompt verbatim. MEMORY.md is shaped differently (one line per memory, plus a directory of bodies Claude follows on demand), so the seven CLAUDE.md finding kinds map only partially. The shared checks (bloat, cache_bust, duplicate, missing_why) still help because Claude reads the index every turn just like the rest of the system prompt. The 200-line cut is the memory-specific check we plan to add next.

What practical pruning rule do you use yourself?

Two rules. First, MEMORY.md never goes past 150 lines (a 50-line buffer below the 200-line cut, so new saves do not silently push entries beneath it). Second, the entry for any memory you have not touched in 60 days gets demoted from the index to the per-topic file only, or deleted entirely if Claude has not followed the [[name]] link in that window. Both are manual passes; we do not have an automated pruner shipped yet.