
Claude Code feels slow? Config bloat is a latency tax, not just a billing one.

Matthew Diakonov, Written with AI

Published May 18, 2026 · 5 min read

Most write-ups about a fat CLAUDE.md talk about cost. The weekly cap, the Opus rate, the token bill. That is real. But the symptom most people feel first is not the bill at the end of the week; it is the two-second pause at the start of every turn. That is the latency tax. It comes from the same bytes that cost you money, on a different axis.

This page walks through where the latency lives in a CLAUDE.md, which ccmd.dev analyzer findings actually map to it, and the order to fix them in. I wrote it because every existing guide on the topic ends with "cut your CLAUDE.md", which is true but treats config bloat as a billing problem. If you are reading this on your phone after hitting a slow turn, you are here because of latency, not the bill.

1. Where the latency lives

A single turn of Claude Code has roughly five places time goes:

Network upload of the request. The full request (the system prompt with CLAUDE.md, plus skills, tools, and history) travels over the wire on every call. A 6,000-token CLAUDE.md is about 24 KB of plaintext on its own, assuming roughly four characters per token.
Prompt prefix lookup. Anthropic checks whether the prefix of this request matches a cached prefix from a previous request. Byte-exact match required.
Input processing (time-to-first-token). On a cache hit the cached prefix is replayed cheaply. On a cache miss the full input has to be processed before the model can emit its first character.
Generation. Output tokens stream back. Roughly linear with the length of the answer, mostly independent of CLAUDE.md size.
Network download. Same shape as upload but smaller.

CLAUDE.md bloat hits the first three. It cannot make generation faster and it cannot make the download faster. What it can do is double or triple time-to-first-token by forcing a cache miss, and add a steady tax to network upload on every single turn for the rest of the session.
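To make the byte-exact requirement concrete, here is a minimal sketch. It is not Anthropic's implementation: real caching happens on the provider's side, not on raw characters in your editor. The function name and the sample file contents are hypothetical. It only illustrates why a change near the top of the file leaves almost nothing of the previous prefix to reuse.

```typescript
// Illustrative only: counts how many leading characters two prompts share.
// An early difference means everything after it is no longer a shared prefix.
function sharedPrefixLength(previous: string, current: string): number {
  const limit = Math.min(previous.length, current.length);
  let i = 0;
  while (i < limit && previous[i] === current[i]) {
    i++;
  }
  return i;
}

// Hypothetical file tops from two sessions on different days.
const rules = "\nUse pnpm.\nRun tests before committing.\n";
const monday = "# Payments app\n\nToday is 2026-05-18" + rules;
const tuesday = "# Payments app\n\nToday is 2026-05-19" + rules;
const stable = "# Payments app\n" + rules;

const sharedWithDate = sharedPrefixLength(monday, tuesday);
const sharedWithoutDate = sharedPrefixLength(stable, stable);

console.log(`with date on row 3: ${sharedWithDate} of ${tuesday.length} chars shared`);
console.log(`without date: ${sharedWithoutDate} of ${stable.length} chars shared`);
```

With the date on row 3, the shared prefix ends at the first differing digit, so every rule below it falls outside the reusable part. Without the date, the whole file is shared.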

2. The two analyzer findings that are also latency findings

Our analyzer at src/lib/analyzer.ts has seven finding kinds. Five of them are quality findings (vague, aspirational, conflict, duplicate, missing_why). Two of them are structural and they double as the latency findings: cache_bust and bloat. Here are the actual checks, lifted verbatim from the analyzer.

src/lib/analyzer.ts

The cache_bust check is the one that matters most for latency. The regex /\b20[2-9]\d-\d{2}-\d{2}|today|this session|right now\b/i runs against every one of the first 20 lines of the file. A single hit means every new session writes a different string at that position, which means the cached prefix never matches, which means every turn pays the uncached time-to-first-token path. One line.

The bloat check fires on any non-blank line over 28 words. The analyzer's suggested fix is to split into 2-3 shorter directives. The published token saving on the fix is 35% of the line. The latency saving rides with it: 35% fewer bytes uploaded for that line, on every single turn, for the life of the session.
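The two checks described in this section can be sketched as follows. This is a reconstruction from the prose, not the analyzer's source: the regex, the 20-line window, and the 28-word threshold come from the text above, while the `Finding` type, the function name, and the whitespace-based word count are assumptions.

```typescript
// Reconstruction from the article's description; NOT the actual src/lib/analyzer.ts.
type Finding = { kind: "cache_bust" | "bloat"; line: number; text: string };

// Regex as quoted in the article.
const CACHE_BUST = /\b20[2-9]\d-\d{2}-\d{2}|today|this session|right now\b/i;
const CACHE_WINDOW_LINES = 20; // only the top of the file is checked
const BLOAT_WORD_LIMIT = 28; // lines over this many words are flagged

function findLatencyFindings(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    const line = index + 1; // report 1-based line numbers
    if (index < CACHE_WINDOW_LINES && CACHE_BUST.test(text)) {
      findings.push({ kind: "cache_bust", line, text });
    }
    // Assumption: a "word" is a whitespace-separated run of characters.
    const words = text.trim().split(/\s+/).filter(Boolean).length;
    if (words > BLOAT_WORD_LIMIT) {
      findings.push({ kind: "bloat", line, text });
    }
  });
  return findings;
}

// Hypothetical input: a date on row 3 and a 31-word line on row 5.
const sample = [
  "# Payments app",
  "",
  "Today is 2026-05-18",
  "Use pnpm.",
  new Array<string>(31).fill("word").join(" "),
].join("\n");

const sampleFindings = findLatencyFindings(sample);
console.log(sampleFindings.map((f) => `${f.kind} @ L${f.line}`));
```

One detail worth knowing if you adapt the regex: as quoted, the `\b` anchors bind only to the first and last alternatives, so `today` and `this session` also match inside longer words.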

3. What an analyzer pass on a real file looks like

A 6,042-token CLAUDE.md from a Next.js payments repo. Four bloated lines and one cache_bust. The latency view of the report:

ccmd · latency view

The first block is the bytes you pay for every turn. The second is where the time-to-first-token gap is hiding. Fix L3 and every subsequent session hits the cached path; fix L15, L42, L73, L88 and the cached path itself gets shorter.

4. The numbers that move when you fix this

6,042 total tokens before fix

4 lines over 28 words

1 cache_bust on row 3

35% token cut per bloat fix

Two of these numbers move latency directly. The cache_bust count going from 1 to 0 moves every turn in every new session from the uncached TTFT path to the cached one. The bloat-line count going from 4 to 0 shrinks the bytes uploaded on every turn by a few hundred tokens, which compounds across a 30-turn session.
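The compounding is plain multiplication. A small sketch, assuming made-up token counts for the four flagged lines (the article gives only the 35% saving, the four bloated lines, and the 30-turn session; the function name is hypothetical):

```typescript
// Tokens no longer re-sent over a session once bloated lines are split.
// bloatLineTokens: token count of each flagged line (hypothetical values below).
// savingRate: fraction of each line removed by the fix (0.35 per the article).
function tokensSavedPerSession(
  bloatLineTokens: number[],
  savingRate: number,
  turns: number,
): number {
  const perTurn = bloatLineTokens.reduce(
    (sum, tokens) => sum + tokens * savingRate,
    0,
  );
  return Math.round(perTurn * turns);
}

// Four flagged lines with made-up sizes, over a 30-turn session.
const saved = tokensSavedPerSession([40, 60, 80, 100], 0.35, 30);
console.log(`${saved} tokens not re-sent over the session`);
```

Swap in the per-line token counts from your own report to see what the bloat fixes are worth on your file.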

5. Before and after on the same file

Top of CLAUDE.md, fat vs. lean

Lines 1-6, the way most CLAUDE.md files start. One date on row 3 (cache_bust), one 31-word stack paragraph on row 5 (bloat). Every new session pays the uncached TTFT path. Every turn ships the full 31-word paragraph.

row 3: 'Today is 2026-05-18' rewrites every session
row 5: 31-word stack paragraph; second half gets ignored anyway
cached prefix mismatch on every new session
uncached time-to-first-token on every turn for the session

6. Why the cache_bust line ranks high-severity

The seven analyzer finding kinds are sorted by blast radius, not by line number. cache_bust is the only kind that flips a regime; it changes which TTFT path your entire session runs on. Every other kind shaves bytes off a single line.

high severity

“Timestamp or session-specific text near the top busts prompt cache on every session.”

src/lib/analyzer.ts:199 (the cache_bust message string)

The message text is the analyzer's own. The regex is the one shown in section 2. The check is ten lines of TypeScript and it is the highest-leverage single fix in the entire rubric for anyone who cares more about Claude Code feeling fast than about the weekly dollar number.

7. The order to fix in, if you only have ten minutes

Delete every cache_bust hit. Anything matching YYYY-MM-DD, today, this session, or right now in the first 20 lines either goes to the bottom of the file or out of the file entirely. One delete, every future session hits the cached TTFT path.
Split every line over 28 words. One 34-word stack paragraph turns into three 8-word directives. The second half of long lines was being ignored anyway; you keep the meaning and stop paying to upload bytes the model is skipping.
Pin the first 20 lines as byte-stable. Treat the top of the file as the cache key. Project name, language, framework, package manager. Nothing that changes between sessions. Anything that does change (sprint, ticket, focus) lives lower or in a separate file the agent reads on demand.
Move volatile context to a tool, not a rule. Today's date belongs in a tool call, not in the system prompt. The agent can ask for it when it needs it. The cache stays warm.

The whole sequence runs in roughly ten minutes on a typical file. You will feel the difference on the next session before the bill confirms it.
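The first step can be automated in a few lines. The sketch below is an assumption about how one might do it, not a ccmd.dev feature: it moves any line in the first 20 that matches the cache_bust regex from section 2 to the end of the file. The function name is hypothetical, and it makes a single pass, so lines that shift up into the window are not rechecked.

```typescript
// Hypothetical helper, not part of the analyzer.
// Regex as quoted in section 2 of the article.
const VOLATILE = /\b20[2-9]\d-\d{2}-\d{2}|today|this session|right now\b/i;

function demoteVolatileLines(source: string, windowLines = 20): string {
  const kept: string[] = [];
  const demoted: string[] = [];
  source.split("\n").forEach((text, index) => {
    // Only lines inside the top window are candidates for demotion.
    if (index < windowLines && VOLATILE.test(text)) {
      demoted.push(text);
    } else {
      kept.push(text);
    }
  });
  // Stable lines keep their order; volatile lines are appended at the bottom.
  return [...kept, ...demoted].join("\n");
}

const before = ["# Payments app", "Today is 2026-05-18", "Use pnpm."].join("\n");
const after = demoteVolatileLines(before);
console.log(after);
```

Review the result by hand before committing it. Deleting the line, or moving it to a separate file the agent reads on demand, is often the better fix than keeping it at the bottom.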

Frequently asked questions

Does CLAUDE.md bloat actually slow down Claude Code, or is it only a billing issue?

Both. Every line of CLAUDE.md is concatenated into the system prompt and re-sent on every turn of the session. That has two felt symptoms: the dollar cost on the weekly bill, and the wall-clock latency on every single response. The latency tax has two parts: the bytes have to upload over the network before the model sees the request, and the input has to be processed before the model can emit the first token. The bigger the file, the longer both take.

Why does one ISO date in the first 20 lines matter so much for speed?

Anthropic's prompt caching is byte-exact. The cached prefix has to match the new request character-for-character to count as a hit. Our analyzer at src/lib/analyzer.ts line 194 fires on /\b20[2-9]\d-\d{2}-\d{2}|today|this session|right now\b/i in the first 20 lines. A line like 'Today is 2026-05-18' rewrites itself every new session. Each new date is a new prefix. Each new prefix means the cache misses and the request takes the uncached time-to-first-token path. Anthropic's prompt-caching docs describe cached reads as substantially faster than uncached ones. The exact delta depends on the model and the prefix length, but a 6,000-token file is squarely in the range where cache hits feel like a different product than cache misses.

Does prompt caching help with latency or only cost?

Both, and the latency win is what most people feel first. Anthropic documents prompt caching as reducing both input cost and time-to-first-token for the cached prefix. The cost reduction is published as ~10x for cached reads. The latency reduction is qualitative in the docs but consistent in our and other users' measurements: a 6,000-token cached prefix reaches its first token much sooner than the same 6,000 tokens uncached. The cache key is the byte content of the prefix, which means a date in the first 20 lines wrecks both wins simultaneously.

Are bloated lines (28+ words) a latency problem or just a quality problem?

Both, again. A 34-word stack paragraph is more bytes uploaded on every turn and more tokens fed through the input window before the model speaks. The quality problem is well known: the second half of lines over roughly 25 words gets ignored by the model. The latency problem is structural: those ignored bytes still have to travel and still have to be processed. Splitting a 34-word paragraph into three 8-word directives cuts roughly 35% of that line's tokens (the analyzer's published bloat saving) and turns one half-ignored paragraph into three followed rules.

Which finding kind has the biggest latency blast radius?

cache_bust, by a wide margin. The math: a bloat fix saves you 35% of one line's tokens, every turn. A cache_bust fix moves every turn from the uncached TTFT path to the cached TTFT path. The first is linear with the line size; the second is a regime change for the entire session. The analyzer ranks cache_bust as high severity for exactly this reason.

How do I measure the latency tax on my own CLAUDE.md without setting up tracing?

Paste your file into the textarea on ccmd.dev. The analyzer reports totalTokens, totalChars (the bytes that ride on every turn), the per-line bloat count, and any cache_bust findings. All of it runs in your browser; nothing uploads. For end-to-end timing in your actual sessions, the cheapest signal is the gap between hitting Enter and the first character of the model's response. Compare a session with a date-in-first-20-lines to one without and you will feel the difference before any tool tells you to.

Does removing CLAUDE.md entirely make Claude Code faster?

Faster, yes. Useful, no. A blank CLAUDE.md zeroes the per-turn upload and removes the cache_bust risk, but it also strips the agent of the constraints that make it follow your stack, your tests, and your past incidents. The goal is not zero bytes. The goal is to keep every line doing work the agent will follow and to keep the file's first 20 lines byte-stable across sessions. The analyzer is a way to spot the lines that fail one or both of those tests.

What about AGENTS.md and .cursorrules; same latency story?

Same story. The analyzer detects all four formats (CLAUDE.md, AGENTS.md, .cursorrules, .grokrules) by content rather than filename and applies the same bloat threshold and cache_bust regex to all of them. Codex (AGENTS.md), Cursor (.cursorrules), and Grok Build (.grokrules) all concatenate their config into a system prompt that rides on every turn. The latency tax does not care which CLI ships the file.