
Skills vs MCP servers vs CLI: same budget, three different bills.

Matthew Diakonov
9 min read

Most discussions of these three treat them as alternative products to install. They are not. They are three slots inside the same context window, and each one bills differently. A skill that never fires still costs you tokens every turn. An MCP server you barely use costs you 5 to 10 times more than the entire skill catalog, every turn. A CLI command costs nothing until you actually run it, and then it can cost a lot. This page walks through the math on all three and gives a rule for picking.

1. The three cost shapes, side by side

The fundamental question is when the tokens land. There are two timings: ambient (every turn, whether you use the thing or not) and per-invocation (only when you call it). Each mechanism picks differently.

| Dimension | Skills | MCP servers | CLI / Bash |
| --- | --- | --- | --- |
| Ambient cost (every turn) | 75-150 tok per skill, capped at 1% of context (~2,000 on 200K) | 200-600 tok per tool, no default cap, easily 5-15K | One tool schema, fixed ~200 tok total |
| Per-invocation cost | Full SKILL.md body, 1-8K tok | Structured request + structured response payload | Command string in, stdout/stderr out (can be large) |
| Discovery | Ambient: Claude reads listing, decides when to invoke | Ambient: Claude sees every tool, decides when to call | You have to mention it, or it lives in CLAUDE.md |
| Best fit | Procedure, checklist, writing convention, style guide | Typed I/O, stateful resource, API/database round-trip | Shell commands, build/test/grep, ad hoc scripting |
| State across calls | No, body re-enters context each invocation | Yes, server process can hold connections/tokens | No, fresh shell each call |
| What ccmd scores | SKILL.md body and description, same Karpathy-12 rubric | Server README and tool description text | Bash invocations in CLAUDE.md/hooks (the rule lines) |
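The two timings in the table can be sketched as a small cost model: an ambient term multiplied by turns, plus a per-invocation term multiplied by calls. This is a simplification, not a measurement. The `Mechanism` class and `session_tokens` function are hypothetical names, the ambient figures are mid-range values from the table, and the per-invocation figures for the MCP tool and Bash are assumptions chosen only for illustration. It also ignores the fact that invoked content stays in context on later turns.

```python
from dataclasses import dataclass


@dataclass
class Mechanism:
    """Token cost shape for one mechanism (all figures are rough estimates)."""
    name: str
    ambient_tokens: int          # paid on every turn, used or not
    per_invocation_tokens: int   # paid only when the mechanism is invoked


def session_tokens(m: Mechanism, turns: int, invocations: int) -> int:
    """Input tokens attributable to one mechanism over a whole session."""
    return m.ambient_tokens * turns + m.per_invocation_tokens * invocations


# Illustrative values: ambient from the table, per-invocation partly assumed.
skill = Mechanism("skill", ambient_tokens=120, per_invocation_tokens=4_000)
mcp_tool = Mechanism("mcp tool", ambient_tokens=450, per_invocation_tokens=800)
bash = Mechanism("bash", ambient_tokens=200, per_invocation_tokens=1_500)

for m in (skill, mcp_tool, bash):
    # 50 turns, never invoked: only the ambient term remains.
    print(f"{m.name:9s} unused for 50 turns: {session_tokens(m, 50, 0):>6,} tok")
```

The point of the sketch is the shape rather than the numbers: with zero invocations, everything left over is ambient cost, and it scales with the number of turns.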

2. What an actual session looks like

A realistic Claude Code session for a mid-size project: 12 installed skills, 4 MCP servers (a database tool, GitHub, Linear, and a custom data-pipeline server), one global CLAUDE.md plus one project CLAUDE.md totalling 6,400 tokens. Before you type anything, your context already looks like this:

[Listing: ambient context, before turn 1]

That subtotal (about 23K tokens) is what every turn ships before the model reads a single line of your code. The single biggest chunk is the MCP tool listings at 8,930 tokens. The 12 skills account for only 1,540 tokens, which sits under the ceiling set by the skillListingBudgetFraction default of 0.01 (~2,000 tokens on a 200K context). MCP has no equivalent cap. The mechanism with the worst defaults is also the one people install the most of.
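The figures quoted for this session can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers stated in the text; the variable names are made up for illustration, and the three itemized entries do not add up to the full 23K subtotal because the remaining entries are not itemized here.

```python
CONTEXT_WINDOW = 200_000
SKILL_LISTING_FRACTION = 0.01  # skillListingBudgetFraction default, as quoted above

# Ceiling on the skill listing: 1% of a 200K context.
skill_cap = round(CONTEXT_WINDOW * SKILL_LISTING_FRACTION)

# Ambient entries that the text states explicitly, in tokens.
known_ambient = {
    "mcp_tool_listings": 8_930,
    "skill_listings": 1_540,
    "claude_md": 6_400,
}

known_total = sum(known_ambient.values())
mcp_vs_skills = known_ambient["mcp_tool_listings"] / known_ambient["skill_listings"]

print(f"skill listing cap: {skill_cap:,} tok")
print(f"itemized ambient:  {known_total:,} tok")
print(f"MCP vs skills:     {mcp_vs_skills:.1f}x")
```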

5.8x: MCP tool listings (8,930 tokens) against skill listings (1,540 tokens) in the session above.

MCP tool listings ship in the system prompt on every turn. Their JSON schema with descriptions is 3-5x larger per entry than the equivalent skill description.

Anthropic Model Context Protocol spec


3. Why a single MCP tool costs more than ten skills

A skill listing entry is a name plus a one-sentence description. An MCP tool listing entry has to include every parameter, its type, its description, defaults, and which ones are required, so the model knows how to call it without a round trip. The schema is what makes MCP reliable, and also what makes it heavy.

Here is a realistic single tool from a GitHub MCP server. Count the words. Eight properties, each with a description, plus a top-level description that has to explain the contract:

[Listing: github-mcp tool definition, 1 of 24]

That one tool definition is about 510 tokens. The same GitHub MCP server typically exposes 20 to 30 tools. Even at the median of 450 tokens per tool, a single MCP server carries roughly 9,000 to 13,500 tokens of ambient context cost. A typical skill description for the equivalent procedural advice ("use the gh CLI to open PRs with a body argument") is closer to 120 tokens.
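To make the per-tool weight concrete, here is a minimal sketch of what a tool definition with eight properties might look like, together with the server-level arithmetic. The tool below is hypothetical and invented for illustration; it is not copied from any real GitHub MCP server, and its descriptions are shorter than the ones a production server would ship, so its estimate comes out lower than the 510 tokens quoted above. The `estimate_tokens` helper is an assumed chars/4 approximation, not a real tokenizer.

```python
import json


def estimate_tokens(text: str) -> int:
    """Rough chars/4 estimate; a real tokenizer will give different counts."""
    return len(text) // 4


# Hypothetical tool definition, invented for illustration only.
tool = {
    "name": "create_pull_request",
    "description": "Open a pull request in a repository and return its number and URL.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "owner": {"type": "string", "description": "Account that owns the repository."},
            "repo": {"type": "string", "description": "Repository name, without the owner."},
            "title": {"type": "string", "description": "Title of the pull request."},
            "body": {"type": "string", "description": "Markdown description of the change."},
            "head": {"type": "string", "description": "Branch that contains the changes."},
            "base": {"type": "string", "description": "Branch to merge the changes into."},
            "draft": {"type": "boolean", "description": "Open as a draft.", "default": False},
            "maintainer_can_modify": {"type": "boolean", "description": "Allow maintainer edits."},
        },
        "required": ["owner", "repo", "title", "head", "base"],
    },
}

one_tool = estimate_tokens(json.dumps(tool))
print(f"this sketch: ~{one_tool} tok for one tool")

# Server-level range from the figures in the text: 20 to 30 tools at a 450-token median.
MEDIAN_TOKENS_PER_TOOL = 450
low, high = 20 * MEDIAN_TOKENS_PER_TOOL, 30 * MEDIAN_TOKENS_PER_TOOL
print(f"server ambient: {low:,} to {high:,} tok")
```

Every property carries its own type and description, which is why the entry grows with the parameter count in a way a one-sentence skill description does not.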

This is not an argument against MCP. Structured I/O is genuinely better than parsing shell output for some tasks. It is an argument against treating MCP installs as free. Every server you add is a permanent tax on every turn of every session.

4. The same rubric scores all three

The token-cost math is one piece. The other piece is that the prose you write inside any of these mechanisms (the SKILL.md body, the MCP tool description, the rule lines in CLAUDE.md that direct Bash usage) all share the same failure modes: bloated lines, vague absolutes, missing-why prohibitions, cache-busting timestamps. ccmd's analyzer at src/lib/analyzer.ts:41 inspects only the first 300 characters of any paste to pick a label, then runs the same Karpathy-12 rubric and 7-finding pipeline against the body. Same scorer, three input shapes:

[Listing: ccmd report on a SKILL.md body]

The findings (bloat, vague, aspirational, missing_why, cache_bust, duplicate, conflict) are language-agnostic. The 28-word bloat threshold at src/lib/analyzer.ts:150 applies to a tool description in an MCP server's metadata exactly the way it applies to a rule in CLAUDE.md. A 50-word tool description is 50 words the model has to read on every turn, and Anthropic's own field results show rule lines beyond 28 words start losing the second clause.

So the workflow is the same for all three: paste the body into ccmd.dev, read the findings, cut the lines flagged at high severity, re-paste. The chars/4 heuristic at src/lib/analyzer.ts:37 gives you the token count, the rubric gives you the structural gaps, and the findings give you the line-by-line cuts.
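The two mechanical checks named above (a 28-word bloat threshold and a chars/4 token estimate) are simple enough to sketch locally. This is not ccmd's implementation, whose source is not shown here; the function names are hypothetical, and the sketch assumes "line" means a newline-delimited line and "word" means a whitespace-separated token.

```python
BLOAT_WORD_THRESHOLD = 28  # threshold quoted in the text


def estimate_tokens(text: str) -> int:
    """Rough chars/4 token estimate, as described in the text."""
    return len(text) // 4


def bloated_lines(body: str, threshold: int = BLOAT_WORD_THRESHOLD) -> list[tuple[int, int]]:
    """Return (line_number, word_count) for each line over the word threshold."""
    flagged = []
    for number, line in enumerate(body.splitlines(), start=1):
        words = len(line.split())
        if words > threshold:
            flagged.append((number, words))
    return flagged


if __name__ == "__main__":
    sample = "Use gh to open PRs.\n" + " ".join(["filler"] * 35)
    print("estimated tokens:", estimate_tokens(sample))
    for number, words in bloated_lines(sample):
        print(f"line {number}: {words} words, over the {BLOAT_WORD_THRESHOLD}-word threshold")
```

The same function applies unchanged to a SKILL.md body, an MCP tool description, or a CLAUDE.md section, because it looks only at line length and not at file type.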

5. A decision rule that does not cost you a weekly limit

Three questions, in order. The first one usually answers it.

  • Does Claude need to round-trip structured data? If yes (you want a typed result back, you want to chain calls, you need a stateful connection), MCP is the answer. The ambient cost is real, but the alternative is asking Claude to parse free-form shell output, which is unreliable past about three fields.
  • Is the value mostly instruction the model needs to follow? If yes (a procedure, a style guide, a checklist, a code-review heuristic), it is a skill. Anthropic's skills system is specifically built for this shape: short description in the listing, full body only when matched.
  • Is it a one-off shell command or script? Default to CLI. You give up ambient discoverability, but you get the cheapest possible mechanism. If the same command keeps showing up in your prompts week after week, that is the signal to promote it to a skill.
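The three questions above reduce to an ordered check, sketched below. The function name and parameter names are hypothetical; the ordering follows the list, with CLI as the default when neither of the first two questions answers yes.

```python
def pick_mechanism(needs_structured_round_trip: bool, mostly_instruction: bool) -> str:
    """Apply the three questions in order; the first 'yes' decides."""
    if needs_structured_round_trip:
        # Typed results, chained calls, or a stateful connection.
        return "mcp"
    if mostly_instruction:
        # Procedure, style guide, checklist, or review heuristic.
        return "skill"
    # One-off shell command or script: the cheapest mechanism.
    return "cli"


print(pick_mechanism(needs_structured_round_trip=True, mostly_instruction=True))
print(pick_mechanism(needs_structured_round_trip=False, mostly_instruction=True))
print(pick_mechanism(needs_structured_round_trip=False, mostly_instruction=False))
```

Putting the structured-data question first means a task that is both instruction-heavy and data-heavy lands on MCP, which matches the order given in the list.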

6. The 10-minute three-surface audit

Step 1: Count what is ambient right now

Run /skills to see installed skills. Run /mcp (or check ~/.claude/mcp_settings.json) to see configured MCP servers. Run ls .claude/hooks/ to see scripted CLI hooks. Three lists, three numbers. Most teams have not looked at all three together since onboarding.

Step 2: Cut the unused MCP servers first (biggest lever)

MCP listings are the heaviest ambient slot, with no default cap. Disable any MCP server you have not invoked in 30 days. Cutting one mid-size MCP server typically saves 4-8K tokens per turn, more than disabling every unused skill combined. Keep the server installed; just toggle it off in mcp_settings.json.

Step 3: Cut unused skills next, then relocate

Run /skills, disable everything not invoked in 30 days, then move project-only skills from ~/.claude/skills/ into <repo>/.claude/skills/. The listing budget drops from session-wide cost to repo-scoped cost. Survivors get pasted into ccmd.dev to score their body.

Step 4: Score every surviving body line by line

For each surviving SKILL.md, each MCP server's tool descriptions block, and the CLAUDE.md sections that direct CLI usage, paste the body into ccmd.dev. The analyzer flags bloated lines (over 28 words), vague absolutes, missing-why prohibitions, and cache-busting timestamps. Cut the high-severity findings first; they compound across every turn.

Step 5 (optional): Continuous monitoring on PR diff

The free analyzer is one-shot. If your SKILL.md files, MCP configs, or CLAUDE.md sections are checked into the repo and edited by multiple engineers, ccmd's paid tier comments the score on every PR that touches them, so drift is caught at review time instead of on a Wednesday rate-limit hit.

7. What "cut" looks like

A typical mid-size project: 12 skills, 4 MCP servers, CLAUDE.md at 6,400 tokens. After one 10-minute audit:

Same project, after the 10-minute audit

Before the audit: 12 skills, 7 of them never invoked. 4 MCP servers, including one (Linear) that the team disabled in product but left installed in Claude. CLAUDE.md with a duplicate set of style rules that also live in two of the skills. Listing budget over by 57%, MCP ambient dominant at 8,930 tokens.

  • 12 skills, 7 dead
  • 4 MCP servers, 1 fully unused
  • 8,930 tok of MCP ambient
  • CLAUDE.md duplicates skill rules
  • Frequent Wednesday rate-limit hits

Notice where the cut came from: roughly 4.7K tokens from MCP, 800 tokens from skills, and 2K from de-duplicating CLAUDE.md against the surviving skills. The pattern holds across most audits we run: MCP is the biggest single line, but the leveraged cuts come from de-duplicating prose between CLAUDE.md, skill bodies, and MCP tool descriptions. The same sentence is often in all three.
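The three cuts can be totalled in a few lines. This sketch uses only the rounded figures quoted above; the 40-turn session length is an assumption chosen to show how a per-turn saving compounds, not a number from the audit.

```python
# Approximate per-turn savings quoted in the text, in tokens.
cuts = {
    "mcp": 4_700,
    "skills": 800,
    "claude_md_dedup": 2_000,
}

total_cut = sum(cuts.values())
shares = {name: tokens / total_cut for name, tokens in cuts.items()}

ASSUMED_TURNS = 40  # hypothetical session length, for illustration only
session_saving = total_cut * ASSUMED_TURNS

print(f"saved per turn: {total_cut:,} tok")
for name, share in shares.items():
    print(f"  {name:16s} {share:.0%}")
print(f"saved over {ASSUMED_TURNS} turns: {session_saving:,} tok")
```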

Want a 15-minute pass on your three surfaces?

Free. Bring your /skills output, your mcp_settings.json, and your top CLAUDE.md. We will tell you which to disable, which to relocate, and which to rewrite first.

Frequently asked questions

What is the actual difference between a skill, an MCP server, and a CLI tool in Claude Code?

A skill is a markdown file (SKILL.md) plus optional scripts. Claude reads its name and description on every turn, and pulls the body into context only when the description matches the request. An MCP server is a separate process speaking the Model Context Protocol; it exposes a set of named tools, and every tool's name, description, and JSON input schema ships into context every turn whether you use it or not. CLI (specifically the built-in Bash tool, or any executable you invoke through it) is a single tool that takes a string command and returns stdout, stderr, and exit code. Bash ships its one schema once per turn; nothing else.

Which one is most expensive in tokens?

Per ambient slot, MCP servers are usually the worst offender. A skill listing entry is around 75 to 150 tokens. An MCP tool listing entry is 200 to 600 tokens because the JSON input schema adds 5 to 20 properties with descriptions. Four MCP servers exposing about 20 tools between them can easily run 8,000 to 12,000 tokens before you type. Skills, with the 1% default listing budget, cap out near 2,000 tokens on a 200K-context model. CLI/Bash is the cheapest ambient: one tool, one short schema, fixed cost. The expensive part of CLI is the command output, which depends entirely on what you run.

If MCP is so heavy, why use it?

Two reasons. First, structured I/O: MCP gives Claude a typed input schema and a structured response, which is much more reliable than parsing free-form Bash output for anything that needs to round-trip data (database rows, API responses, file uploads). Second, stateful tool surfaces: an MCP server can hold a database connection, an OAuth token, or an open file handle across many tool calls. Bash starts a fresh shell each invocation. Use MCP when correctness depends on schema; use CLI when you would have written a shell command anyway.

When should I prefer a skill over an MCP server?

Skills are good when the value is prose: a procedure, a checklist, a writing convention, a style guide. Anything that boils down to 'Claude, here is how to think about X before you do it' belongs in a skill. MCP servers are bad at this because they have to express it as a tool, which forces structure where you wanted instruction. The flip side: any time you want Claude to call out to a real system (a database, an API, a filesystem with structured paths), it belongs in MCP, not a skill.

Can I use the CLI for everything and skip both?

For solo workflows, often yes, and many engineers do exactly this. Bash plus a few project scripts in scripts/ covers the 80% case. You give up two things: ambient discoverability (Claude does not 'know' about your scripts until you mention them; skills and MCP tools advertise themselves), and structured output. If your workflow tolerates Claude reading shell output and inferring structure, CLI is the cheapest path. If you want Claude to reliably pick the right tool without prompting, you pay the ambient cost for a skill or MCP listing.

How does ccmd help across all three?

ccmd's analyzer at src/lib/analyzer.ts:41 inspects only the first 300 characters of any paste to pick a file-type label, then runs the same Karpathy-12 rubric and 7-finding pipeline against the body. That means the same scorer that grades a CLAUDE.md grades a SKILL.md body, an MCP server's README, an AGENTS.md (Codex), .cursorrules (Cursor), or .grokrules (Grok Build). The findings (bloat, vague, aspirational, missing_why, cache_bust, duplicate, conflict) cost tokens identically in all of them. The token math at src/lib/analyzer.ts:37 uses the common chars/4 heuristic.

What is the rule of thumb for picking?

If the value is instruction (how to think), make it a skill. If the value is a tool call with typed input and output (run a query, hit an API, upload a file), make it an MCP server. If the value is a one-off shell command (run tests, build, grep, scp), leave it as CLI. Default to CLI; promote to skill when the same instruction keeps appearing in your prompts; promote to MCP when you need structured I/O or state across calls.

Where does this break down at scale?

Two failure modes. (1) Installing every MCP server you can find: ambient cost climbs past 15,000 tokens before the model reads your code, and after about 50% context fill the model starts dropping instructions silently. (2) Treating skills as documentation: a 6,000-token SKILL.md that fires 30 times in a long session is 180,000 input tokens of the same text, which is most of an Opus 4.7 session's input cost wasted on prose that the analyzer would have flagged in 30 seconds.

Next: the two-surface skills-bloat walkthrough, the Karpathy-12 rubric in full, or paste your SKILL.md, MCP README, or CLAUDE.md into the analyzer.
