2 months ago 2 months ago

Tutorial: Caveman Skill for Claude Code Token Savings

The Caveman skill for Claude Code forces stripped-down prose responses, cutting per-task output tokens by up to 65% — though real session-wide savings land closer to 4–5%. A 2026 arXiv study backs the approach: brevity constraints improved accuracy by 26 percentage points across 31 open-weight models, suggesting conciseness does more than save tokens.

by marketingagent.io 2 months ago2 months ago

239views

The Caveman Skill for Claude Code: Token Savings, Accuracy Gains, and the Science Behind It

A GitHub repo with 5,400 stars that forces Claude Code to speak in stripped-down fragments sounds like a joke — until you read the research paper buried inside it. After working through this tutorial, you’ll understand exactly how the Caveman skill works, what it actually saves in a real session (the headline numbers mislead), and why a March 2026 arXiv paper suggests brevity constraints can improve LLM accuracy, not just cut spend.

Caveman mode cuts Claude's output from 69 tokens to 19 tokens for the same response — same info, 73% fewer tokens. — Caveman mode cuts Claude’s output from 69 tokens to 19 tokens for the same response — same info, 73% fewer tokens.

Understand what the Caveman skill does. The repo strips filler from Claude Code’s prose responses — the conversational text you read in the terminal. It doesn’t touch code generation, tool calls, or internal reasoning. The before/after comparison in the README shows a 69-token response dropping to 19 tokens in caveman mode: same technical content, no padding.

The Caveman GitHub repo README — install instructions, benchmarks, before/after examples, and intensity levels all linked from the top nav.

Read the benchmark table with appropriate skepticism. The repo benchmarks 11 real dev tasks — explaining a React re-render bug, summarizing a diff, and similar work — showing token counts for normal vs. caveman output. Savings range from 22% to 87% per task, averaging around 65%. Those numbers are accurate for that specific slice of output, which is not the whole story.

Across 11 real dev tasks, Caveman Claude averages 294 tokens versus 1,214 for normal mode — a 65% reduction with no accuracy loss claimed.

Calibrate what the savings mean across a full session. Prose responses represent roughly 6,000 tokens in a 100,000-token session. Caveman compresses that slice by ~65%, saving around 4,000 tokens. The real session-wide impact lands at approximately 4–5% — not the 75–87% the benchmark table implies. That’s still meaningful at high usage volumes, but go in with accurate expectations.

Where your tokens actually go in a 100K session: prose responses are only 6,000 tokens — the only slice Caveman compresses.

The honest math: Caveman saves ~65% of prose tokens, but prose is only 6% of a typical session — real session-wide savings land around 4%.

Note the companion memory-compression tool. A secondary tool in the repo rewrites your CLAUDE.md into caveman-speak, with a stated claim of 45% input token reduction. Apply the same logic from step 3: CLAUDE.md is a small fraction of total input tokens, so real session-wide savings are again incremental — roughly 1,000–2,000 tokens per session at scale.
Install the skill. Installation is a single command. Once added, the skill is available immediately in any Claude Code session.
Invoke caveman mode. Trigger it with /caveman, or use natural-language phrases like “talk like a caveman” or “less tokens please.” Three intensity levels are available: Lite drops filler words, Full switches to fragment-style responses, and Ultra compresses everything telegraphically.

Caveman's three intensity levels: Lite drops filler words, Full activates fragment-style responses, Ultra compresses everything telegraphically — each triggered by a simple slash command. — Caveman’s three intensity levels: Lite drops filler words, Full activates fragment-style responses, Ultra compresses everything telegraphically — each triggered by a simple slash command.

Know what caveman mode leaves untouched. Code generation, tool calls, reasoning chains, and exact error message quoting are unaffected. The skill operates only on the prose layer — explanatory text, not functional output.
Read the research backing. A March 2026 arXiv paper, Brevity Constraints Reverse Performance Hierarchies in Language Models, evaluated 31 open-weight models across 1,485 problems. Brevity constraints improved accuracy by 26 percentage points and reduced performance gaps between large and small models by up to two-thirds. The proposed mechanism — “spontaneous scale-dependent verbosity” — holds that large models over-elaborate and reason themselves into wrong answers, a phenomenon the paper calls overthinking. The study covers open-weight models, not frontier APIs like Claude, so the magnitude of the effect on commercial models remains an open question.
Use the minimal alternative if you prefer not to install the skill. A single line added to your CLAUDE.md — “be concise, no filler, straight to the point” — captures much of the same directional benefit without the skill dependency.

How does this compare to the official docs?

The video makes a strong case on both the token math and the research paper, but the install command, exact slash-command syntax, and Claude Code’s current skill registration behavior are worth verifying against Anthropic’s official documentation before you ship this into a production workflow.

Here’s What the Official Docs Show

The video covers a third-party GitHub tool, so most of its install and benchmark claims sit outside Anthropic’s official documentation — that’s expected, not a gap to worry about. Where the docs do speak directly to this tutorial is on the two mechanisms the Caveman skill depends on: how skills surface in context and how CLAUDE.md persistence works, and those two touchpoints have detail worth getting right.

Step 1 — What the Caveman skill does

No official documentation was found for this step —
proceed using the video’s approach and verify independently.

Step 2 — The benchmark table

No official documentation was found for this step —
proceed using the video’s approach and verify independently.

Step 3 — Session-wide token math

No official documentation was found for this step —
proceed using the video’s approach and verify independently.

📄 The claude.ai pricing page showing Free, Pro ($17/month annual), and Max (from $100/month) subscription tiers — useful context for weighing token-reduction ROI at different usage levels

One relevant anchor: Max plan offers 5–20× more usage than Pro. The practical value of any 4–5% session saving scales with how close to your ceiling you run.

Step 4 — The companion memory-compression tool

The video’s approach here matches the current docs exactly. CLAUDE.md is an officially documented, user-written instruction file loaded at the start of every session — compressing it does reduce per-session input tokens.

The official 'How Claude remembers your project' page introducing CLAUDE.md and auto memory as Claude Code's two session-persistence mechanisms — 📄 The official ‘How Claude remembers your project’ page introducing CLAUDE.md and auto memory as Claude Code’s two session-persistence mechanisms

One thing the video doesn’t address: auto memory is a second, parallel persistence channel — Claude-written notes derived from corrections and preferences, capped at 200 lines or 25 KB per session. The memory-compression tool targets CLAUDE.md only. Verbose auto memory runs untouched.

📄 The CLAUDE.md vs. auto memory comparison table from the official docs, detailing author, contents, scope, and load timing for each mechanism

Steps 5–8 — Install, invocation, what stays untouched, and the research paper

No official documentation was found for these steps —
proceed using the video’s approach and verify independently.

The SKILL.md entry visible in the Claude Code interface is consistent with how skills are loaded as named markdown files — but the specific install command, slash-command syntax, and the arXiv benchmark claims are not covered in Anthropic’s official docs.

The claude.ai interface preview showing 'SKILL.md' listed as a context file in the right-side panel — consistent with how Claude Code skills are loaded into context — 📄 The claude.ai interface preview showing ‘SKILL.md’ listed as a context file in the right-side panel — consistent with how Claude Code skills are loaded into context

Step 9 — Adding a conciseness directive to CLAUDE.md

The video’s approach here matches the current docs exactly. A directive like “be concise, no filler, straight to the point” placed in CLAUDE.md is a legitimate, documented way to shape Claude’s response style across sessions.

📄 The CLAUDE.md file location and scope table showing project-level, user-level, and local instruction scopes with their respective file paths and access contexts

One scope distinction the video skips: there are three separate CLAUDE.md files. Placing your directive in ./CLAUDE.md limits it to the current project. Place it in ~/.claude/CLAUDE.md and it applies across every project you work in. The local variant, ./CLAUDE.local.md, is gitignored — right for personal preferences you don’t want committed. As of April 7, 2026, the choice between project-level and user-level scope meaningfully changes how broadly the conciseness constraint applies.

Useful Links

How Claude remembers your project – Claude Code Docs — Official documentation covering all three CLAUDE.md scopes, file paths, and the auto memory mechanism including its 200-line / 25 KB session load cap
Claude Code — The claude.ai Claude Code landing page, including the pricing tiers relevant for evaluating token-reduction tools at different usage volumes
Claude Code overview – Claude Code Docs — The intended Claude Code documentation overview; note that screenshots labeled with this URL in our research were sourced from the claude.ai marketing site rather than the docs themselves