caveman

Claude Code skill that makes the agent answer in caveman speech, cutting about 65% of output tokens with no measurable quality loss. Joke premise, real savings.

python
claude-code · claude-skills · prompt-engineering

A joke premise, a serious result. Caveman is a Claude Code skill (and Codex/Gemini/Cursor/Windsurf/Cline/Copilot plugin) that makes the agent respond in caveman-speech and cuts roughly 65% of output tokens with no measurable accuracy loss. The README’s tagline (“why use many token when few token do trick”) is also a fair summary of the engineering claim.

Two reasons it earns the 9k+ stars: the savings are reproducible from the project’s own eval harness, and installation is a single command for nearly every major coding agent. The caveman thing is a Trojan horse for terse-prompt research that would be ignored if it shipped under a less fun name.

What it does, in two examples

Normal Claude (69 tokens):

“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”

Caveman Claude (19 tokens):

“New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”

Same fix. ~72% fewer tokens (69 → 19). Brain still big.

Intensity levels

Pick your level of grunt:

  • Lite (/caveman lite) - drop filler, keep grammar. Professional, no fluff.
  • Full (/caveman full) - default caveman. Drop articles, fragments, full grunt.
  • Ultra (/caveman ultra) - maximum compression. Telegraphic. Abbreviate everything.
  • Wenyan (/caveman wenyan, wenyan-lite, wenyan-ultra) - classical Chinese literary compression. Same accuracy in arguably the most token-efficient written language ever.

Levels stick until you change them or the session ends.
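The stickiness rule can be pictured as a small piece of session state. The sketch below is a toy model, not the plugin's implementation: the Session class and its handle method are hypothetical names, and treating a bare /caveman as Full is an assumption based on Full being described as the default.

```python
from dataclasses import dataclass
from typing import Optional

# Level names taken from the list above.
LEVELS = {"lite", "full", "ultra", "wenyan", "wenyan-lite", "wenyan-ultra"}


@dataclass
class Session:
    """Hypothetical per-session state: which caveman level is active."""

    level: Optional[str] = None  # None means caveman mode is off

    def handle(self, message: str) -> None:
        parts = message.strip().split()
        if not parts or parts[0] != "/caveman":
            return  # ordinary message: the current level is left alone
        # Assumption: a bare "/caveman" selects the default level, Full.
        requested = parts[1] if len(parts) > 1 else "full"
        if requested in LEVELS:
            self.level = requested  # sticks until the next valid /caveman command


session = Session()
session.handle("/caveman ultra")
session.handle("why is my component re-rendering?")
print(session.level)  # ultra
```

A new Session object stands in for a new agent session, which is where the level resets.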

Install

One command for most agents (Codex takes a few manual steps):

Agent                     Install
Claude Code               claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
Codex                     Clone repo → /plugins → search “Caveman” → Install
Gemini CLI                gemini extensions install https://github.com/JuliusBrussee/caveman
Cursor                    npx skills add JuliusBrussee/caveman -a cursor
Windsurf                  npx skills add JuliusBrussee/caveman -a windsurf
Copilot / Cline / others  npx skills add JuliusBrussee/caveman

For Claude Code and Gemini, auto-activation happens via SessionStart hooks and context files - install once, get caveman in every future session. For the others, npx skills add installs the skill but not the auto-activation snippet, so you trigger with /caveman, $caveman, or “talk like caveman” each session (or paste the always-on snippet from the README into your system prompt).

The benchmarks

The README publishes its own eval harness output. Real token counts from the Claude API:

Task                                      Normal (tokens)  Caveman (tokens)  Saved
Explain React re-render bug               1180             159               87%
Fix auth middleware token expiry          704              121               83%
Set up PostgreSQL connection pool         2347             380               84%
Explain git rebase vs merge               702              292               58%
Refactor callback to async/await          387              301               22%
Architecture: microservices vs monolith   446              310               30%
Review PR for security issues             678              398               41%
Docker multi-stage build                  1042             290               72%
Debug PostgreSQL race condition           1200             232               81%
Implement React error boundary            3454             456               87%
Average                                   1214             294               65%

Range is 22%–87%, depending on how prose-heavy the response naturally is. Refactoring and architecture answers compress least (22% and 30%); bug explanations compress most (81%–87%).
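The table's arithmetic is easy to re-check. The short script below copies the ten rows and computes the savings two ways: the 65% headline turns out to be the mean of the per-task percentages, while the reduction measured on total tokens is higher because the longest responses also compress the most. The variable names are illustrative; only the token counts come from the table.

```python
# (task, normal_tokens, caveman_tokens), copied from the benchmark table
BENCHMARKS = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix auth middleware token expiry", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Explain git rebase vs merge", 702, 292),
    ("Refactor callback to async/await", 387, 301),
    ("Architecture: microservices vs monolith", 446, 310),
    ("Review PR for security issues", 678, 398),
    ("Docker multi-stage build", 1042, 290),
    ("Debug PostgreSQL race condition", 1200, 232),
    ("Implement React error boundary", 3454, 456),
]


def percent_saved(normal: int, terse: int) -> float:
    """Percentage of tokens removed going from `normal` to `terse`."""
    return 100.0 * (normal - terse) / normal


# Way 1: average the ten per-task percentages (each task weighs the same).
per_task = [percent_saved(n, c) for _, n, c in BENCHMARKS]
mean_of_percentages = sum(per_task) / len(per_task)

# Way 2: compare total tokens (long responses weigh more).
total_normal = sum(n for _, n, _ in BENCHMARKS)
total_caveman = sum(c for _, _, c in BENCHMARKS)
aggregate = percent_saved(total_normal, total_caveman)

print(f"mean of per-task savings: {mean_of_percentages:.1f}%")  # 64.5%
print(f"aggregate token savings:  {aggregate:.1f}%")            # 75.8%
```

Which figure matters depends on the workload: a session dominated by long explanations sits closer to the aggregate number, one dominated by code-heavy answers closer to the low end of the range.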

The important caveat the README is honest about: caveman only affects output tokens. Thinking/reasoning tokens are untouched. The biggest practical win is readability and response speed; the cost savings are a bonus.
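To see why output-only savings shrink at the whole-request level, here is a minimal sketch. The token counts for the example turn are hypothetical, and the function counts tokens only; it ignores the fact that providers usually price input and output tokens differently.

```python
def overall_reduction(
    input_tokens: int,
    thinking_tokens: int,
    output_tokens: int,
    output_saving: float,
) -> float:
    """Fraction of a turn's total tokens removed when only the output shrinks."""
    before = input_tokens + thinking_tokens + output_tokens
    # Input and thinking tokens are untouched; only the output is compressed.
    after = input_tokens + thinking_tokens + output_tokens * (1 - output_saving)
    return (before - after) / before


# Hypothetical turn: 2,000 input, 800 thinking, 1,200 output tokens, 65% output saving.
print(f"{overall_reduction(2000, 800, 1200, 0.65):.1%}")  # 19.5%
```

The larger the share of a turn spent on input and reasoning, the smaller the overall effect of a terse answer.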

The non-obvious feature: caveman-compress

/caveman makes the agent speak with fewer tokens. caveman-compress makes it read fewer tokens. It rewrites your CLAUDE.md (and any other context files Claude loads at the start of every session) into caveman-speak, so the agent’s input is smaller every time it boots.

/caveman:compress CLAUDE.md

After running:

CLAUDE.md          ← compressed (Claude reads this every session, fewer tokens)
CLAUDE.original.md ← human-readable backup (you read and edit this)
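The backup-then-overwrite layout can be sketched in a few lines. This is an illustration of the file arrangement only, not the skill's code: compress_with_backup is a hypothetical helper, the compress argument stands in for whatever does the rewriting, and refusing to overwrite an existing backup is an assumption made here for safety.

```python
from pathlib import Path
from typing import Callable


def compress_with_backup(path: Path, compress: Callable[[str], str]) -> Path:
    """Overwrite `path` with compressed text and keep the original beside it."""
    # CLAUDE.md -> CLAUDE.original.md
    backup = path.with_name(f"{path.stem}.original{path.suffix}")
    if backup.exists():
        # Assumption: never clobber an existing human-readable backup.
        raise FileExistsError(f"{backup} already exists")
    original = path.read_text(encoding="utf-8")
    backup.write_text(original, encoding="utf-8")           # the copy you read and edit
    path.write_text(compress(original), encoding="utf-8")   # the copy the agent loads
    return backup
```

Calling compress_with_backup(Path("CLAUDE.md"), some_compressor) would leave the two files shown above side by side.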

The README’s measured savings on real CLAUDE.md-style files:

File                       Original (tokens)  Compressed (tokens)  Saved
claude-md-preferences.md   706                285                  59.6%
project-notes.md           1145               535                  53.3%
claude-md-project.md       1122               636                  43.3%
todo-list.md               627                388                  38.1%
Average                    898                481                  46%

(The Average row is the README’s figure. The four files listed here average 900 → 461 tokens, about 49% saved, so the README’s average presumably covers files not shown.)

Code blocks, URLs, file paths, commands, headings, dates, and version numbers pass through untouched. Only prose gets compressed.
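The pass-through rule is the interesting part, so here is a toy version of it. This sketch is not caveman-compress itself: it protects only fenced code blocks, headings, inline code, and URLs (not dates, versions, or file paths), and the "compression" is a crude filler-word filter chosen purely to make the effect visible. All names in it are hypothetical.

```python
import re

FENCE_MARKERS = ("`" * 3, "~~~")  # lines that open or close a fenced code block
# Spans that must survive verbatim inside a prose line: inline code and URLs.
PROTECTED = re.compile(r"(`[^`]*`|https?://\S+)")
# Toy "caveman" rule: drop articles and a few filler words.
FILLER = re.compile(r"\b(?:the|a|an|just|really|basically|please)\b\s*", re.IGNORECASE)


def compress_line(line: str) -> str:
    """Compress one prose line, leaving headings and protected spans untouched."""
    if line.lstrip().startswith("#"):
        return line  # headings pass through
    pieces = PROTECTED.split(line)
    # With a single capture group, odd indices hold the protected spans.
    return "".join(p if i % 2 else FILLER.sub("", p) for i, p in enumerate(pieces))


def compress_markdown(text: str) -> str:
    """Compress prose lines only; everything inside code fences is copied as-is."""
    out, in_fence = [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(line)
        else:
            out.append(compress_line(line))
    return "\n".join(out)


print(compress_line("Please run the tests before a commit. See https://example.com/the/docs"))
# run tests before commit. See https://example.com/the/docs
```

Note that "the" inside the URL survives while the same word in the prose is dropped, which is the property the skill's pass-through list is meant to guarantee.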

(Security note from upstream: Snyk flags caveman-compress as High Risk due to subprocess/file patterns. The project considers this a false positive, and its SECURITY.md explains why.)

Other skills shipped in the same plugin

  • caveman-commit - terse commit messages. Conventional Commits, ≤50 char subject, why-over-what.
  • caveman-review - one-line PR comments, e.g. “L42: 🔴 bug: user null. Add guard.” No throat-clearing.
  • caveman-help - quick-reference card; lists all modes, skills, commands.
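
The two hard constraints named for caveman-commit (Conventional Commits form, subject of at most 50 characters) are easy to check mechanically. The checker below is a sketch for illustration, not part of the plugin: check_subject is a hypothetical name, and the list of commit types is the commonly used set rather than anything the project specifies.

```python
import re

# Conventional Commits subject: type, optional (scope), optional "!", then ": description".
# Assumption: the type list below is the commonly used set, not caveman-commit's own.
SUBJECT = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([\w.-]+\))?!?: \S.*$"
)
MAX_SUBJECT_LEN = 50


def check_subject(subject: str) -> list[str]:
    """Return a list of problems with a commit subject line (empty if it passes)."""
    problems = []
    if not SUBJECT.match(subject):
        problems.append("not in Conventional Commits form")
    if len(subject) > MAX_SUBJECT_LEN:
        problems.append(f"subject is {len(subject)} chars, limit is {MAX_SUBJECT_LEN}")
    return problems


print(check_subject("fix(auth): refresh token before expiry"))  # []
print(check_subject("Fixed the bug"))  # ['not in Conventional Commits form']
```

The "why-over-what" guideline is a matter of judgment and is not something a pattern check like this can verify.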

When to reach for it

  • Long agent sessions where output volume is a real cost driver.
  • Codebases with bloated CLAUDE.md / context files where session-start tokens are silently hurting you.
  • Anyone who ends up skimming verbose AI explanations instead of reading them.

When not to

  • Tasks where the prose itself is the deliverable (writing docs, drafting emails, customer-facing copy).
  • Audiences who’ll find caveman speech unprofessional. The Lite intensity is the right starting point if you’re not sure.

Why this works (the boring research version)

There’s a March 2026 paper - “Brevity Constraints Reverse Performance Hierarchies in Language Models” - that found constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks. Verbose isn’t always better. Sometimes fewer words means more correct.

Caveman is the practical, opinionated, fun-named expression of that result. The eval harness lives in evals/ if you want to verify the numbers yourself.

Similar tools