caveman

Claude Code skill that makes the agent answer in caveman speech, cutting roughly 65% of output tokens with no measurable quality loss. Joke premise, real savings.

claude-code, claude-skills, prompt-engineering

A joke premise, a serious result. Caveman is a Claude Code skill (and Codex/Gemini/Cursor/Windsurf/Cline/Copilot plugin) that makes the agent answer in caveman-speech and cuts roughly 65% of output tokens with no measurable accuracy loss. The README’s tagline (“why use many token when few token do trick”) is also a fair summary of the engineering claim.

Two reasons it earns the 9k+ stars: the savings are reproducible from the project’s own eval harness, and installation is a single command for nearly every major coding agent. The caveman thing is a Trojan horse for terse-output research that would be ignored if it shipped under a less fun name.

What it does, in one before-and-after example

Normal Claude (69 tokens):

“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”

Caveman Claude (19 tokens):

“New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”

Same fix. ~72% fewer tokens. Brain still big.
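The saving follows directly from the two token counts quoted above. A quick check in Python (the function name is illustrative, not part of the project):

```python
def percent_saved(normal_tokens: int, caveman_tokens: int) -> float:
    """Share of tokens removed, as a percentage of the normal response."""
    return 100 * (normal_tokens - caveman_tokens) / normal_tokens


# Token counts from the example above: 69 normal, 19 caveman.
print(f"{percent_saved(69, 19):.1f}% fewer tokens")  # prints "72.5% fewer tokens"
```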

Intensity levels

Pick your level of grunt:

Lite (/caveman lite) - drop filler, keep grammar. Professional, no fluff.
Full (/caveman full) - default caveman. Drop articles, fragments, full grunt.
Ultra (/caveman ultra) - maximum compression. Telegraphic. Abbreviate everything.
Wenyan (/caveman wenyan, wenyan-lite, wenyan-ultra) - classical Chinese literary compression. Same accuracy in arguably the most token-efficient written language ever.

Levels stick until you change them or the session ends.
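To make the "sticky" behaviour concrete, here is a minimal sketch of how a session could track the active level. It is a toy model, not the plugin's implementation: the `Session` class and `handle` method are hypothetical, and treating a bare /caveman as Full is an assumption based on Full being described as the default.

```python
from typing import Optional

VALID_LEVELS = {"lite", "full", "ultra", "wenyan", "wenyan-lite", "wenyan-ultra"}
DEFAULT_LEVEL = "full"  # assumption: a bare /caveman selects the default level


class Session:
    """Hypothetical session state; not the plugin's real implementation."""

    def __init__(self) -> None:
        self.level: Optional[str] = None  # no caveman level chosen yet

    def handle(self, message: str) -> Optional[str]:
        """Update the level if the message is a /caveman command, then return it."""
        parts = message.strip().split()
        if parts and parts[0] == "/caveman":
            requested = parts[1] if len(parts) > 1 else DEFAULT_LEVEL
            if requested not in VALID_LEVELS:
                raise ValueError(f"unknown level: {requested}")
            self.level = requested
        return self.level


session = Session()
session.handle("/caveman ultra")
print(session.handle("why is my build slow?"))  # prints "ultra": the level persists
```

The level only changes when another /caveman command arrives, and it disappears with the `Session` object, which mirrors "until you change them or the session ends".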

Install

One command per agent (Codex takes a few manual steps instead):

Agent	Install
Claude Code	`claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman`
Codex	Clone repo → `/plugins` → search “Caveman” → Install
Gemini CLI	`gemini extensions install https://github.com/JuliusBrussee/caveman`
Cursor	`npx skills add JuliusBrussee/caveman -a cursor`
Windsurf	`npx skills add JuliusBrussee/caveman -a windsurf`
Copilot / Cline / others	`npx skills add JuliusBrussee/caveman`

For Claude Code and Gemini, auto-activation happens via SessionStart hooks and context files - install once, get caveman in every future session. For the others, npx skills add installs the skill but not the auto-activation snippet, so you trigger it with /caveman, $caveman, or “talk like caveman” each session (or paste the always-on snippet from the README into your system prompt).

The benchmarks

The README publishes its own eval harness output. Real token counts from the Claude API:

Task	Normal	Caveman	Saved
Explain React re-render bug	1180	159	87%
Fix auth middleware token expiry	704	121	83%
Set up PostgreSQL connection pool	2347	380	84%
Explain git rebase vs merge	702	292	58%
Refactor callback to async/await	387	301	22%
Architecture: microservices vs monolith	446	310	30%
Review PR for security issues	678	398	41%
Docker multi-stage build	1042	290	72%
Debug PostgreSQL race condition	1200	232	81%
Implement React error boundary	3454	456	87%
Average	1214	294	65%

Range is 22%–87%, depending on how prose-heavy the response naturally is. The code-heavy refactor and the architecture explanation compress least (22% and 30%); bug explanations compress most (81%–87%).
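The table's numbers can be re-derived in a few lines. The sketch below copies the token counts from the table and computes the saving two ways: as the mean of the per-task percentages, and pooled over all tokens. The variable names are illustrative.

```python
# (task, normal tokens, caveman tokens), copied from the table above.
BENCHMARKS = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix auth middleware token expiry", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Explain git rebase vs merge", 702, 292),
    ("Refactor callback to async/await", 387, 301),
    ("Architecture: microservices vs monolith", 446, 310),
    ("Review PR for security issues", 678, 398),
    ("Docker multi-stage build", 1042, 290),
    ("Debug PostgreSQL race condition", 1200, 232),
    ("Implement React error boundary", 3454, 456),
]


def percent_saved(normal: int, caveman: int) -> float:
    """Percentage of tokens removed relative to the normal response."""
    return 100 * (normal - caveman) / normal


# Mean of the ten per-task percentages (each task weighted equally).
per_task = [percent_saved(n, c) for _, n, c in BENCHMARKS]
mean_of_task_savings = sum(per_task) / len(per_task)

# Pooled saving: total tokens removed over total normal tokens.
total_normal = sum(n for _, n, _ in BENCHMARKS)
total_caveman = sum(c for _, _, c in BENCHMARKS)
pooled_savings = percent_saved(total_normal, total_caveman)

print(f"mean of per-task savings: {mean_of_task_savings:.0f}%")  # about 65%
print(f"pooled over all tokens:   {pooled_savings:.0f}%")        # about 76%
```

The 65% in the Average row is the mean of the per-task percentages. Pooled over all tokens the saving is higher, around 76%, because the longest responses in this set are also the ones that compress the most.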

The important caveat the README is honest about: caveman only affects output tokens. Thinking/reasoning tokens are untouched. The biggest practical win is readability and response speed; the cost savings are a bonus.
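A small worked example shows why the caveat matters. The token counts below are hypothetical, chosen only to illustrate the arithmetic: when thinking tokens are untouched, the saving on everything the model generates is diluted by the thinking share.

```python
def overall_saving(thinking_tokens: int, visible_tokens: int, visible_saving: float) -> float:
    """Fraction of all generated tokens saved when only the visible output shrinks."""
    total = thinking_tokens + visible_tokens
    return visible_saving * visible_tokens / total


# Hypothetical response: 2000 thinking tokens, 1000 visible tokens, 65% visible saving.
print(f"{overall_saving(2000, 1000, 0.65):.1%}")  # prints "21.7%"

# With no thinking tokens, the full 65% carries through.
print(f"{overall_saving(0, 1000, 0.65):.1%}")  # prints "65.0%"
```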

The non-obvious feature: caveman-compress

/caveman makes the agent speak with fewer tokens. caveman-compress makes it read fewer tokens. It rewrites your CLAUDE.md (and any other context files Claude loads at every session start) into caveman-speak, so the agent’s input is smaller every time it boots.

/caveman:compress CLAUDE.md

After running:

CLAUDE.md          ← compressed (Claude reads this every session, fewer tokens)
CLAUDE.original.md ← human-readable backup (you read and edit this)

The README’s measured savings on real CLAUDE.md-style files:

File	Original	Compressed	Saved
`claude-md-preferences.md`	706	285	59.6%
`project-notes.md`	1145	535	53.3%
`claude-md-project.md`	1122	636	43.3%
`todo-list.md`	627	388	38.1%
Average (README, all benchmark files)	898	481	46%

Code blocks, URLs, file paths, commands, headings, dates, and version numbers pass through untouched. Only prose gets compressed.
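The pass-through idea can be sketched as follows. This is not how caveman-compress is implemented; it is a toy illustration of the technique, assuming a regex is enough to find the protected spans and using a hypothetical filler-word filter as a stand-in for the real prose rewrite. It covers fenced code, inline code, URLs, and headings only.

````python
import re

# Spans that must survive unchanged: fenced code, inline code, URLs, headings.
PROTECTED = re.compile(
    r"```.*?```|`[^`\n]+`|https?://\S+|^#{1,6} [^\n]*",
    re.DOTALL | re.MULTILINE,
)

# Toy stand-in for the real rewrite: drop a handful of filler words.
FILLER = re.compile(r"\b(?:the|a|an|just|really|basically|please)\b\s*", re.IGNORECASE)


def compress_prose(text: str) -> str:
    """Hypothetical prose compressor; the real tool does far more than this."""
    return FILLER.sub("", text)


def compress_markdown(doc: str) -> str:
    """Compress prose between protected spans and copy the spans through verbatim."""
    out, pos = [], 0
    for match in PROTECTED.finditer(doc):
        out.append(compress_prose(doc[pos:match.start()]))  # prose before the span
        out.append(match.group(0))                          # protected span, untouched
        pos = match.end()
    out.append(compress_prose(doc[pos:]))                   # trailing prose
    return "".join(out)


doc = "# The Setup\nPlease run the tests with `pytest -k the_name` before a commit.\n"
print(compress_markdown(doc))
# prints:
# # The Setup
# run tests with `pytest -k the_name` before commit.
````

The heading and the inline command come out unchanged while the surrounding sentence loses its filler words, which is the split between protected content and prose that the paragraph above describes.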

(Security note from upstream: Snyk flags caveman-compress as High Risk due to subprocess/file patterns. It’s a false positive - the project’s SECURITY.md explains why.)

Other skills shipped in the same plugin

caveman-commit - terse commit messages. Conventional Commits, ≤50 char subject, why-over-what.
caveman-review - one-line PR comments, for example “L42: 🔴 bug: user null. Add guard.” No throat-clearing.
caveman-help - quick-reference card; lists all modes, skills, commands.
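The two mechanical constraints that caveman-commit targets (Conventional Commits format, subject of at most 50 characters) are easy to check independently. The sketch below is a standalone validator, not part of the plugin; `check_subject` and the simplified header regex are assumptions made for illustration.

```python
import re

# Simplified Conventional Commits header: type, optional (scope), optional "!",
# then ": " and a non-empty description.
HEADER = re.compile(r"^[a-z]+(?:\([^()\s]+\))?!?: \S.*$")
MAX_SUBJECT = 50  # subject length limit named above


def check_subject(subject: str) -> list[str]:
    """Return a list of problems with a commit subject line (empty means it passes)."""
    problems = []
    if len(subject) > MAX_SUBJECT:
        problems.append(f"subject is {len(subject)} chars, limit is {MAX_SUBJECT}")
    if not HEADER.match(subject):
        problems.append("subject is not in 'type(scope): description' form")
    return problems


print(check_subject("fix(auth): refresh token before expiry"))  # prints []
print(len(check_subject("Updated some stuff in the authentication middleware module")))  # prints 2
```

A check like this could run in a commit-msg hook to confirm that generated messages stay within the limits, whichever tool wrote them.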

When to reach for it

Long agent sessions where output volume is a real cost driver.
Codebases with bloated CLAUDE.md / context files where session-start tokens are silently hurting you.
Anyone who skims verbose AI explanations instead of reading them.

When not to

Tasks where the prose itself is the deliverable (writing docs, drafting emails, customer-facing copy).
Audiences who’ll find caveman speech unprofessional. The Lite intensity is the right starting point if you’re not sure.

Why this works (the boring research version)

There’s a March 2026 paper - “Brevity Constraints Reverse Performance Hierarchies in Language Models” - that found constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks. Verbose isn’t always better. Sometimes fewer words means more correct.

Caveman is the practical, opinionated, fun-named expression of that result. The eval harness lives in evals/ if you want to verify the numbers yourself.