caveman
Claude Code skill that makes the agent answer in caveman speech, cutting roughly 65% of output tokens with no measurable quality loss. Joke premise, real savings.
Caveman is a Claude Code skill (and Codex/Gemini/Cursor/Windsurf/Cline/Copilot plugin) that makes the agent respond in caveman-speech and cuts roughly 65% of output tokens with no measurable accuracy loss. The README’s tagline (“why use many token when few token do trick”) is also a fair summary of the engineering claim.
Two reasons it earns the 9k+ stars: the savings are reproducible from the project’s own eval harness, and installation is a single command for nearly every major coding agent. The caveman voice is a Trojan horse for terse-prompt research that would be ignored if it shipped under a less fun name.
What it does, in two examples
Normal Claude (69 tokens):
“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”
Caveman Claude (19 tokens):
“New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”
Same fix. ~72% fewer tokens (69 down to 19). Brain still big.
Intensity levels
Pick your level of grunt:
- Lite (/caveman lite) - drop filler, keep grammar. Professional, no fluff.
- Full (/caveman full) - default caveman. Drop articles, fragments, full grunt.
- Ultra (/caveman ultra) - maximum compression. Telegraphic. Abbreviate everything.
- Wenyan (/caveman wenyan, wenyan-lite, wenyan-ultra) - classical Chinese literary compression. Same accuracy in arguably the most token-efficient written language ever.
Levels stick until you change them or the session ends.
Install
One command per agent:
| Agent | Install |
|---|---|
| Claude Code | claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman |
| Codex | Clone repo → /plugins → search “Caveman” → Install |
| Gemini CLI | gemini extensions install https://github.com/JuliusBrussee/caveman |
| Cursor | npx skills add JuliusBrussee/caveman -a cursor |
| Windsurf | npx skills add JuliusBrussee/caveman -a windsurf |
| Copilot / Cline / others | npx skills add JuliusBrussee/caveman |
For Claude Code and Gemini, auto-activation happens via SessionStart hooks and context files - install once, get caveman in every future session. For the others, npx skills add installs the skill but not the auto-activation snippet, so you trigger with /caveman, $caveman, or “talk like caveman” each session (or paste the always-on snippet from the README into your system prompt).
The benchmarks
The README publishes its own eval harness output. Real token counts from the Claude API:
| Task | Normal | Caveman | Saved |
|---|---|---|---|
| Explain React re-render bug | 1180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Refactor callback to async/await | 387 | 301 | 22% |
| Architecture: microservices vs monolith | 446 | 310 | 30% |
| Review PR for security issues | 678 | 398 | 41% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Debug PostgreSQL race condition | 1200 | 232 | 81% |
| Implement React error boundary | 3454 | 456 | 87% |
| Average | 1214 | 294 | 65% |
Range is 22%–87%, depending on how prose-heavy the response naturally is. Architecture explanations compress less; bug explanations compress more.
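Note that the 65% figure is the mean of the ten per-task percentages, not the reduction in total tokens across the table. A short sketch that recomputes both from the rows above (the names `ROWS` and `saved_pct` are illustrative and not taken from the project's harness):

```python
# Token counts copied from the benchmark table: (task, normal, caveman).
ROWS = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix auth middleware token expiry", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Explain git rebase vs merge", 702, 292),
    ("Refactor callback to async/await", 387, 301),
    ("Architecture: microservices vs monolith", 446, 310),
    ("Review PR for security issues", 678, 398),
    ("Docker multi-stage build", 1042, 290),
    ("Debug PostgreSQL race condition", 1200, 232),
    ("Implement React error boundary", 3454, 456),
]


def saved_pct(normal: int, caveman: int) -> float:
    """Percentage of output tokens removed for one task."""
    return 100 * (normal - caveman) / normal


# Mean of the per-row percentages (what the "Average" row reports).
per_task = [saved_pct(n, c) for _, n, c in ROWS]
mean_of_rows = sum(per_task) / len(per_task)

# Reduction in total tokens summed over all ten tasks.
total_normal = sum(n for _, n, _ in ROWS)
total_caveman = sum(c for _, _, c in ROWS)
aggregate = saved_pct(total_normal, total_caveman)

print(f"mean of per-task savings: {mean_of_rows:.1f}%")  # 64.5%
print(f"aggregate savings:        {aggregate:.1f}%")     # 75.8%
```

The aggregate is higher because the longest responses (the error boundary and connection pool tasks) also compress the most, so they dominate the totals.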
The important caveat the README is honest about: caveman only affects output tokens. Thinking/reasoning tokens are untouched. The biggest practical win is readability and response speed; the cost savings are a bonus.
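To see why that caveat matters, here is a rough sketch of how much of a whole turn shrinks when only the visible output does. It counts tokens rather than dollars (input and output tokens are usually priced differently), and the input and thinking figures are made up for illustration:

```python
def overall_reduction(input_tokens: int, thinking_tokens: int,
                      output_tokens: int, output_saved: float) -> float:
    """Fraction of all tokens in a turn removed when only visible output shrinks.

    output_saved is the fraction of visible output tokens removed (e.g. 0.65).
    Input and thinking tokens are assumed to be unchanged.
    """
    total = input_tokens + thinking_tokens + output_tokens
    return output_saved * output_tokens / total


# Hypothetical turn: 2000 input and 1000 thinking tokens are invented numbers;
# 1214 is the average "Normal" output from the benchmark table.
print(f"{overall_reduction(2000, 1000, 1214, 0.65):.1%}")  # 18.7%
```

The more a session is dominated by context loading or reasoning, the smaller the share of the total that a 65% output cut can touch.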
The non-obvious feature: caveman-compress
/caveman makes the agent speak with fewer tokens. caveman-compress makes it read fewer tokens. It rewrites your CLAUDE.md (and any other context files Claude loads at the start of every session) into caveman-speak, so the agent’s input is smaller every time it boots.
/caveman:compress CLAUDE.md
After running:
CLAUDE.md ← compressed (Claude reads this every session, fewer tokens)
CLAUDE.original.md ← human-readable backup (you read and edit this)
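The backup-then-overwrite layout above could be reproduced with a sketch like the following. This is not taken from the plugin's source: the function name, the `compress` callback, and the refusal to clobber an existing backup are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def compress_with_backup(path: Path, compress: Callable[[str], str]) -> Path:
    """Overwrite `path` with compressed text, keeping the original beside it.

    CLAUDE.md -> CLAUDE.original.md (backup) plus CLAUDE.md (compressed).
    Refusing to overwrite an existing backup is an assumed policy.
    """
    backup = path.with_name(f"{path.stem}.original{path.suffix}")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    original = path.read_text(encoding="utf-8")
    backup.write_text(original, encoding="utf-8")  # save the readable copy first
    path.write_text(compress(original), encoding="utf-8")
    return backup
```

Writing the backup before touching the original means a failure in `compress` leaves the human-readable file intact.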
The README’s measured savings on real CLAUDE.md-style files:
| File | Original | Compressed | Saved |
|---|---|---|---|
| claude-md-preferences.md | 706 | 285 | 59.6% |
| project-notes.md | 1145 | 535 | 53.3% |
| claude-md-project.md | 1122 | 636 | 43.3% |
| todo-list.md | 627 | 388 | 38.1% |
| Average | 898 | 481 | 46% |
Code blocks, URLs, file paths, commands, headings, dates, and version numbers pass through untouched. Only prose gets compressed.
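A toy sketch of that pass-through rule, assuming a much cruder compressor than the real one: it only drops a handful of filler words, and it only recognises fenced code blocks and headings as protected regions. URLs, paths, and version numbers survive here simply because they never match a filler word. All names below are hypothetical.

```python
FILLER = {"a", "an", "the", "just", "really", "basically", "please"}
FENCE_MARKERS = ("`" * 3, "~" * 3)  # opening/closing markers of fenced code


def compress_prose_line(line: str) -> str:
    """Toy stand-in for the real compressor: drop exact filler words only."""
    kept = [w for w in line.split(" ") if w.lower() not in FILLER]
    return " ".join(kept)


def compress_markdown(text: str) -> str:
    """Compress prose lines; pass fenced code and headings through untouched."""
    out: list[str] = []
    in_fence = False
    for line in text.split("\n"):
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence  # the fence line itself is kept as-is
            out.append(line)
        elif in_fence or line.lstrip().startswith("#"):
            out.append(line)  # code and headings are never rewritten
        else:
            out.append(compress_prose_line(line))
    return "\n".join(out)


fence = "`" * 3
doc = "\n".join([
    "# The Build",
    "Please run the tests before a release.",
    fence,
    "make the target",
    fence,
    "See https://example.com/the/docs for the details.",
])
print(compress_markdown(doc))
```

On this input the heading, the fenced command, and the URL come through unchanged, while the two prose lines become "run tests before release." and "See https://example.com/the/docs for details."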
(Security note from the upstream: Snyk flags caveman-compress as High Risk due to subprocess/file patterns. It’s a false positive - the project’s SECURITY.md explains why.)
Other skills shipped in the same plugin
- caveman-commit - terse commit messages. Conventional Commits, ≤50 char subject, why-over-what.
- caveman-review - one-line PR comments, e.g. “L42: 🔴 bug: user null. Add guard.” No throat-clearing.
- caveman-help - quick-reference card; lists all modes, skills, commands.
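The two caveman-commit constraints that can be checked mechanically (Conventional Commits form and a subject of at most 50 characters) might be verified with something like this. It is a hypothetical helper, not part of the plugin, and it assumes the 50-character limit applies to the whole subject line including the type prefix:

```python
import re

# Conventional Commits header: type, optional (scope), optional "!",
# then ": " and a description.
HEADER = re.compile(r"^[a-z]+(\([^()\s]+\))?!?: \S.*$")
MAX_SUBJECT = 50  # assumed to cover the full subject line


def check_subject(subject: str) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if not HEADER.match(subject):
        problems.append("not in Conventional Commits form")
    if len(subject) > MAX_SUBJECT:
        problems.append(f"{len(subject)} chars, limit is {MAX_SUBJECT}")
    return problems


print(check_subject("fix(auth): refresh token before expiry check"))
print(check_subject("Updated the authentication middleware to handle expiry"))
```

The first subject passes both checks; the second fails both. The "why-over-what" guideline is a judgment call and is not something a regex can check.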
When to reach for it
- Long agent sessions where output volume is a real cost driver.
- Codebases with bloated CLAUDE.md / context files where session-start tokens are silently hurting you.
- Anyone who finds verbose AI explanations slow to read and would rather skim.
When not to
- Tasks where the prose itself is the deliverable (writing docs, drafting emails, customer-facing copy).
- Audiences who’ll find caveman speech unprofessional. The Lite intensity is the right starting point if you’re not sure.
Why this works (the boring research version)
There’s a March 2026 paper - “Brevity Constraints Reverse Performance Hierarchies in Language Models” - that found constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks. Verbose isn’t always better. Sometimes fewer words means more correct.
Caveman is the practical, opinionated, fun-named expression of that result. The eval harness lives in evals/ if you want to verify the numbers yourself.
Similar tools
- Claude Code Analysis
82 docs and 15 diagrams mapping every major subsystem of Claude Code's accidentally exposed 512K-line TypeScript source - YOLO classifier, 93% context compaction, prompt-cache layout, 88+ feature flags, the custom React-Fiber terminal renderer.
- talk-normal
System prompt that forces any LLM to drop the corporate-overlord cadence and write like a normal person. Strips em-dashes, hedging, and 'in summary' filler.
- agents-md
Curated AGENTS.md preset that kills sycophancy, blocks drive-by refactors, and forces verification loops. Synthesizes Karpathy's principles with Cherny's Claude Code workflow.
- Claudraband
Wraps the real Claude Code TUI with a session lifecycle layer. Resumable non-interactive workflows, HTTP daemon for remote/headless control, ACP server for editor integrations (Zed, Toad). Drives your existing Claude Code install rather than reimplementing it - keeps skills, hooks, MCPs, and approvals intact.