Codeburn
Interactive terminal dashboard that breaks down where your AI coding tokens actually go. Surfaces the chat-vs-tool-use split most users get wrong.
If you’ve ever stared at an Anthropic invoice and wondered why your agent burned through a week’s budget refactoring a single React component, Codeburn is the diagnostic. It is a local TUI that tells you exactly where your AI coding tokens went, broken down by tool, model, project, and task type.
The thing the dashboard surfaces - and the thing that makes most users do a double-take - is the conversation-vs-coding split. One user’s published breakdown was 56% pure conversation and only 20% actual coding output. You don’t realise how much of an agent run is the model talking to itself until you measure it.
How it gathers data
No proxy, no wrapper, no API keys. Codeburn reads session transcripts directly off disk and prices each call against the LiteLLM rate sheet. The on-disk locations it knows about cover 16 tools, including:
- Claude Code - ~/.claude/projects/
- Claude Desktop - ~/Library/Application Support/Claude/local-agent-mode-sessions/
- Codex - ~/.codex/sessions/
- Cursor - SQLite at ~/Library/Application Support/Cursor/User/globalStorage/state.vscdb
- Gemini CLI - ~/.gemini/tmp/<project>/chats/
- GitHub Copilot - ~/.copilot/session-state/ plus VS Code workspace storage
- OpenCode - SQLite at ~/.local/share/opencode/
- Plus Roo Code, KiloCode, Droid, Qwen, Pi, OMP, OpenClaw, Kiro
Because everything is local, the privacy story is straightforward: Codeburn reads only what your tools have already written to disk, puts no new service between you and your provider, and can run offline.
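The read-and-price step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Codeburn's implementation: it takes a JSONL transcript whose lines carry an assumed message.model and message.usage shape (loosely modelled on Claude Code's session files; other tools use different formats or SQLite), and a rate sheet keyed by model name using LiteLLM's input_cost_per_token and output_cost_per_token fields. The names priceTranscript, parseLine, and PricedCall are hypothetical.

```typescript
// Price per token for one model, using the field names from LiteLLM's rate sheet.
interface Rate {
  input_cost_per_token: number;
  output_cost_per_token: number;
}

// Assumed (hypothetical) transcript line shape; real tools differ and may
// also record cache-read / cache-write token counts, which this ignores.
interface TranscriptRecord {
  message?: {
    model?: string;
    usage?: { input_tokens?: number; output_tokens?: number };
  };
}

interface PricedCall {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

function parseLine(line: string): TranscriptRecord | undefined {
  try {
    return JSON.parse(line) as TranscriptRecord;
  } catch {
    return undefined; // partially written or corrupt line
  }
}

// Walk a JSONL transcript and price every line that carries usage data.
function priceTranscript(jsonl: string, rates: Record<string, Rate>): PricedCall[] {
  const calls: PricedCall[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const record = parseLine(line);
    const model = record?.message?.model;
    const usage = record?.message?.usage;
    if (!model || !usage) continue;
    const rate = rates[model];
    if (!rate) continue; // model missing from the rate sheet: leave it unpriced
    const inputTokens = usage.input_tokens ?? 0;
    const outputTokens = usage.output_tokens ?? 0;
    calls.push({
      model,
      inputTokens,
      outputTokens,
      costUsd: inputTokens * rate.input_cost_per_token + outputTokens * rate.output_cost_per_token,
    });
  }
  return calls;
}
```

Real rate entries would come from the LiteLLM sheet; cache-token pricing and the SQLite-backed stores are left out of this sketch.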
Quick start
npm install -g codeburn # install globally, then run codeburn
npx codeburn # or run it without a global install
The default view is the last 7 days as an interactive dashboard. The single-purpose subcommands are useful for scripting:
codeburn today # today only
codeburn month # current month
codeburn report -p 30days
codeburn status # summary line, e.g. for shell prompts
codeburn export # CSV / JSON dump
codeburn optimize # surfaces wasted spend
codeburn compare # cost across models for the same workload
codeburn yield # cost per shipped change
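Conceptually, compare and yield reduce to simple arithmetic over priced usage. The sketch below assumes exactly that: compareModels (a hypothetical name) re-prices one fixed set of token totals under every model in a rate sheet, and costPerShippedChange (also hypothetical) divides spend by a count of shipped changes that you supply, since how Codeburn defines a shipped change isn't covered here.

```typescript
interface Rate {
  input_cost_per_token: number;
  output_cost_per_token: number;
}

interface TokenTotals {
  inputTokens: number;
  outputTokens: number;
}

interface ModelCost {
  model: string;
  costUsd: number;
}

// Re-price one fixed workload under every model in the rate sheet, cheapest first.
function compareModels(workload: TokenTotals, rates: Record<string, Rate>): ModelCost[] {
  return Object.keys(rates)
    .map((model) => ({
      model,
      costUsd:
        workload.inputTokens * rates[model].input_cost_per_token +
        workload.outputTokens * rates[model].output_cost_per_token,
    }))
    .sort((a, b) => a.costUsd - b.costUsd);
}

// Spend divided by a caller-supplied count of shipped changes (e.g. merged commits).
function costPerShippedChange(totalCostUsd: number, shippedChanges: number): number {
  if (shippedChanges <= 0) return Number.POSITIVE_INFINITY; // nothing shipped: unbounded cost per change
  return totalCostUsd / shippedChanges;
}
```

Holding token counts fixed is a simplification; a different model would likely spend a different number of tokens on the same task.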
In the TUI, arrow keys rotate between Today / 7 / 30 / Month / All Time, 1–5 are shortcuts for those, c opens the model comparison, o opens the optimize report, p toggles providers, and q quits. The dashboard auto-refreshes every 30 seconds; tune it with --refresh.
What you actually get from it
Codeburn classifies every call into one of 13 categories - Coding, Debugging, Feature Dev, Refactoring, Testing, Exploration, Planning, Delegation, Git Ops, Build/Deploy, Brainstorming, Conversation, and General. Two of those categories, plus one derived metric, carry most of the signal:
- Conversation - back-and-forth that doesn’t change code. Anything north of ~30% is a yellow flag.
- Exploration - the agent re-reading files. High exploration cost is a strong argument for plugging in something like a code-graph MCP.
- One-shot rate - the percentage of edits that landed without a retry. A low one-shot rate means the model is guessing.
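The category shares and the one-shot rate are plain ratios once calls are classified. A minimal sketch, assuming a hypothetical Call record with category, costUsd, isEdit, and retried fields (not Codeburn's schema); shares are computed by cost here, and the 0.3 default is the ~30% rule of thumb from the list, not a hard limit.

```typescript
type Category =
  | "Coding" | "Debugging" | "Feature Dev" | "Refactoring" | "Testing"
  | "Exploration" | "Planning" | "Delegation" | "Git Ops" | "Build/Deploy"
  | "Brainstorming" | "Conversation" | "General";

// Hypothetical classified-and-priced call; not Codeburn's internal schema.
interface Call {
  category: Category;
  costUsd: number;
  isEdit: boolean; // did this call try to change code?
  retried: boolean; // did that edit need a retry before it landed?
}

// Fraction of total spend in each category (values sum to 1 when spend > 0).
function categoryShares(calls: Call[]): Partial<Record<Category, number>> {
  const total = calls.reduce((sum, c) => sum + c.costUsd, 0);
  const shares: Partial<Record<Category, number>> = {};
  if (total === 0) return shares;
  for (const c of calls) {
    shares[c.category] = (shares[c.category] ?? 0) + c.costUsd / total;
  }
  return shares;
}

// Fraction of edits that landed without a retry; NaN when there were no edits.
function oneShotRate(calls: Call[]): number {
  const edits = calls.filter((c) => c.isEdit);
  if (edits.length === 0) return Number.NaN;
  return edits.filter((c) => !c.retried).length / edits.length;
}

// The ~30% rule of thumb: conversation above the threshold is a yellow flag.
function conversationIsHigh(calls: Call[], threshold = 0.3): boolean {
  return (categoryShares(calls).Conversation ?? 0) > threshold;
}
```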
Per-project and per-model breakdowns are the part you’ll come back to. They’re the difference between “Claude Code is expensive” and “Claude Code is expensive on this one repo because the AGENTS.md is bad.”
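A per-project, per-model breakdown is essentially a group-by over priced calls. A minimal sketch, assuming each call carries hypothetical project, model, and costUsd fields; sorting by spend puts the one expensive repo at the top.

```typescript
// Hypothetical priced call tagged with the project it ran in.
interface ProjectCall {
  project: string;
  model: string;
  costUsd: number;
}

interface BreakdownRow {
  project: string;
  model: string;
  costUsd: number;
  calls: number;
}

// Sum cost per (project, model) pair, most expensive first.
function breakdown(calls: ProjectCall[]): BreakdownRow[] {
  const rows: Record<string, BreakdownRow> = {};
  for (const c of calls) {
    // JSON-encode the pair so names containing separators can't collide.
    const key = JSON.stringify([c.project, c.model]);
    let row = rows[key];
    if (row === undefined) {
      row = { project: c.project, model: c.model, costUsd: 0, calls: 0 };
      rows[key] = row;
    }
    row.costUsd += c.costUsd;
    row.calls += 1;
  }
  return Object.keys(rows)
    .map((k) => rows[k])
    .sort((a, b) => b.costUsd - a.costUsd);
}
```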
When to reach for it
- Before changing models or providers - you want a baseline so the comparison means something.
- When monthly spend has crept up and you can’t point at a cause.
- If you’re building tooling on top of agents and need to attribute cost to features.
When not to
- For real-time enforcement. Codeburn is a read-only diagnostic - it doesn’t gate or rate-limit. Pair it with a proxy if you need policy enforcement.
- For tools that don’t persist transcripts to disk. If you only use a web-based agent, there’s nothing for Codeburn to read.
Known accuracy caveats
A few tools don't record exact figures, so Codeburn falls back to estimates:
- Cursor - when “Auto” mode hides the actual model, costs are estimated at Sonnet rates. The first run on a large Cursor SQLite database can take up to a minute.
- GitHub Copilot in VS Code - no explicit token counts in the format; estimated from content length.
- Kiro - same story, costed at Sonnet rates.
Treat the absolute numbers for those tools as ±10–20%. The relative breakdown - which projects, which task types, which days - is still the part that matters, and it is largely unaffected.
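For tools without explicit token counts, a content-length estimate could look like the sketch below. It assumes the common rule of thumb of about 4 characters per token, which is not Codeburn's documented ratio, and expects the caller to pass the Sonnet rates in the same LiteLLM-style fields; estimateTokens and estimateCost are hypothetical names, and the default ±20% band simply mirrors the upper end of the caveat above.

```typescript
// Assumed rule of thumb (~4 characters per token); not Codeburn's documented ratio.
const CHARS_PER_TOKEN = 4;

// Per-token prices, e.g. the Sonnet entry from a LiteLLM-style rate sheet.
interface Rate {
  input_cost_per_token: number;
  output_cost_per_token: number;
}

interface CostEstimate {
  costUsd: number;
  lowUsd: number;
  highUsd: number;
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Estimate one exchange from its prompt and response text, with a symmetric relative band.
function estimateCost(prompt: string, response: string, rate: Rate, band = 0.2): CostEstimate {
  const costUsd =
    estimateTokens(prompt) * rate.input_cost_per_token +
    estimateTokens(response) * rate.output_cost_per_token;
  return { costUsd, lowUsd: costUsd * (1 - band), highUsd: costUsd * (1 + band) };
}
```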
Requires Node.js 20+ and at least one tool that writes session data to disk.
Similar tools
- wanman
Multi-agent runtime that spawns each Claude Code or Codex agent in its own git worktree and home directory. JSON-RPC subprocess control, task pooling, artifact storage. Solves the share-a-directory failure mode that breaks most multi-agent harnesses.
- prism
Python TUI that finds where extra tokens are burned in Claude Code sessions, why sessions fail, and what to fix. Built on Textual, focused on debugging your own usage.
- abtop
Rust TUI that monitors Claude Code and Codex sessions: token spend, context window, rate limits, and ports in real time. Like btop but for agent runtimes.
- Claudraband
Wraps the real Claude Code TUI with a session lifecycle layer. Resumable non-interactive workflows, HTTP daemon for remote/headless control, ACP server for editor integrations (Zed, Toad). Drives your existing Claude Code install rather than reimplementing it - keeps skills, hooks, MCPs, and approvals intact.