Codeburn

Interactive terminal dashboard that breaks down where your AI coding tokens actually go. Surfaces the chat-vs-tool-use split most users get wrong.


Tags: claude-code, codex, cost-tracking, observability, cli

If you’ve ever stared at an Anthropic invoice and wondered why your agent burned through a week’s budget refactoring a single React component, Codeburn is the diagnostic. It is a local TUI that tells you exactly where your AI coding tokens went, broken down by tool, model, project, and task type.

The thing the dashboard surfaces - and the thing that makes most users do a double-take - is the conversation-vs-coding split. One user’s published breakdown was 56% pure conversation and only 20% actual coding output. You don’t realise how much of an agent run is the model talking to itself until you measure it.

How it gathers data

No proxy, no wrapper, no API keys. Codeburn reads session transcripts directly off disk and prices each call against the LiteLLM rate sheet. The on-disk locations it knows about cover 16 tools, including:

Claude Code - ~/.claude/projects/
Claude Desktop - ~/Library/Application Support/Claude/local-agent-mode-sessions/
Codex - ~/.codex/sessions/
Cursor - SQLite at ~/Library/Application Support/Cursor/User/globalStorage/state.vscdb
Gemini CLI - ~/.gemini/tmp/<project>/chats/
GitHub Copilot - ~/.copilot/session-state/ plus VS Code workspace storage
OpenCode - SQLite at ~/.local/share/opencode/
Plus Roo Code, KiloCode, Droid, Qwen, Pi, OMP, OpenClaw, Kiro
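The read-then-price loop can be sketched as below. This is a minimal illustration, not Codeburn's source. It assumes a JSONL transcript where each assistant line carries `message.model` and an Anthropic-style `message.usage` block, and a rate table keyed by model name that uses LiteLLM's per-token field names (`input_cost_per_token`, `output_cost_per_token`, and the two cache variants). `priceUsage`, `priceTranscript`, the model name, and the prices used in testing are all hypothetical.

```typescript
// Subset of a LiteLLM-style rate entry (USD per token). Only the fields
// used here are declared.
interface RateEntry {
  input_cost_per_token: number;
  output_cost_per_token: number;
  cache_read_input_token_cost?: number;
  cache_creation_input_token_cost?: number;
}

// Anthropic-style usage block, assumed to sit at `message.usage`.
interface Usage {
  input_tokens?: number;
  output_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

interface PricedCall {
  model: string;
  costUsd: number;
}

// Multiply each token bucket by its per-token rate and sum.
function priceUsage(u: Usage, r: RateEntry): number {
  return (
    (u.input_tokens ?? 0) * r.input_cost_per_token +
    (u.output_tokens ?? 0) * r.output_cost_per_token +
    (u.cache_read_input_tokens ?? 0) * (r.cache_read_input_token_cost ?? 0) +
    (u.cache_creation_input_tokens ?? 0) * (r.cache_creation_input_token_cost ?? 0)
  );
}

// Hypothetical reader: one JSON object per line. Prices every line that has
// a model and a usage block and skips everything else.
function priceTranscript(jsonl: string, rates: Record<string, RateEntry>): PricedCall[] {
  const calls: PricedCall[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // partially written or corrupt line
    }
    if (typeof entry !== "object" || entry === null) continue;
    const msg = (entry as { message?: { model?: string; usage?: Usage } }).message;
    if (!msg || !msg.model || !msg.usage) continue;
    // Own-property check so keys like "constructor" never match a prototype member.
    if (!Object.prototype.hasOwnProperty.call(rates, msg.model)) continue;
    const rate = rates[msg.model];
    calls.push({ model: msg.model, costUsd: priceUsage(msg.usage, rate) });
  }
  return calls;
}
```

Readers for the other tools would differ mainly in the parsing step (SQLite for Cursor and OpenCode, for example). The pricing step stays the same multiply-and-sum.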

Because everything is local, the privacy story is straightforward: Codeburn only reads transcripts that already sit on your disk (data your provider has already seen), and you can run it offline.

Quick start

npm install -g codeburn   # install globally, then run: codeburn
npx codeburn              # or run it without a global install

Run without a subcommand, Codeburn opens an interactive dashboard covering the last 7 days. The single-purpose subcommands are useful for scripting:

codeburn today       # today only
codeburn month       # current month
codeburn report -p 30days   # report for a chosen period (here, 30 days)
codeburn status      # summary line, e.g. for shell prompts
codeburn export      # CSV / JSON dump
codeburn optimize    # surfaces wasted spend
codeburn compare     # cost across models for the same workload
codeburn yield       # cost per shipped change
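The arithmetic behind `compare` and `yield` can be approximated as follows. This is a hypothetical sketch, not Codeburn's implementation. It treats "the same workload" as the same input and output token counts re-priced under each model's rates, and it takes the number of shipped changes (merged commits, say) as a given input. The function names, model names, and rates are placeholders.

```typescript
interface TokenTotals {
  input: number;
  output: number;
}

// Placeholder per-token rates (USD); real values would come from a rate sheet.
interface ModelRates {
  inputPerToken: number;
  outputPerToken: number;
}

// Re-price one fixed workload under every candidate model, cheapest first.
function compareModels(work: TokenTotals, rates: Record<string, ModelRates>): Array<[string, number]> {
  return Object.keys(rates)
    .map((model): [string, number] => {
      const r = rates[model];
      return [model, work.input * r.inputPerToken + work.output * r.outputPerToken];
    })
    .sort((a, b) => a[1] - b[1]);
}

// Cost per shipped change; Infinity when nothing shipped.
function costPerShippedChange(totalCostUsd: number, shippedChanges: number): number {
  return shippedChanges > 0 ? totalCostUsd / shippedChanges : Infinity;
}
```

Re-pricing fixed token counts is the simplest possible comparison. It ignores that a different model would probably spend a different number of tokens on the same task, so read it as a first-order estimate.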

In the TUI, arrow keys rotate between Today / 7 / 30 / Month / All Time, 1–5 are shortcuts for those, c opens the model comparison, o opens the optimize report, p toggles providers, and q quits. The dashboard auto-refreshes every 30 seconds; tune it with --refresh.

What you actually get from it

Codeburn classifies every call into one of 13 categories - Coding, Debugging, Feature Dev, Refactoring, Testing, Exploration, Planning, Delegation, Git Ops, Build/Deploy, Brainstorming, Conversation, and General. Two of those categories, plus one derived metric, carry most of the signal:

Conversation - back-and-forth that doesn’t change code. Anything north of ~30% is a yellow flag.
Exploration - the agent re-reading files. High exploration cost is a strong argument for plugging in something like a code-graph MCP server.
One-shot rate - the percentage of edits that landed without a retry. A low one-shot rate means the model is guessing.
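Once calls are classified, the conversation share and the one-shot rate reduce to simple ratios. Below is a minimal sketch. It assumes shares are measured by cost (the text doesn't say whether Codeburn uses cost or token counts) and that each edit carries a hypothetical `retried` flag. The 30% threshold is the rule of thumb from the list, and the function names are illustrative.

```typescript
// The 13 category labels listed above.
type Category =
  | "Coding" | "Debugging" | "Feature Dev" | "Refactoring" | "Testing"
  | "Exploration" | "Planning" | "Delegation" | "Git Ops" | "Build/Deploy"
  | "Brainstorming" | "Conversation" | "General";

interface ClassifiedCall {
  category: Category;
  costUsd: number;
}

// Hypothetical edit record: `retried` is true if the edit needed another attempt.
interface EditOutcome {
  retried: boolean;
}

// Fraction of total cost spent in each category (assumes shares by cost).
function categoryShares(calls: ClassifiedCall[]): Map<Category, number> {
  const total = calls.reduce((sum, c) => sum + c.costUsd, 0);
  const shares = new Map<Category, number>();
  if (total === 0) return shares;
  for (const c of calls) {
    shares.set(c.category, (shares.get(c.category) ?? 0) + c.costUsd / total);
  }
  return shares;
}

// Yellow flag when Conversation exceeds the ~30% rule of thumb.
function conversationIsHigh(shares: Map<Category, number>, threshold = 0.3): boolean {
  return (shares.get("Conversation") ?? 0) > threshold;
}

// Share of edits that landed without a retry; NaN when there were no edits.
function oneShotRate(edits: EditOutcome[]): number {
  if (edits.length === 0) return NaN;
  return edits.filter((e) => !e.retried).length / edits.length;
}
```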

Per-project and per-model breakdowns are the part you’ll come back to. They’re the difference between “Claude Code is expensive” and “Claude Code is expensive on this one repo because the AGENTS.md is bad.”
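A per-project, per-model breakdown amounts to a two-level group-by over priced calls. The `CallRecord` shape and the function names below are hypothetical. The sketch only assumes that each priced call can be attributed to a project and a model. For Claude Code, the project directory under `~/.claude/projects/` is a natural candidate for the project key.

```typescript
// Hypothetical priced-call record.
interface CallRecord {
  project: string;
  model: string;
  costUsd: number;
}

// Two-level group-by: project -> model -> summed cost.
function breakdown(calls: CallRecord[]): Map<string, Map<string, number>> {
  const byProject = new Map<string, Map<string, number>>();
  for (const c of calls) {
    let byModel = byProject.get(c.project);
    if (!byModel) {
      byModel = new Map<string, number>();
      byProject.set(c.project, byModel);
    }
    byModel.set(c.model, (byModel.get(c.model) ?? 0) + c.costUsd);
  }
  return byProject;
}

// Project totals, most expensive first: the "which repo is it?" view.
function projectTotals(b: Map<string, Map<string, number>>): Array<[string, number]> {
  const totals: Array<[string, number]> = [];
  b.forEach((byModel, project) => {
    let sum = 0;
    byModel.forEach((cost) => {
      sum += cost;
    });
    totals.push([project, sum]);
  });
  return totals.sort((x, y) => y[1] - x[1]);
}
```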

When to reach for it

Before changing models or providers - you want a baseline so the comparison means something.
When monthly spend has crept up and you can’t point at a cause.
If you’re building tooling on top of agents and need to attribute cost to features.

When not to

For real-time enforcement. Codeburn is a read-only diagnostic - it doesn’t gate or rate-limit. Pair it with a proxy if you need policy enforcement.
For tools that don’t persist transcripts to disk. If you only use a web-based agent, there’s nothing for Codeburn to read.

Known accuracy caveats

A few tools don't record the exact model or token counts, so Codeburn falls back to estimates:

Cursor - when “Auto” mode hides the actual model, costs are estimated at Sonnet rates. First run on a large Cursor SQLite database can take up to a minute.
GitHub Copilot in VS Code - no explicit token counts in the format; estimated from content length.
Kiro - likewise estimated rather than read from explicit token counts, and costed at Sonnet rates.
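When a format has no token counts, an estimate from content length looks roughly like this. The 4-characters-per-token ratio is an assumption (a common rule of thumb for English text), not Codeburn's documented value. The fallback rates stand in for whatever Sonnet price the rate sheet supplies, and the function names are hypothetical.

```typescript
// Assumed ratio: roughly 4 characters per token is a common rule of thumb
// for English text; it is not Codeburn's documented value.
const CHARS_PER_TOKEN = 4;

// Estimate a token count from content length (counted in UTF-16 code units).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Placeholder fallback rates standing in for the Sonnet price on the rate sheet.
interface FallbackRates {
  inputPerToken: number;
  outputPerToken: number;
}

// Estimated cost of one exchange when the format records no token counts.
function estimateCost(prompt: string, response: string, fallback: FallbackRates): number {
  return (
    estimateTokens(prompt) * fallback.inputPerToken +
    estimateTokens(response) * fallback.outputPerToken
  );
}
```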

Treat the absolute numbers on those tools as ±10–20%. The relative breakdown - which projects, which task types, which days - is the part that matters, and it is largely unaffected.
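The ±10–20% band and the point about relative breakdowns can be checked with a little arithmetic. This sketch assumes the estimation error is a single multiplicative factor across a tool's calls. Under that assumption, the shares are unchanged. A non-uniform error (for example, code-heavy transcripts that tokenize differently from prose) would shift them. The function names are illustrative.

```typescript
// Low/high bounds for an estimate with a symmetric relative error (0.2 = ±20%).
function costBand(estimateUsd: number, relError: number): [number, number] {
  return [estimateUsd * (1 - relError), estimateUsd * (1 + relError)];
}

// Each item's fraction of the total.
function shares(costs: number[]): number[] {
  const total = costs.reduce((a, b) => a + b, 0);
  return costs.map((c) => c / total);
}

// If every estimate is off by the same factor k, k cancels in (c * k) / (sum * k),
// so the shares match; only the absolute totals move.
function sharesSurviveUniformError(costs: number[], k: number): boolean {
  const before = shares(costs);
  const after = shares(costs.map((c) => c * k));
  return before.every((s, i) => Math.abs(s - after[i]) < 1e-12);
}
```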

Requires Node.js 20+ and at least one tool that writes session data to disk.