claude-gauge/PLAN.md
Jeff Smith a683828e6b Initial scaffold
Project skeleton for claude-gauge: a hardware instrument cluster and
companion web dashboard driven by a local Python daemon that tails
~/.claude/projects/**/*.jsonl.

Includes:
- PLAN.md with cluster layout, dashboard sections, architecture,
  metrics brainstorm, and phasing.
- README.md overview.
- Python package scaffold (src/claude_gauge), pyproject.toml
  targeting Python 3.12+ and uv, MIT LICENSE, standard gitignore.

No runtime code yet. Phase A (daemon MVP) will land under the
first issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:01:30 -06:00

13 KiB

claude-gauge

Hardware instrument cluster plus companion web dashboard for Claude Code session telemetry. Primary surface is a physical cluster on the desk (fighter-jet / race-car aesthetic). Secondary surface is a local web dashboard for the deep stats.

Problem

There is no official programmatic API for Claude Max plan usage today. The /usage and /status slash commands in Claude Code are interactive-only. Open feature requests (claude-code issues #40395, #13585, #27217, #33978) track this but nothing has shipped as of 2026-04-17.

Rather than wait, drive everything from local Claude Code state by watching the session transcript JSONL files in ~/.claude/projects/. Swap the data source later when Anthropic ships a real endpoint; the hardware, firmware, daemon API, and dashboard all stay the same.

Instrument cluster (primary surface)

Physical cluster on the desk. Three analog gauges across the top, annunciator row below. Backlit. Brushed aluminium bezel, black face, cream needles, burgundy redline zone (optional: match the quartermaster palette for a house aesthetic).

    +------------+   +--------------+   +------------+
    |   5h FUEL  |   |   TOKENS/MIN |   |  7d FUEL   |
    |   0 - 100% |   |  0 - redline |   |  0 - 100%  |
    +------------+   +--------------+   +------------+
    [OPUS] [SONNET] [HAIKU]   [HOT] [WARN] [STALL] [IDLE]

Primary gauges

Gauge Input Feel
Center tach tokens/min, short rolling window Jumpy, fun, shows when Claude is cooking
Left fuel % of 5h plan window used Slow, steady, tells you when to worry
Right fuel % of 7d plan window used Slowest, long-arc view of the week

Annunciator row

Lamp Condition
OPUS / SONNET / HAIKU lights the model that wrote the most recent tokens
HOT flashes when tach crosses redline
WARN solid when either fuel gauge is above 80%
STALL lights after N minutes of JSONL silence (no activity)
IDLE green "power on, daemon reachable" indicator

Model lamps are colour-coded: Opus deep red, Sonnet amber, Haiku green. Visual cue for why the tach just spiked.

Optional fourth gauge

If physical real estate allows, a "temp" or "boost" sub-gauge:

  • Cache hit rate as a boost gauge. High cache read = cheap inference. Bragging-rights needle.
  • Thinking-to-output ratio as a temp gauge. High = Claude is grinding; low = cruising. Tells you when a task is hard.

Companion web dashboard (secondary surface)

Same daemon serves a browser dashboard at http://<host>:<port>/ with the deep stats. Claude Code is already a terminal tool, so the dashboard lives alongside as a geek-out surface when you want to drill in past the three-needle summary.

Dashboard sections

  1. Overview: the three gauges rendered in the browser, plus the annunciator row. Same data as the cluster, so a quick sanity check from any device on the LAN.
  2. Rates and windows: line charts for tokens/min over last 1h, 6h, 24h. 5h and 7d window sums over the past month.
  3. Cost: dollar estimates derived from token counts and published per-model pricing. Today, week-to-date, month-to-date, projected month.
  4. Models: stacked time series by model (Opus / Sonnet / Haiku). Token split pie for the current 7d window.
  5. Projects: tokens per project, time per project, last-active timestamp. Sortable table.
  6. Tools: top tools by call count, success rate per tool, Bash command distribution (top commands by root executable).
  7. Files: hottest files by Read + Edit count, edit-to-read ratio per file.
  8. Rhythm: time-of-day heatmap, day-of-week heatmap, session duration distribution.
  9. Raw events: streaming tail of the latest parsed JSONL rows, for debugging and to watch the data land.

Dashboard is read-only. Nothing the user does here mutates state.

Architecture

~/.claude/projects/**/*.jsonl
          |
          v (watchdog / inotify, line-append events)
          |
  [ gauge daemon ]  <--- local, Python, long-running
          |
          +-- parse new line -> structured event
          |     { ts, session_id, project, model, role,
          |       input_tokens, output_tokens,
          |       cache_read, cache_creation,
          |       thinking_tokens, stop_reason,
          |       tool_use: [...], is_sidechain, request_id }
          |
          +-- in-memory ring buffer (last ~60 min of events)
          +-- periodic flush of events + rolling aggregates to SQLite
          |
          +-- computed windows (primary, cluster):
          |     rate_instant    last 30s, tokens/min
          |     rate_1m         rolling 60s, tokens/min
          |     rate_5m         rolling 5m, tokens/min
          |     window_5h       rolling 5h sum
          |     window_7d       rolling 7d sum
          |
          +-- computed aggregates (secondary, dashboard):
          |     cache hit rate, thinking ratio, cost estimate,
          |     per-model token split, per-project totals,
          |     tool call counts, file touch counts, rhythm grids
          |
          +-- derived flags:
          |     hot, warn, stall, idle, last_model, last_project
          |
          +-- HTTP endpoints:
          |     GET  /usage        primary cluster payload
          |     GET  /stats        deep aggregates for the dashboard
          |     GET  /stream       SSE of parsed events (raw tail)
          |     GET  /             dashboard UI
          |
          v (poll every ~1s)
     [ ESP32 / Pi Pico, on LAN ]
          |
          +-- drives three needles via PWM + low-pass filter
              (or servos if full-swing dials are easier)
          +-- drives annunciator LEDs over GPIO
          +-- optional small OLED behind the cluster for raw numbers

Real-time honesty

Claude Code writes a JSONL line after each assistant message completes, not during streaming. The cluster therefore updates per-message, not per-token:

  • Rapid back-and-forth with small messages: tach ticks steadily, feels alive.
  • Opus cranking a 40-second response: tach reads zero, zero, zero, then SPIKES at the end with the full message's token count.

Still useful. The spike itself is informative ("that burn just hit 8k tokens in 40 seconds"). STALL lamp covers the "is it still working or is the session idle" ambiguity.

If true stream-time is wanted later, a Claude Code hook (SessionStart / Stop / PostToolUse) can push heartbeats into the daemon. Dramatically more work for modest gain. Park as v2.

Calibration

The daemon does not know Anthropic's real per-plan caps. User configures a local ceiling:

CLAUDE_GAUGE_5H_CEILING=<tokens>
CLAUDE_GAUGE_7D_CEILING=<tokens>
CLAUDE_GAUGE_TACH_REDLINE=<tokens/min>

Gauges read against those ceilings. Estimate from experience: run for a week at normal usage, note where /usage says you are vs where the dials read, adjust the config. Ship sensible defaults tunable per user.

Data source migration path

When Anthropic ships an official usage endpoint, the daemon replaces its input with that endpoint and drops the JSONL tailing. The HTTP shape it serves to the cluster stays the same. The dashboard gains accuracy but does not change structure. Zero hardware change, zero firmware change.

Stack (proposed, not decided)

Daemon: Python 3.12+, uv, watchdog for file tail, FastAPI for HTTP, SQLite for durable state, Jinja2 for dashboard templates, Alpine.js or HTMX for dashboard interactivity. Matches the house style (see quartermaster).

Firmware: MicroPython on ESP32 or Pi Pico W for the wifi stack. C/Rust if MicroPython HTTP polling turns out wobbly at the required update rate.

Hardware: three analog voltmeter movements (0-5V or similar) for the gauges, driven from PWM pins through low-pass filters, or tiny servos with printed needles. Annunciator row is discrete LEDs behind a smoked acrylic face.

Metrics brainstorm (for later inspection)

Everything on this list is parseable from the JSONL today. The MVP daemon only needs the primary-window stats; everything else lands under later issues. Capturing them here so they don't get forgotten.

Cost and tokens

  • Total tokens by window (already primary)
  • Breakdown: input / output / cache_read / cache_creation
  • Cache hit rate = cache_read / (cache_read + cache_creation + input)
  • Cache savings in dollars (cache reads are 10% of normal input cost)
  • Cost per session at published per-model pricing
  • Cost per day / week / month; projected month
  • Opus / Sonnet / Haiku token split
  • Server tool use: web_search_requests, web_fetch_requests per day
  • Service tier distribution (standard vs priority)
  • Ephemeral-1h vs ephemeral-5m cache split

Time and rhythm

  • Sessions per day / per week, rolling average
  • Session duration distribution
  • Time-of-day heatmap (circadian work pattern)
  • Day-of-week heatmap
  • Longest continuous session on record
  • Your think-time: gap between assistant-end and next user message
  • Claude's work-time: user message to assistant complete
  • Active vs idle ratio within sessions
  • Streak tracking (consecutive days used)
  • All-nighter detector (session crossing 2am local)

Work shape

  • Thinking tokens counted from content blocks of type thinking
  • Thinking-to-output ratio ("cogitation index")
  • Stop-reason distribution (end_turn / tool_use / max_tokens); watch for rising max_tokens (responses getting cut off)
  • Messages per session
  • Tool calls per assistant response (parallelism indicator)
  • User interrupt rate (sessions ending on cancel)
  • Iteration count per task (assistant messages between two consecutive user prompts as a proxy)

Tool usage

  • Top tools by count (Bash, Edit, Read, Grep, ...)
  • Tool success vs failure rate
  • Bash command distribution, parsed by root executable (git, python, uv, ls, ...)
  • File reads / edits / writes per session
  • Hottest files by touch count across all sessions
  • Agent / subagent counts (isSidechain=true), subagent depth
  • Web search / web fetch counts

Project and context

  • Tokens per project (JSONL path encodes the project)
  • Time per project
  • Project switching rate within a session
  • Dormant project detector (no activity in N days)
  • Languages touched, derived from file extensions
  • Last file edited per project (resume-where-you-left-off)

Friction and quality signals

  • User message length distribution (one-word directives vs prose)
  • Rough correction reflex count: user messages starting with "no", "wrong", "stop", "actually"
  • Permission denial frequency
  • Retry / regenerate patterns
  • File-history-snapshot count per session (checkpoint density)

Character and fun

  • Em dash violation count in assistant text (per the CLAUDE.md rule, a needle that reads "rule-break events this week")
  • Emoji leakage count
  • Most-used phrase by Claude in your transcripts
  • Most-used phrase by you (captures your actual directive vocabulary)
  • "Dude, chill" detector: count of explicit pushback against Claude
  • Thank-you rate per session
  • Silent sessions (ended without /compact or /clear)

Cross-system correlations (bigger, later)

  • Git: commits produced per session, lines changed per token spent
  • PRs / issues closed per session
  • Quartermaster: correlate long Claude sessions with budget-editing days (just for fun)

Phasing

Phase A — daemon MVP

One issue. Tail JSONL, parse into structured events, maintain the five primary windows, print them to stdout every second. No HTTP, no hardware, no dashboard. Prove the numbers land correctly against Claude Code's own /usage output.

Phase B — HTTP + dashboard overview

Stand up FastAPI, expose GET /usage for the cluster and GET / for a dashboard overview page (three gauges rendered in SVG, same data as the cluster). No deep stats yet. First thing you can load in a browser.

Phase C — firmware + single needle

Minimal ESP32 / Pico firmware that polls /usage and drives one needle (the tach). Prove the hardware path end to end.

Phase D — full cluster

Two more needles + annunciator row. Enclosure prototype.

Phase E — dashboard deep stats

Cost panel, model panel, project panel, rhythm heatmaps, tool panel, file panel, raw-event tail. Pulls from the aggregate store the daemon has been building since Phase A.

Phase F — character and cross-system

Em-dash detector, phrase extractor, git correlation, quartermaster correlation. Lowest priority, highest amusement.

Next steps

  1. Forgejo repo at archeious/claude-gauge (created alongside this plan).
  2. File Phase A as the first issue.
  3. Ship Phase A. See the five numbers tick up in a terminal while typing into Claude Code. First dopamine hit.
  4. File follow-up issues one phase at a time. No scope bleed between phases.