343 lines
13 KiB
Markdown
343 lines
13 KiB
Markdown
|
|
# claude-gauge
|
||
|
|
|
||
|
|
Hardware instrument cluster plus companion web dashboard for Claude
|
||
|
|
Code session telemetry. Primary surface is a physical cluster on the
|
||
|
|
desk (fighter-jet / race-car aesthetic). Secondary surface is a
|
||
|
|
local web dashboard for the deep stats.
|
||
|
|
|
||
|
|
## Problem
|
||
|
|
|
||
|
|
There is no official programmatic API for Claude Max plan usage
|
||
|
|
today. The `/usage` and `/status` slash commands in Claude Code are
|
||
|
|
interactive-only. Open feature requests (claude-code issues #40395,
|
||
|
|
#13585, #27217, #33978) track this but nothing has shipped as of
|
||
|
|
2026-04-17.
|
||
|
|
|
||
|
|
Rather than wait, drive everything from local Claude Code state by
|
||
|
|
watching the session transcript JSONL files in
|
||
|
|
`~/.claude/projects/`. Swap the data source later when Anthropic
|
||
|
|
ships a real endpoint; the hardware, firmware, daemon API, and
|
||
|
|
dashboard all stay the same.
|
||
|
|
|
||
|
|
## Instrument cluster (primary surface)
|
||
|
|
|
||
|
|
Physical cluster on the desk. Three analog gauges across the top,
|
||
|
|
annunciator row below. Backlit. Brushed aluminium bezel, black face,
|
||
|
|
cream needles, burgundy redline zone (optional: match the
|
||
|
|
quartermaster palette for a house aesthetic).
|
||
|
|
|
||
|
|
```
|
||
|
|
+------------+ +--------------+ +------------+
|
||
|
|
| 5h FUEL | | TOKENS/MIN | | 7d FUEL |
|
||
|
|
| 0 - 100% | | 0 - redline | | 0 - 100% |
|
||
|
|
+------------+ +--------------+ +------------+
|
||
|
|
[OPUS] [SONNET] [HAIKU] [HOT] [WARN] [STALL] [IDLE]
|
||
|
|
```
|
||
|
|
|
||
|
|
### Primary gauges
|
||
|
|
|
||
|
|
| Gauge | Input | Feel |
|
||
|
|
|---|---|---|
|
||
|
|
| Center tach | tokens/min, short rolling window | Jumpy, fun, shows when Claude is cooking |
|
||
|
|
| Left fuel | % of 5h plan window used | Slow, steady, tells you when to worry |
|
||
|
|
| Right fuel | % of 7d plan window used | Slowest, long-arc view of the week |
|
||
|
|
|
||
|
|
### Annunciator row
|
||
|
|
|
||
|
|
| Lamp | Condition |
|
||
|
|
|---|---|
|
||
|
|
| OPUS / SONNET / HAIKU | lights the model that wrote the most recent tokens |
|
||
|
|
| HOT | flashes when tach crosses redline |
|
||
|
|
| WARN | solid when either fuel gauge is above 80% |
|
||
|
|
| STALL | lights after N minutes of JSONL silence (no activity) |
|
||
|
|
| IDLE | green "power on, daemon reachable" indicator |
|
||
|
|
|
||
|
|
Model lamps are colour-coded: Opus deep red, Sonnet amber, Haiku
|
||
|
|
green. Visual cue for why the tach just spiked.
|
||
|
|
|
||
|
|
### Optional fourth gauge
|
||
|
|
|
||
|
|
If physical real estate allows, a "temp" or "boost" sub-gauge:
|
||
|
|
|
||
|
|
* **Cache hit rate** as a boost gauge. High cache read = cheap
|
||
|
|
inference. Bragging-rights needle.
|
||
|
|
* **Thinking-to-output ratio** as a temp gauge. High = Claude is
|
||
|
|
grinding; low = cruising. Tells you when a task is hard.
|
||
|
|
|
||
|
|
## Companion web dashboard (secondary surface)
|
||
|
|
|
||
|
|
Same daemon serves a browser dashboard at `http://<host>:<port>/`
|
||
|
|
with the deep stats. Claude Code is already a terminal tool, so the
|
||
|
|
dashboard lives alongside as a geek-out surface when you want to
|
||
|
|
drill in past the three-needle summary.
|
||
|
|
|
||
|
|
### Dashboard sections
|
||
|
|
|
||
|
|
1. **Overview**: the three gauges rendered in the browser, plus the
|
||
|
|
annunciator row. Same data as the cluster, so a quick sanity
|
||
|
|
check from any device on the LAN.
|
||
|
|
2. **Rates and windows**: line charts for tokens/min over last 1h,
|
||
|
|
6h, 24h. 5h and 7d window sums over the past month.
|
||
|
|
3. **Cost**: dollar estimates derived from token counts and
|
||
|
|
published per-model pricing. Today, week-to-date, month-to-date,
|
||
|
|
projected month.
|
||
|
|
4. **Models**: stacked time series by model (Opus / Sonnet / Haiku).
|
||
|
|
Token split pie for the current 7d window.
|
||
|
|
5. **Projects**: tokens per project, time per project, last-active
|
||
|
|
timestamp. Sortable table.
|
||
|
|
6. **Tools**: top tools by call count, success rate per tool,
|
||
|
|
Bash command distribution (top commands by root executable).
|
||
|
|
7. **Files**: hottest files by Read + Edit count, edit-to-read
|
||
|
|
ratio per file.
|
||
|
|
8. **Rhythm**: time-of-day heatmap, day-of-week heatmap, session
|
||
|
|
duration distribution.
|
||
|
|
9. **Raw events**: streaming tail of the latest parsed JSONL rows,
|
||
|
|
for debugging and to watch the data land.
|
||
|
|
|
||
|
|
Dashboard is read-only. Nothing the user does here mutates state.
|
||
|
|
|
||
|
|
## Architecture
|
||
|
|
|
||
|
|
```
|
||
|
|
~/.claude/projects/**/*.jsonl
|
||
|
|
|
|
||
|
|
v (watchdog / inotify, line-append events)
|
||
|
|
|
|
||
|
|
[ gauge daemon ] <--- local, Python, long-running
|
||
|
|
|
|
||
|
|
+-- parse new line -> structured event
|
||
|
|
| { ts, session_id, project, model, role,
|
||
|
|
| input_tokens, output_tokens,
|
||
|
|
| cache_read, cache_creation,
|
||
|
|
| thinking_tokens, stop_reason,
|
||
|
|
| tool_use: [...], is_sidechain, request_id }
|
||
|
|
|
|
||
|
|
+-- in-memory ring buffer (last ~60 min of events)
|
||
|
|
+-- periodic flush of events + rolling aggregates to SQLite
|
||
|
|
|
|
||
|
|
+-- computed windows (primary, cluster):
|
||
|
|
| rate_instant last 30s, tokens/min
|
||
|
|
| rate_1m rolling 60s, tokens/min
|
||
|
|
| rate_5m rolling 5m, tokens/min
|
||
|
|
| window_5h rolling 5h sum
|
||
|
|
| window_7d rolling 7d sum
|
||
|
|
|
|
||
|
|
+-- computed aggregates (secondary, dashboard):
|
||
|
|
| cache hit rate, thinking ratio, cost estimate,
|
||
|
|
| per-model token split, per-project totals,
|
||
|
|
| tool call counts, file touch counts, rhythm grids
|
||
|
|
|
|
||
|
|
+-- derived flags:
|
||
|
|
| hot, warn, stall, idle, last_model, last_project
|
||
|
|
|
|
||
|
|
+-- HTTP endpoints:
|
||
|
|
| GET /usage primary cluster payload
|
||
|
|
| GET /stats deep aggregates for the dashboard
|
||
|
|
| GET /stream SSE of parsed events (raw tail)
|
||
|
|
| GET / dashboard UI
|
||
|
|
|
|
||
|
|
v (poll every ~1s)
|
||
|
|
[ ESP32 / Pi Pico, on LAN ]
|
||
|
|
|
|
||
|
|
+-- drives three needles via PWM + low-pass filter
|
||
|
|
(or servos if full-swing dials are easier)
|
||
|
|
+-- drives annunciator LEDs over GPIO
|
||
|
|
+-- optional small OLED behind the cluster for raw numbers
|
||
|
|
```
|
||
|
|
|
||
|
|
## Real-time honesty
|
||
|
|
|
||
|
|
Claude Code writes a JSONL line **after** each assistant message
|
||
|
|
completes, not during streaming. The cluster therefore updates
|
||
|
|
per-message, not per-token:
|
||
|
|
|
||
|
|
* Rapid back-and-forth with small messages: tach ticks steadily,
|
||
|
|
feels alive.
|
||
|
|
* Opus cranking a 40-second response: tach reads zero, zero, zero,
|
||
|
|
then SPIKES at the end with the full message's token count.
|
||
|
|
|
||
|
|
Still useful. The spike itself is informative ("that burn just hit
|
||
|
|
8k tokens in 40 seconds"). STALL lamp covers the "is it still
|
||
|
|
working or is the session idle" ambiguity.
|
||
|
|
|
||
|
|
If true stream-time is wanted later, a Claude Code hook
|
||
|
|
(`SessionStart` / `Stop` / `PostToolUse`) can push heartbeats into
|
||
|
|
the daemon. Dramatically more work for modest gain. Park as v2.
|
||
|
|
|
||
|
|
## Calibration
|
||
|
|
|
||
|
|
The daemon does not know Anthropic's real per-plan caps. User
|
||
|
|
configures a local ceiling:
|
||
|
|
|
||
|
|
```
|
||
|
|
CLAUDE_GAUGE_5H_CEILING=<tokens>
|
||
|
|
CLAUDE_GAUGE_7D_CEILING=<tokens>
|
||
|
|
CLAUDE_GAUGE_TACH_REDLINE=<tokens/min>
|
||
|
|
```
|
||
|
|
|
||
|
|
Gauges read against those ceilings. Estimate from experience: run
|
||
|
|
for a week at normal usage, note where `/usage` says you are vs
|
||
|
|
where the dials read, adjust the config. Ship sensible defaults
|
||
|
|
tunable per user.
|
||
|
|
|
||
|
|
## Data source migration path
|
||
|
|
|
||
|
|
When Anthropic ships an official usage endpoint, the daemon replaces
|
||
|
|
its input with that endpoint and drops the JSONL tailing. The HTTP
|
||
|
|
shape it serves to the cluster stays the same. The dashboard gains
|
||
|
|
accuracy but does not change structure. Zero hardware change, zero
|
||
|
|
firmware change.
|
||
|
|
|
||
|
|
## Stack (proposed, not decided)
|
||
|
|
|
||
|
|
**Daemon**: Python 3.12+, `uv`, `watchdog` for file tail, FastAPI
|
||
|
|
for HTTP, SQLite for durable state, Jinja2 for dashboard templates,
|
||
|
|
Alpine.js or HTMX for dashboard interactivity. Matches the house
|
||
|
|
style (see quartermaster).
|
||
|
|
|
||
|
|
**Firmware**: MicroPython on ESP32 or Pi Pico W for the wifi stack.
|
||
|
|
C/Rust if MicroPython HTTP polling turns out wobbly at the required
|
||
|
|
update rate.
|
||
|
|
|
||
|
|
**Hardware**: three analog voltmeter movements (0-5V or similar)
|
||
|
|
for the gauges, driven from PWM pins through low-pass filters, or
|
||
|
|
tiny servos with printed needles. Annunciator row is discrete LEDs
|
||
|
|
behind a smoked acrylic face.
|
||
|
|
|
||
|
|
## Metrics brainstorm (for later inspection)
|
||
|
|
|
||
|
|
Everything on this list is parseable from the JSONL today. The MVP
|
||
|
|
daemon only needs the primary-window stats; everything else lands
|
||
|
|
under later issues. Capturing them here so they don't get forgotten.
|
||
|
|
|
||
|
|
### Cost and tokens
|
||
|
|
|
||
|
|
* Total tokens by window (already primary)
|
||
|
|
* Breakdown: `input` / `output` / `cache_read` / `cache_creation`
|
||
|
|
* **Cache hit rate** = cache_read / (cache_read + cache_creation + input)
|
||
|
|
* **Cache savings** in dollars (cache reads are 10% of normal input cost)
|
||
|
|
* Cost per session at published per-model pricing
|
||
|
|
* Cost per day / week / month; projected month
|
||
|
|
* Opus / Sonnet / Haiku token split
|
||
|
|
* Server tool use: `web_search_requests`, `web_fetch_requests` per day
|
||
|
|
* Service tier distribution (standard vs priority)
|
||
|
|
* Ephemeral-1h vs ephemeral-5m cache split
|
||
|
|
|
||
|
|
### Time and rhythm
|
||
|
|
|
||
|
|
* Sessions per day / per week, rolling average
|
||
|
|
* Session duration distribution
|
||
|
|
* Time-of-day heatmap (circadian work pattern)
|
||
|
|
* Day-of-week heatmap
|
||
|
|
* Longest continuous session on record
|
||
|
|
* Your think-time: gap between assistant-end and next user message
|
||
|
|
* Claude's work-time: user message to assistant complete
|
||
|
|
* Active vs idle ratio within sessions
|
||
|
|
* Streak tracking (consecutive days used)
|
||
|
|
* All-nighter detector (session crossing 2am local)
|
||
|
|
|
||
|
|
### Work shape
|
||
|
|
|
||
|
|
* **Thinking tokens** counted from content blocks of type `thinking`
|
||
|
|
* Thinking-to-output ratio ("cogitation index")
|
||
|
|
* Stop-reason distribution (`end_turn` / `tool_use` / `max_tokens`);
|
||
|
|
watch for rising `max_tokens` (responses getting cut off)
|
||
|
|
* Messages per session
|
||
|
|
* Tool calls per assistant response (parallelism indicator)
|
||
|
|
* User interrupt rate (sessions ending on cancel)
|
||
|
|
* Iteration count per task (assistant messages between two
|
||
|
|
consecutive user prompts as a proxy)
|
||
|
|
|
||
|
|
### Tool usage
|
||
|
|
|
||
|
|
* Top tools by count (Bash, Edit, Read, Grep, ...)
|
||
|
|
* Tool success vs failure rate
|
||
|
|
* Bash command distribution, parsed by root executable (git, python,
|
||
|
|
uv, ls, ...)
|
||
|
|
* File reads / edits / writes per session
|
||
|
|
* Hottest files by touch count across all sessions
|
||
|
|
* Agent / subagent counts (`isSidechain=true`), subagent depth
|
||
|
|
* Web search / web fetch counts
|
||
|
|
|
||
|
|
### Project and context
|
||
|
|
|
||
|
|
* Tokens per project (JSONL path encodes the project)
|
||
|
|
* Time per project
|
||
|
|
* Project switching rate within a session
|
||
|
|
* Dormant project detector (no activity in N days)
|
||
|
|
* Languages touched, derived from file extensions
|
||
|
|
* Last file edited per project (resume-where-you-left-off)
|
||
|
|
|
||
|
|
### Friction and quality signals
|
||
|
|
|
||
|
|
* User message length distribution (one-word directives vs prose)
|
||
|
|
* Rough correction reflex count: user messages starting with "no",
|
||
|
|
"wrong", "stop", "actually"
|
||
|
|
* Permission denial frequency
|
||
|
|
* Retry / regenerate patterns
|
||
|
|
* File-history-snapshot count per session (checkpoint density)
|
||
|
|
|
||
|
|
### Character and fun
|
||
|
|
|
||
|
|
* Em dash violation count in assistant text (per the CLAUDE.md rule,
|
||
|
|
a needle that reads "rule-break events this week")
|
||
|
|
* Emoji leakage count
|
||
|
|
* Most-used phrase by Claude in your transcripts
|
||
|
|
* Most-used phrase by you (captures your actual directive vocabulary)
|
||
|
|
* "Dude, chill" detector: count of explicit pushback against Claude
|
||
|
|
* Thank-you rate per session
|
||
|
|
* Silent sessions (ended without `/compact` or `/clear`)
|
||
|
|
|
||
|
|
### Cross-system correlations (bigger, later)
|
||
|
|
|
||
|
|
* Git: commits produced per session, lines changed per token spent
|
||
|
|
* PRs / issues closed per session
|
||
|
|
* Quartermaster: correlate long Claude sessions with budget-editing
|
||
|
|
days (just for fun)
|
||
|
|
|
||
|
|
## Phasing
|
||
|
|
|
||
|
|
### Phase A — daemon MVP
|
||
|
|
|
||
|
|
One issue. Tail JSONL, parse into structured events, maintain the
|
||
|
|
five primary windows, print them to stdout every second. No HTTP,
|
||
|
|
no hardware, no dashboard. Prove the numbers land correctly against
|
||
|
|
Claude Code's own `/usage` output.
|
||
|
|
|
||
|
|
### Phase B — HTTP + dashboard overview
|
||
|
|
|
||
|
|
Stand up FastAPI, expose `GET /usage` for the cluster and `GET /`
|
||
|
|
for a dashboard overview page (three gauges rendered in SVG, same
|
||
|
|
data as the cluster). No deep stats yet. First thing you can load
|
||
|
|
in a browser.
|
||
|
|
|
||
|
|
### Phase C — firmware + single needle
|
||
|
|
|
||
|
|
Minimal ESP32 / Pico firmware that polls `/usage` and drives one
|
||
|
|
needle (the tach). Prove the hardware path end to end.
|
||
|
|
|
||
|
|
### Phase D — full cluster
|
||
|
|
|
||
|
|
Two more needles + annunciator row. Enclosure prototype.
|
||
|
|
|
||
|
|
### Phase E — dashboard deep stats
|
||
|
|
|
||
|
|
Cost panel, model panel, project panel, rhythm heatmaps, tool
|
||
|
|
panel, file panel, raw-event tail. Pulls from the aggregate store
|
||
|
|
the daemon has been building since Phase A.
|
||
|
|
|
||
|
|
### Phase F — character and cross-system
|
||
|
|
|
||
|
|
Em-dash detector, phrase extractor, git correlation, quartermaster
|
||
|
|
correlation. Lowest priority, highest amusement.
|
||
|
|
|
||
|
|
## Next steps
|
||
|
|
|
||
|
|
1. Forgejo repo at `archeious/claude-gauge` (created alongside
|
||
|
|
this plan).
|
||
|
|
2. File Phase A as the first issue.
|
||
|
|
3. Ship Phase A. See the five numbers tick up in a terminal while
|
||
|
|
typing into Claude Code. First dopamine hit.
|
||
|
|
4. File follow-up issues one phase at a time. No scope bleed
|
||
|
|
between phases.
|