diff --git a/PLAN.md b/PLAN.md index 633695d..9e21da0 100644 --- a/PLAN.md +++ b/PLAN.md @@ -1,30 +1,31 @@ # claude-gauge -Hardware instrument cluster plus companion web dashboard for Claude -Code session telemetry. Primary surface is a physical cluster on the -desk (fighter-jet / race-car aesthetic). Secondary surface is a -local web dashboard for the deep stats. +Hardware instrument cluster displaying Claude Code session telemetry. +Three analog needle gauges plus an annunciator row, driven by an ESP32 +polling a local daemon, driven by Claude Code's own OpenTelemetry feed +or by `ccusage`. Fighter-jet / race-car aesthetic. Physical-first. -## Problem +## Why -There is no official programmatic API for Claude Max plan usage -today. The `/usage` and `/status` slash commands in Claude Code are -interactive-only. Open feature requests (claude-code issues #40395, -#13585, #27217, #33978) track this but nothing has shipped as of -2026-04-17. +Watching tokens burn against the Max-plan windows is useful, but the +same data also tells you when Claude is grinding, which model just +ran, and how warm your cache is. A dial on the desk makes that +ambient instead of tab-switching. -Rather than wait, drive everything from local Claude Code state by -watching the session transcript JSONL files in -`~/.claude/projects/`. Swap the data source later when Anthropic -ships a real endpoint; the hardware, firmware, daemon API, and -dashboard all stay the same. +## Prior art and the decision it implies -## Instrument cluster (primary surface) +Software side is crowded. `ccusage`, Claude-Code-Usage-Monitor, +haasonsaas/claude-usage-tracker, phuryn/claude-usage, multiple +Grafana dashboards (Grafana Labs 25052 and 24993), and Anthropic's +own `claude-code-monitoring-guide` repo all do the JSONL parsing and +rolling-window math already. Claude Code ships with native +OpenTelemetry support. The physical-gauge angle has no extant +prior art. -Physical cluster on the desk. Three analog gauges across the top, -annunciator row below. Backlit. Brushed aluminium bezel, black face, -cream needles, burgundy redline zone (optional: match the -quartermaster palette for a house aesthetic). +Implication: do not rebuild the telemetry layer. Consume it. Spend +the love on the hardware and the adapter that bridges it. + +## Instrument cluster (same in all architectures) ``` +------------+ +--------------+ +------------+ @@ -34,309 +35,724 @@ quartermaster palette for a house aesthetic). [OPUS] [SONNET] [HAIKU] [HOT] [WARN] [STALL] [IDLE] ``` -### Primary gauges - -| Gauge | Input | Feel | -|---|---|---| -| Center tach | tokens/min, short rolling window | Jumpy, fun, shows when Claude is cooking | -| Left fuel | % of 5h plan window used | Slow, steady, tells you when to worry | -| Right fuel | % of 7d plan window used | Slowest, long-arc view of the week | - -### Annunciator row +| Gauge | Metric | +|---|---| +| Center tach | Tokens/min, rolling short window | +| Left fuel | % of 5h plan window used | +| Right fuel | % of 7d plan window used | | Lamp | Condition | |---|---| -| OPUS / SONNET / HAIKU | lights the model that wrote the most recent tokens | -| HOT | flashes when tach crosses redline | -| WARN | solid when either fuel gauge is above 80% | -| STALL | lights after N minutes of JSONL silence (no activity) | -| IDLE | green "power on, daemon reachable" indicator | +| OPUS / SONNET / HAIKU | colour-coded model that emitted the most recent tokens | +| HOT | tach above redline | +| WARN | either fuel gauge above 80% | +| STALL | no telemetry in last N minutes | +| IDLE | daemon reachable, no activity | -Model lamps are colour-coded: Opus deep red, Sonnet amber, Haiku -green. Visual cue for why the tach just spiked. +## Two architectures -### Optional fourth gauge +Pick one. Both feed the same firmware and cluster. -If physical real estate allows, a "temp" or "boost" sub-gauge: +| | A. OTEL-native | B. ccusage-sourced | +|---|---|---| +| Data source | Claude Code OTLP -> collector -> Prometheus | Local JSONL via `ccusage` CLI | +| External deps | Docker Compose stack (collector, Prometheus, Grafana) | Node + `npx ccusage` | +| Deep-stats dashboard | Grafana dashboard 25052 for free | Build nothing, ccusage has a TUI | +| Short-window tach | Limited by Prometheus scrape interval (15s) | Hybrid JSONL tail gives sub-second | +| Operational weight | Moderate (3 services) | Tiny (one subprocess) | +| Homelab-enterprise fit | Strong | Weak | +| Time to first needle | Day 2 | Day 1 | +| Survivability through Claude Code updates | High (OTEL schema is stable and documented) | Medium (JSONL layout is an implementation detail) | -* **Cache hit rate** as a boost gauge. High cache read = cheap - inference. Bragging-rights needle. -* **Thinking-to-output ratio** as a temp gauge. High = Claude is - grinding; low = cruising. Tells you when a task is hard. +Both share the firmware, cluster, and enclosure. The daemon is the +only thing that differs. The `/usage` HTTP shape is identical across +A and B so the firmware never knows which backend is wired up. -## Companion web dashboard (secondary surface) +--- -Same daemon serves a browser dashboard at `http://:/` -with the deep stats. Claude Code is already a terminal tool, so the -dashboard lives alongside as a geek-out surface when you want to -drill in past the three-needle summary. +# Architecture A: OTEL-native -### Dashboard sections +## Stack -1. **Overview**: the three gauges rendered in the browser, plus the - annunciator row. Same data as the cluster, so a quick sanity - check from any device on the LAN. -2. **Rates and windows**: line charts for tokens/min over last 1h, - 6h, 24h. 5h and 7d window sums over the past month. -3. **Cost**: dollar estimates derived from token counts and - published per-model pricing. Today, week-to-date, month-to-date, - projected month. -4. **Models**: stacked time series by model (Opus / Sonnet / Haiku). - Token split pie for the current 7d window. -5. **Projects**: tokens per project, time per project, last-active - timestamp. Sortable table. -6. **Tools**: top tools by call count, success rate per tool, - Bash command distribution (top commands by root executable). -7. **Files**: hottest files by Read + Edit count, edit-to-read - ratio per file. -8. **Rhythm**: time-of-day heatmap, day-of-week heatmap, session - duration distribution. -9. **Raw events**: streaming tail of the latest parsed JSONL rows, - for debugging and to watch the data land. +Mirror Anthropic's reference stack (`claude-code-monitoring-guide`). +Three containers. -Dashboard is read-only. Nothing the user does here mutates state. +```yaml +# docker-compose.yml +services: + otel-collector: + image: otel/opentelemetry-collector-contrib:latest + command: ["--config=/etc/otel-collector-config.yaml"] + volumes: + - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml + ports: + - "4317:4317" # OTLP gRPC + - "4318:4318" # OTLP HTTP + - "8889:8889" # Prometheus scrape + depends_on: + - prometheus -## Architecture + prometheus: + image: prom/prometheus:latest + ports: + - "9090:9090" + volumes: + - ./prometheus.yml:/etc/prometheus/prometheus.yml + - prometheus_data:/prometheus + command: + - '--config.file=/etc/prometheus/prometheus.yml' + - '--storage.tsdb.path=/prometheus' + - '--storage.tsdb.retention.time=8d' # > 7d so increase() works + - '--web.enable-lifecycle' -``` -~/.claude/projects/**/*.jsonl - | - v (watchdog / inotify, line-append events) - | - [ gauge daemon ] <--- local, Python, long-running - | - +-- parse new line -> structured event - | { ts, session_id, project, model, role, - | input_tokens, output_tokens, - | cache_read, cache_creation, - | thinking_tokens, stop_reason, - | tool_use: [...], is_sidechain, request_id } - | - +-- in-memory ring buffer (last ~60 min of events) - +-- periodic flush of events + rolling aggregates to SQLite - | - +-- computed windows (primary, cluster): - | rate_instant last 30s, tokens/min - | rate_1m rolling 60s, tokens/min - | rate_5m rolling 5m, tokens/min - | window_5h rolling 5h sum - | window_7d rolling 7d sum - | - +-- computed aggregates (secondary, dashboard): - | cache hit rate, thinking ratio, cost estimate, - | per-model token split, per-project totals, - | tool call counts, file touch counts, rhythm grids - | - +-- derived flags: - | hot, warn, stall, idle, last_model, last_project - | - +-- HTTP endpoints: - | GET /usage primary cluster payload - | GET /stats deep aggregates for the dashboard - | GET /stream SSE of parsed events (raw tail) - | GET / dashboard UI - | - v (poll every ~1s) - [ ESP32 / Pi Pico, on LAN ] - | - +-- drives three needles via PWM + low-pass filter - (or servos if full-swing dials are easier) - +-- drives annunciator LEDs over GPIO - +-- optional small OLED behind the cluster for raw numbers + grafana: + image: grafana/grafana:latest + ports: + - "3000:3000" + environment: + - GF_SECURITY_ADMIN_PASSWORD=admin + volumes: + - grafana_data:/var/lib/grafana + - ./grafana/provisioning:/etc/grafana/provisioning + - ./grafana/dashboards:/var/lib/grafana/dashboards + depends_on: + - prometheus + +volumes: + prometheus_data: + grafana_data: ``` -## Real-time honesty +```yaml +# otel-collector-config.yaml +receivers: + otlp: + protocols: + grpc: + endpoint: 0.0.0.0:4317 + http: + endpoint: 0.0.0.0:4318 -Claude Code writes a JSONL line **after** each assistant message -completes, not during streaming. The cluster therefore updates -per-message, not per-token: +processors: + batch: + timeout: 1s + send_batch_size: 1024 + memory_limiter: + check_interval: 1s + limit_mib: 512 -* Rapid back-and-forth with small messages: tach ticks steadily, - feels alive. -* Opus cranking a 40-second response: tach reads zero, zero, zero, - then SPIKES at the end with the full message's token count. +exporters: + prometheus: + endpoint: "0.0.0.0:8889" + send_timestamps: true + metric_expiration: 192h # 8 days, covers 7d window + enable_open_metrics: true -Still useful. The spike itself is informative ("that burn just hit -8k tokens in 40 seconds"). STALL lamp covers the "is it still -working or is the session idle" ambiguity. - -If true stream-time is wanted later, a Claude Code hook -(`SessionStart` / `Stop` / `PostToolUse`) can push heartbeats into -the daemon. Dramatically more work for modest gain. Park as v2. - -## Calibration - -The daemon does not know Anthropic's real per-plan caps. User -configures a local ceiling: - -``` -CLAUDE_GAUGE_5H_CEILING= -CLAUDE_GAUGE_7D_CEILING= -CLAUDE_GAUGE_TACH_REDLINE= +service: + pipelines: + metrics: + receivers: [otlp] + processors: [memory_limiter, batch] + exporters: [prometheus] ``` -Gauges read against those ceilings. Estimate from experience: run -for a week at normal usage, note where `/usage` says you are vs -where the dials read, adjust the config. Ship sensible defaults -tunable per user. +```yaml +# prometheus.yml +global: + scrape_interval: 15s + evaluation_interval: 15s -## Data source migration path +scrape_configs: + - job_name: 'otel-collector' + static_configs: + - targets: ['otel-collector:8889'] +``` -When Anthropic ships an official usage endpoint, the daemon replaces -its input with that endpoint and drops the JSONL tailing. The HTTP -shape it serves to the cluster stays the same. The dashboard gains -accuracy but does not change structure. Zero hardware change, zero -firmware change. +Import Grafana Labs dashboard **25052** ("Claude Code") against the +Prometheus data source. That is the deep-stats dashboard; no custom +web UI needed. -## Stack (proposed, not decided) +## Claude Code configuration -**Daemon**: Python 3.12+, `uv`, `watchdog` for file tail, FastAPI -for HTTP, SQLite for durable state, Jinja2 for dashboard templates, -Alpine.js or HTMX for dashboard interactivity. Matches the house -style (see quartermaster). +Set in the shell Claude Code runs in (user profile, systemd unit, +or `~/.claude/settings.json` managed settings): -**Firmware**: MicroPython on ESP32 or Pi Pico W for the wifi stack. -C/Rust if MicroPython HTTP polling turns out wobbly at the required -update rate. +```bash +export CLAUDE_CODE_ENABLE_TELEMETRY=1 +export OTEL_METRICS_EXPORTER=otlp +export OTEL_LOGS_EXPORTER=otlp +export OTEL_EXPORTER_OTLP_PROTOCOL=grpc +export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 +export OTEL_METRIC_EXPORT_INTERVAL=10000 # 10s for gauge responsiveness +export OTEL_METRICS_INCLUDE_SESSION_ID=false # bound cardinality +``` -**Hardware**: three analog voltmeter movements (0-5V or similar) -for the gauges, driven from PWM pins through low-pass filters, or -tiny servos with printed needles. Annunciator row is discrete LEDs -behind a smoked acrylic face. +## Metrics Claude Code emits (via OTEL, surfaced in Prometheus) -## Metrics brainstorm (for later inspection) +All prefixed `claude_code_` after the OTEL-to-Prom conversion. -Everything on this list is parseable from the JSONL today. The MVP -daemon only needs the primary-window stats; everything else lands -under later issues. Capturing them here so they don't get forgotten. +| Prometheus metric | Labels | +|---|---| +| `claude_code_token_usage_tokens_total` | `type` (`input`/`output`/`cacheRead`/`cacheCreation`), `model` | +| `claude_code_cost_usage_USD_total` | `model` | +| `claude_code_session_count_total` | | +| `claude_code_active_time_total_seconds_total` | `type` (`user`/`cli`) | +| `claude_code_lines_of_code_count_total` | `type` (`added`/`removed`) | +| `claude_code_commit_count_total` | | +| `claude_code_pull_request_count_total` | | +| `claude_code_code_edit_tool_decision_count_total` | `tool_name`, `decision`, `language` | + +Events (via OTEL logs) carry richer per-request context including +`prompt.id`, `duration_ms`, `speed` (fast/normal), etc. Not needed +for the primary gauges. + +## PromQL the daemon runs + +```promql +# Tokens/min, short rolling window (tach) +sum(rate(claude_code_token_usage_tokens_total[1m])) * 60 + +# 5h window sum (left fuel) +sum(increase(claude_code_token_usage_tokens_total[5h])) + +# 7d window sum (right fuel) +sum(increase(claude_code_token_usage_tokens_total[7d])) + +# Cache hit rate (optional sub-gauge) + sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m])) +/ sum(rate(claude_code_token_usage_tokens_total{type=~"input|cacheRead|cacheCreation"}[5m])) + +# Last model (approximation via max-sample lookup) +topk(1, claude_code_token_usage_tokens_total{type="output"}) + +# Cost estimates +sum(increase(claude_code_cost_usage_USD_total[5h])) +sum(increase(claude_code_cost_usage_USD_total[7d])) + +# Stall detection (no tokens in last N minutes) +absent(rate(claude_code_token_usage_tokens_total[2m]) > 0) +``` + +## Daemon (A) + +Thin Python service. Queries Prometheus, transforms to `/usage` +payload for the firmware. + +``` +src/claude_gauge/ + __init__.py + daemon_prom.py FastAPI app, PromQL queries, /usage endpoint + config.py Prometheus URL, ceilings, stall threshold + windows.py PromQL builders and result parsing + calibration.py Maps raw values to firmware-friendly 0-1000 scales +``` + +```python +# daemon_prom.py sketch +import os +import httpx +from fastapi import FastAPI + +PROM = os.environ.get("CLAUDE_GAUGE_PROM_URL", "http://localhost:9090") +CEIL_5H = int(os.environ.get("CLAUDE_GAUGE_5H_CEILING", 500_000)) +CEIL_7D = int(os.environ.get("CLAUDE_GAUGE_7D_CEILING", 3_000_000)) +RED = int(os.environ.get("CLAUDE_GAUGE_TACH_REDLINE", 8000)) # tokens/min + +app = FastAPI() +client = httpx.AsyncClient(timeout=5.0) + +async def prom(q: str) -> float: + r = await client.get(f"{PROM}/api/v1/query", params={"query": q}) + data = r.json()["data"]["result"] + return float(data[0]["value"][1]) if data else 0.0 + +@app.get("/usage") +async def usage(): + rate_1m = await prom("sum(rate(claude_code_token_usage_tokens_total[1m])) * 60") + win_5h = await prom("sum(increase(claude_code_token_usage_tokens_total[5h]))") + win_7d = await prom("sum(increase(claude_code_token_usage_tokens_total[7d]))") + cache = await prom( + 'sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m])) / ' + 'sum(rate(claude_code_token_usage_tokens_total{type=~"input|cacheRead|cacheCreation"}[5m]))' + ) + stalled = (await prom( + 'sum(rate(claude_code_token_usage_tokens_total[2m]))' + )) == 0.0 + return { + "rate_1m": rate_1m, + "window_5h_tokens": win_5h, + "window_5h_pct": min(1.0, win_5h / CEIL_5H), + "window_7d_tokens": win_7d, + "window_7d_pct": min(1.0, win_7d / CEIL_7D), + "cache_hit_rate": cache, + "hot": rate_1m > RED, + "warn": (win_5h / CEIL_5H) > 0.8 or (win_7d / CEIL_7D) > 0.8, + "stall": stalled, + "idle": True, + "last_model": await last_model(), + } +``` + +`last_model` needs one extra query that picks the `model` label of +the most recently incremented output-token series. Implementation +detail; simplest is to run a small query loop on metric labels. + +## Dependencies (A) + +```toml +# pyproject.toml additions +dependencies = [ + "fastapi>=0.136.0", + "uvicorn[standard]>=0.44.0", + "httpx>=0.28.1", +] +``` + +No SQLite, no watchdog, no ORM. Prometheus is the database. + +## Retention considerations + +* Collector `metric_expiration: 192h` keeps a metric visible for 8d + after its last sample, so 7d `increase()` queries work even on + intermittent sessions. +* Prometheus `--storage.tsdb.retention.time=8d` keeps the samples + long enough for the same 7d queries. +* Grafana dashboard 25052 pulls from the same Prometheus. + +## Pros and cons of A + +Pros: +* Uses the platform feature Anthropic ships. +* Grafana dashboard is free. +* Metric schema is documented and stable. +* Plays cleanly with any other homelab metrics already in Prometheus. +* Architecture translates without changes when other machines run + Claude Code too: point their OTLP endpoint at the same collector. + +Cons: +* Prometheus scrape interval caps tach responsiveness at ~15s. +* Three containers to run. +* Requires env-var changes on every Claude Code launch surface. + +## Tach responsiveness mitigation (A) + +If the 15s cap bothers you, the daemon can keep a tiny JSONL-tail +fallback just for the tach. Same code shape as architecture B's tach +component; described below. Pulling the fuel gauges and everything +else from Prometheus, tach from direct file tail, is a clean hybrid. +Only activate if Phase C shows the needle feels sluggish. + +--- + +# Architecture B: ccusage-sourced + +## Stack + +One process: `ccusage` as a long-lived subprocess or periodic shell +call. No collector, no Prometheus, no Grafana. A hybrid watchdog +tail handles the sub-second tach that ccusage's aggregate API can't. + +``` +[ Claude Code ] -> ~/.claude/projects/**/*.jsonl + | + +---+----------------+ + | | + v v + [ watchdog tail ] [ ccusage CLI / MCP ] + (short-window tach) (5h blocks, 7d daily) + | | + +----------+---------+ + v + [ claude-gauge daemon ] + GET /usage + | + v + ESP32 firmware +``` + +## ccusage integration options + +Two shapes work. Pick one, not both. + +### Option B1: periodic CLI subprocess (simplest) + +```bash +npx ccusage@latest blocks --json # current 5h block +npx ccusage@latest daily --json # per-day aggregates for 7d sum +``` + +Run every ~10s from the daemon. Parse JSON, fill the fuel gauges. + +### Option B2: ccusage MCP HTTP server (persistent) + +```bash +bunx @ccusage/mcp@latest --type http --port 8080 +``` + +Exposes a Hono app at `POST /` handling MCP StreamableHTTP +requests. Four registered tools: + +| Tool | Description | +|---|---| +| `daily` | Usage grouped by date | +| `monthly` | Usage grouped by month | +| `session` | Usage grouped by conversation session | +| `blocks` | Usage grouped by 5-hour session billing blocks | + +Each tool accepts `since`, `until`, `mode`, `timezone`, `locale` and +returns JSON in an MCP text content block. + +Invoke as an MCP client from the daemon (`mcp` Python SDK) or as +raw JSON-RPC to `POST /`. + +### Recommendation + +**B1**. The CLI path is simpler, has fewer moving parts, and the +performance hit of a subprocess call every 10s is negligible. +Switch to B2 only if you also want the MCP surface exposed to other +local agents (Claude Code can already consume ccusage's MCP). + +## Short-window tach via watchdog + +ccusage aggregates are too coarse for the tach. The daemon keeps its +own 60-second ring buffer by tailing JSONL directly. + +```python +from watchdog.observers import Observer +from watchdog.events import FileSystemEventHandler +from collections import deque +from pathlib import Path +import json, time + +class JsonlTail(FileSystemEventHandler): + def __init__(self, bus): + self.bus = bus + self.offsets: dict[Path, int] = {} + + def on_modified(self, event): + p = Path(event.src_path) + if p.suffix != ".jsonl": + return + off = self.offsets.get(p, 0) + with p.open() as f: + f.seek(off) + for line in f: + try: + d = json.loads(line) + except json.JSONDecodeError: + continue + if d.get("type") == "assistant": + u = d.get("message", {}).get("usage", {}) + tokens = sum(u.get(k, 0) for k in ( + "input_tokens", "output_tokens", + "cache_read_input_tokens", + "cache_creation_input_tokens", + )) + model = d.get("message", {}).get("model", "") + self.bus.push(time.time(), tokens, model) + self.offsets[p] = f.tell() + +class RateBus: + def __init__(self, window_s=60): + self.window_s = window_s + self.buf: deque[tuple[float, int, str]] = deque() + + def push(self, ts, tokens, model): + self.buf.append((ts, tokens, model)) + self._evict() + + def _evict(self): + cutoff = time.time() - self.window_s + while self.buf and self.buf[0][0] < cutoff: + self.buf.popleft() + + def rate_per_min(self): + self._evict() + return sum(t for _, t, _ in self.buf) + + def last_model(self): + return self.buf[-1][2] if self.buf else None +``` + +## Daemon (B) + +``` +src/claude_gauge/ + __init__.py + daemon_ccusage.py FastAPI app, ccusage subprocess calls, /usage + tail.py watchdog + RateBus for tach + config.py + calibration.py +``` + +```python +# daemon_ccusage.py sketch +import asyncio, json, os, subprocess +from fastapi import FastAPI +from .tail import RateBus, start_watcher + +CEIL_5H = int(os.environ.get("CLAUDE_GAUGE_5H_CEILING", 500_000)) +CEIL_7D = int(os.environ.get("CLAUDE_GAUGE_7D_CEILING", 3_000_000)) +RED = int(os.environ.get("CLAUDE_GAUGE_TACH_REDLINE", 8000)) + +bus = RateBus(window_s=60) +start_watcher(bus) # background thread + +app = FastAPI() + +async def ccusage(cmd: str) -> dict: + proc = await asyncio.create_subprocess_exec( + "npx", "ccusage@latest", cmd, "--json", + stdout=asyncio.subprocess.PIPE, + ) + out, _ = await proc.communicate() + return json.loads(out) + +async def current_5h_tokens() -> int: + blocks = await ccusage("blocks") + cur = next((b for b in blocks.get("blocks", []) if b.get("isActive")), None) + return cur["totalTokens"] if cur else 0 + +async def trailing_7d_tokens() -> int: + daily = await ccusage("daily") + # sum last 7 daily buckets + rows = daily.get("daily", [])[-7:] + return sum(r["totalTokens"] for r in rows) + +@app.get("/usage") +async def usage(): + rate = bus.rate_per_min() + w5, w7 = await asyncio.gather(current_5h_tokens(), trailing_7d_tokens()) + return { + "rate_1m": rate, + "window_5h_tokens": w5, + "window_5h_pct": min(1.0, w5 / CEIL_5H), + "window_7d_tokens": w7, + "window_7d_pct": min(1.0, w7 / CEIL_7D), + "hot": rate > RED, + "warn": (w5 / CEIL_5H) > 0.8 or (w7 / CEIL_7D) > 0.8, + "stall": rate == 0 and not bus.buf, + "idle": True, + "last_model": bus.last_model(), + } +``` + +Cache `ccusage blocks/daily` output with a 10s TTL so the `/usage` +endpoint stays cheap when the firmware polls at 1 Hz. + +## Dependencies (B) + +```toml +dependencies = [ + "fastapi>=0.136.0", + "uvicorn[standard]>=0.44.0", + "watchdog>=5.0.0", +] +``` + +Node needs to be on the PATH for `npx ccusage@latest`. Pin a version +in config rather than using `@latest` once the daemon is past Phase A. + +## Pros and cons of B + +Pros: +* Single process, one dependency tree. +* Sub-second tach works out of the box via the watchdog tail. +* No service stack, no Docker, no collector. +* ccusage is actively maintained and has already solved the edge + cases in JSONL parsing (missing fields, renamed formats, cache + token math, cost per model). + +Cons: +* No free Grafana dashboard. If you want deep stats, either run + `ccusage` interactively or build something. +* Node on the runtime path. +* JSONL format is an implementation detail; upstream changes could + break parsing. ccusage tracks these but there's a lag window. +* Does not generalise if other machines also run Claude Code; each + one needs its own daemon. + +--- + +# Hardware (shared by A and B) + +## Movement + +**Switec X27.168** automotive stepper motor. 315-degree sweep, 600 +steps, roughly 2 degrees / step. ~$8 each. Used in car dashboards, +so enclosures and bezels exist off the shelf. + +Related cousins: X25, VID28, VID29, BKA30D-R5. The library supports +all of them, but X27.168 has the longest sweep and the most +available tutorials. + +## Driver + +`SwitecX25` Arduino library (`clearwater/SwitecX25` on GitHub). Works +for X27.168 despite the name. Drives 4 GPIO pins per motor. No +external driver IC required for short wiring runs; use small +transistor arrays (ULN2003A) if you want cleaner current handling. + +No maintained MicroPython port exists. **Firmware is Arduino C++** +rather than MicroPython. Not the original plan, but the right trade. + +## Board + +**ESP32 DevKit** (generic). WiFi, enough GPIO for 3 steppers (12 +pins) plus 8 annunciator LEDs and a reset button. ~$8. + +Alternative: Raspberry Pi Pico W. Less toolchain overhead if you +prefer CircuitPython, but you'd still be hand-rolling the stepper +driver. + +## Wiring sketch + +``` +ESP32 DevKit + GPIO 13,14,27,26 --> X27.168 #1 (left fuel) + GPIO 25,33,32,35 --> X27.168 #2 (tach) + GPIO 34,39,36,22 --> X27.168 #3 (right fuel) + GPIO 21 --> OPUS LED (red) + GPIO 19 --> SONNET LED (amber) + GPIO 18 --> HAIKU LED (green) + GPIO 5 --> HOT LED (red, PWM for flashing) + GPIO 17 --> WARN LED (amber) + GPIO 16 --> STALL LED (blue) + GPIO 4 --> IDLE LED (green, pulses while daemon reachable) + GPIO 15 --> tactile reset button (pull-up) +``` + +220R resistors per LED. Use a separate 5V rail for the steppers if +you see brownouts when all three move at once; ESP32's 3V3 rail is +fine for signals but the motors pull more than the onboard regulator +likes. + +## Firmware structure + +``` +firmware/ + platformio.ini + src/ + main.cpp setup() + loop() + wifi.cpp connect + reconnect + gauge.cpp wraps SwitecX25; map pct 0..1 to 0..steps + annunciator.cpp LED state machine + poll.cpp HTTP GET /usage every 1s + config.h daemon URL, redline, thresholds +``` + +Poll loop: + +1. Every 1000ms, GET `http://:8080/usage`. +2. Parse JSON (ArduinoJson). +3. Set gauge targets: `tach.setTargetStep(map(rate_1m, 0, redline, 0, 600))`, likewise for fuels. +4. Update LED states from `hot/warn/stall/idle/last_model`. +5. `gauge.update()` runs the stepper every loop tick until it hits target. + +## Enclosure + +* Cream faces, hairline burgundy redline zone (matches quartermaster + palette if you want the house look). +* Brushed aluminium bezel; 3D-print + spray-paint is fine for V1. +* Annunciator row behind smoked acrylic so the LEDs only show when + lit. +* Desk-size footprint: roughly 180mm wide x 90mm tall for the cluster. + +--- + +# Phasing + +One phase per issue. No scope bleed between phases. + +| Phase | Deliverable | Architecture-agnostic? | +|---|---|---| +| A | Daemon prints five window values to stdout | No (A or B chosen before start) | +| B | `/usage` HTTP endpoint; curl from browser or another box | No | +| C | ESP32 firmware driving ONE needle (tach) from the daemon | Yes | +| D | Three needles plus annunciator row | Yes | +| E | Calibration period: tune ceilings and redline against real use | Yes | +| F | Enclosure V1 (printed), cabling, permanent install | Yes | +| G | (If A) Grafana dashboard wired in; (if B) pick a deep-stats path or decline | Diverges | +| H | Character metrics and cross-system correlations (em-dash counter, git correlation, quartermaster correlation) | Yes | + +Do not attempt Phase D before Phase C. Hardware integration is +where surprises land; start with one axis. + +--- + +# Recommendation (for Jeff's homelab) + +Architecture A. + +The homelab-as-enterprise framing is the deciding factor. OTEL is +the platform feature, Prometheus is already the right tool, Grafana +dashboard 25052 is a free deep-stats surface, and the architecture +generalises if other machines start running Claude Code. The 15s +scrape interval is the only real concession; if the tach feels +sluggish after Phase E, bolt the JSONL tail from B on top for the +tach path only. Hybrid. + +If you don't already run Prometheus in the homelab, B gets you to +a working needle sooner (Phase A ships same day). Migrate to A +later if OTEL becomes useful for other things. + +Either way, the firmware and cluster are identical. The architecture +choice is only about what the daemon reads. + +# Metrics brainstorm (for later phases) + +All derivable from OTEL (A) or from the JSONL directly (B). Not +wired into the primary cluster; land in Phase G or a future Grafana +panel. ### Cost and tokens - -* Total tokens by window (already primary) -* Breakdown: `input` / `output` / `cache_read` / `cache_creation` -* **Cache hit rate** = cache_read / (cache_read + cache_creation + input) -* **Cache savings** in dollars (cache reads are 10% of normal input cost) -* Cost per session at published per-model pricing -* Cost per day / week / month; projected month -* Opus / Sonnet / Haiku token split -* Server tool use: `web_search_requests`, `web_fetch_requests` per day -* Service tier distribution (standard vs priority) -* Ephemeral-1h vs ephemeral-5m cache split +* Cache hit rate and cache-savings dollar value. +* Cost per session at published pricing. +* Projected monthly spend. +* Opus / Sonnet / Haiku token split. +* Server tool use (web search / web fetch) counts. ### Time and rhythm - -* Sessions per day / per week, rolling average -* Session duration distribution -* Time-of-day heatmap (circadian work pattern) -* Day-of-week heatmap -* Longest continuous session on record -* Your think-time: gap between assistant-end and next user message -* Claude's work-time: user message to assistant complete -* Active vs idle ratio within sessions -* Streak tracking (consecutive days used) -* All-nighter detector (session crossing 2am local) +* Session count, duration distribution, time-of-day heatmap. +* Think-time (user idle) vs work-time (assistant active). +* Streak tracking; all-nighter detector. ### Work shape - -* **Thinking tokens** counted from content blocks of type `thinking` -* Thinking-to-output ratio ("cogitation index") -* Stop-reason distribution (`end_turn` / `tool_use` / `max_tokens`); - watch for rising `max_tokens` (responses getting cut off) -* Messages per session -* Tool calls per assistant response (parallelism indicator) -* User interrupt rate (sessions ending on cancel) -* Iteration count per task (assistant messages between two - consecutive user prompts as a proxy) +* Thinking-to-output ratio as a "cogitation index" gauge. +* Stop-reason distribution (watch rising `max_tokens`). +* Tool calls per assistant response (parallelism indicator). ### Tool usage - -* Top tools by count (Bash, Edit, Read, Grep, ...) -* Tool success vs failure rate -* Bash command distribution, parsed by root executable (git, python, - uv, ls, ...) -* File reads / edits / writes per session -* Hottest files by touch count across all sessions -* Agent / subagent counts (`isSidechain=true`), subagent depth -* Web search / web fetch counts +* Top tools by count. Bash-command root-executable distribution. +* File reads vs edits vs writes per session. +* Hottest files across all sessions. +* Agent / subagent counts (`isSidechain=true`). ### Project and context +* Tokens per project, last-active timestamp, dormant-project detector. -* Tokens per project (JSONL path encodes the project) -* Time per project -* Project switching rate within a session -* Dormant project detector (no activity in N days) -* Languages touched, derived from file extensions -* Last file edited per project (resume-where-you-left-off) +### Friction and quality +* Permission denial frequency. +* File-history-snapshot count per session. -### Friction and quality signals +### Character +* **Em-dash violation counter** against the CLAUDE.md rule. +* Most-used phrase by Claude vs by the user. +* Thank-you rate, "Dude, chill" detector. -* User message length distribution (one-word directives vs prose) -* Rough correction reflex count: user messages starting with "no", - "wrong", "stop", "actually" -* Permission denial frequency -* Retry / regenerate patterns -* File-history-snapshot count per session (checkpoint density) +### Cross-system +* Git correlation: commits produced, lines changed per token. +* Quartermaster correlation: budget-editing days vs Claude load. -### Character and fun +--- -* Em dash violation count in assistant text (per the CLAUDE.md rule, - a needle that reads "rule-break events this week") -* Emoji leakage count -* Most-used phrase by Claude in your transcripts -* Most-used phrase by you (captures your actual directive vocabulary) -* "Dude, chill" detector: count of explicit pushback against Claude -* Thank-you rate per session -* Silent sessions (ended without `/compact` or `/clear`) +# Next steps -### Cross-system correlations (bigger, later) - -* Git: commits produced per session, lines changed per token spent -* PRs / issues closed per session -* Quartermaster: correlate long Claude sessions with budget-editing - days (just for fun) - -## Phasing - -### Phase A — daemon MVP - -One issue. Tail JSONL, parse into structured events, maintain the -five primary windows, print them to stdout every second. No HTTP, -no hardware, no dashboard. Prove the numbers land correctly against -Claude Code's own `/usage` output. - -### Phase B — HTTP + dashboard overview - -Stand up FastAPI, expose `GET /usage` for the cluster and `GET /` -for a dashboard overview page (three gauges rendered in SVG, same -data as the cluster). No deep stats yet. First thing you can load -in a browser. - -### Phase C — firmware + single needle - -Minimal ESP32 / Pico firmware that polls `/usage` and drives one -needle (the tach). Prove the hardware path end to end. - -### Phase D — full cluster - -Two more needles + annunciator row. Enclosure prototype. - -### Phase E — dashboard deep stats - -Cost panel, model panel, project panel, rhythm heatmaps, tool -panel, file panel, raw-event tail. Pulls from the aggregate store -the daemon has been building since Phase A. - -### Phase F — character and cross-system - -Em-dash detector, phrase extractor, git correlation, quartermaster -correlation. Lowest priority, highest amusement. - -## Next steps - -1. Forgejo repo at `archeious/claude-gauge` (created alongside - this plan). -2. File Phase A as the first issue. -3. Ship Phase A. See the five numbers tick up in a terminal while - typing into Claude Code. First dopamine hit. -4. File follow-up issues one phase at a time. No scope bleed - between phases. +1. Decide A or B. Default: A. +2. File Phase A as the first issue on `archeious/claude-gauge`. +3. If A: stand up the Compose stack, point Claude Code at it, + verify metrics reach Prometheus via the `/api/v1/query` browser + interface. +4. If B: install `ccusage`, run `blocks --json` and `daily --json` + by hand, paste the outputs somewhere durable for reference. +5. Ship Phase A. See the numbers tick in a terminal. diff --git a/README.md b/README.md index ef8ebed..bcbf198 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,31 @@ # claude-gauge -Hardware instrument cluster plus companion web dashboard for Claude -Code session telemetry. +Hardware instrument cluster displaying Claude Code session telemetry. -A local Python daemon tails `~/.claude/projects/**/*.jsonl`, parses -structured events into rolling windows and aggregate stats, and -exposes them over HTTP. An ESP32 or Pi Pico drives a three-gauge -analog cluster on the desk (tokens/min tach, 5h fuel, 7d fuel) with -an annunciator row of model and warning lamps. A browser dashboard -on the same port surfaces the deep stats. +Three analog needle gauges (5h fuel, tokens/min tach, 7d fuel) plus +an annunciator row of model-indicator and warning lamps, driven by +an ESP32 polling a local Python daemon. The daemon reads either +Claude Code's native OpenTelemetry feed through a Prometheus stack +(architecture A) or `ccusage` CLI aggregates with a direct JSONL +tail for the tach (architecture B). Firmware and cluster are +identical across both. -See [PLAN.md](PLAN.md) for architecture, the instrument cluster -layout, the dashboard sections, the full metrics brainstorm, and -the phasing plan. +Fighter-jet / race-car aesthetic. Physical-first: all the deep +stats live in Grafana (A) or ccusage's own surfaces (B). The dial +on the desk is the ambient summary. + +See [PLAN.md](PLAN.md) for: + +* Instrument cluster layout and annunciator semantics +* Architecture A: Docker Compose stack, Claude Code env config, + PromQL queries, daemon sketch +* Architecture B: ccusage subprocess integration, watchdog tail, + daemon sketch +* Hardware: X27.168 steppers, SwitecX25 library, ESP32 wiring, + enclosure notes +* Six-phase plan from daemon MVP through enclosure V1 +* Full metrics brainstorm for later phases ## Status -Scaffolded. Phase A (daemon MVP) is the first issue. +Scaffolded. Phase A pending architecture decision and first issue.