# claude-gauge

Hardware instrument cluster displaying Claude Code session telemetry. Three analog needle gauges plus an annunciator row, driven by an ESP32 that polls a local daemon, which is fed in turn by Claude Code's own OpenTelemetry feed or by `ccusage`.

Fighter-jet / race-car aesthetic. Physical-first.

## Why

Watching tokens burn against the Max-plan windows is useful, but the same data also tells you when Claude is grinding, which model just ran, and how warm your cache is. A dial on the desk makes that ambient, with no tab-switching.

## Prior art and the decision it implies

Software side is crowded. `ccusage`, Claude-Code-Usage-Monitor, haasonsaas/claude-usage-tracker, phuryn/claude-usage, multiple Grafana dashboards (Grafana Labs 25052 and 24993), and Anthropic's own `claude-code-monitoring-guide` repo all do the JSONL parsing and rolling-window math already. Claude Code ships with native OpenTelemetry support.

The physical-gauge angle has no extant prior art.

Implication: do not rebuild the telemetry layer. Consume it. Spend the love on the hardware and the adapter that bridges it.

## Instrument cluster (same in all architectures)

```
+------------+  +--------------+  +------------+
|  5h FUEL   |  |  TOKENS/MIN  |  |  7d FUEL   |
|  0 - 100%  |  | 0 - redline  |  |  0 - 100%  |
+------------+  +--------------+  +------------+

[OPUS] [SONNET] [HAIKU]   [HOT] [WARN] [STALL] [IDLE]
```

| Gauge | Metric |
|---|---|
| Center tach | Tokens/min, rolling short window |
| Left fuel | % of 5h plan window used |
| Right fuel | % of 7d plan window used |

| Lamp | Condition |
|---|---|
| OPUS / SONNET / HAIKU | Colour-coded lamp for the model that emitted the most recent tokens |
| HOT | Tach above redline |
| WARN | Either fuel gauge above 80% |
| STALL | No telemetry in last N minutes |
| IDLE | Daemon reachable, no activity |

## Two architectures

Pick one. Both feed the same firmware and cluster.

| | A. OTEL-native | B. ccusage-sourced |
|---|---|---|
| Data source | Claude Code OTLP -> collector -> Prometheus | Local JSONL via `ccusage` CLI |
| External deps | Docker Compose stack (collector, Prometheus, Grafana) | Node + `npx ccusage` |
| Deep-stats dashboard | Grafana dashboard 25052 for free | Build nothing, ccusage has a TUI |
| Short-window tach | Limited by Prometheus scrape interval (15s) | Hybrid JSONL tail gives sub-second |
| Operational weight | Moderate (3 services) | Tiny (one subprocess) |
| Homelab-enterprise fit | Strong | Weak |
| Time to first needle | Day 2 | Day 1 |
| Survivability through Claude Code updates | High (OTEL schema is stable and documented) | Medium (JSONL layout is an implementation detail) |

Both share the firmware, cluster, and enclosure. The daemon is the only thing that differs. The `/usage` HTTP shape is identical across A and B so the firmware never knows which backend is wired up.

---

# Architecture A: OTEL-native

## Stack

Mirror Anthropic's reference stack (`claude-code-monitoring-guide`). Three containers.
```yaml
# docker-compose.yml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus scrape
    depends_on:
      - prometheus

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=8d'   # > 7d so increase() works
      - '--web.enable-lifecycle'

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
      - ./grafana/dashboards:/var/lib/grafana/dashboards
    depends_on:
      - prometheus

volumes:
  prometheus_data:
  grafana_data:
```

```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    send_timestamps: true
    metric_expiration: 192h   # 8 days, covers 7d window
    enable_open_metrics: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
```

Import Grafana Labs dashboard **25052** ("Claude Code") against the Prometheus data source. That is the deep-stats dashboard; no custom web UI needed.
## Claude Code configuration

Set in the shell Claude Code runs in (user profile, systemd unit, or `~/.claude/settings.json` managed settings):

```bash
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_METRIC_EXPORT_INTERVAL=10000        # 10s for gauge responsiveness
export OTEL_METRICS_INCLUDE_SESSION_ID=false    # bound cardinality
```

## Metrics Claude Code emits (via OTEL, surfaced in Prometheus)

All prefixed `claude_code_` after the OTEL-to-Prom conversion.

| Prometheus metric | Labels |
|---|---|
| `claude_code_token_usage_tokens_total` | `type` (`input`/`output`/`cacheRead`/`cacheCreation`), `model` |
| `claude_code_cost_usage_USD_total` | `model` |
| `claude_code_session_count_total` | |
| `claude_code_active_time_total_seconds_total` | `type` (`user`/`cli`) |
| `claude_code_lines_of_code_count_total` | `type` (`added`/`removed`) |
| `claude_code_commit_count_total` | |
| `claude_code_pull_request_count_total` | |
| `claude_code_code_edit_tool_decision_count_total` | `tool_name`, `decision`, `language` |

Events (via OTEL logs) carry richer per-request context including `prompt.id`, `duration_ms`, `speed` (fast/normal), etc. Not needed for the primary gauges.
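The `type` and `model` labels in the table are what the daemon keys on. As a rough sketch, this is how a Prometheus instant-query response for `claude_code_token_usage_tokens_total` could be folded into per-model totals. The function name `tokens_by_model`, the sample payload, and the model names in it are illustrative assumptions; the only thing taken as given is the standard `/api/v1/query` vector shape, where each sample carries a `metric` label map and a `[timestamp, "value"]` pair.

```python
from collections import defaultdict
from typing import Optional


def tokens_by_model(response: dict, token_type: Optional[str] = None) -> dict:
    """Sum sample values per `model` label from an instant-query response."""
    totals = defaultdict(float)
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        # Optionally keep only one token type (e.g. "output").
        if token_type is not None and labels.get("type") != token_type:
            continue
        # Prometheus encodes sample values as strings: [unix_ts, "123"].
        totals[labels.get("model", "unknown")] += float(sample["value"][1])
    return dict(totals)


# Hand-written sample in the /api/v1/query vector shape (model names are made up).
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"model": "model-a", "type": "input"}, "value": [1700000000, "1200"]},
            {"metric": {"model": "model-a", "type": "output"}, "value": [1700000000, "300"]},
            {"metric": {"model": "model-b", "type": "output"}, "value": [1700000000, "50"]},
        ],
    },
}

print(tokens_by_model(sample_response))
print(tokens_by_model(sample_response, token_type="output"))
```

Grouping in the daemon rather than in PromQL keeps the query count down when several lamps need the same series; a `sum by (model)` query would do the same work server-side.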
## PromQL the daemon runs

```promql
# Tokens/min, short rolling window (tach)
sum(rate(claude_code_token_usage_tokens_total[1m])) * 60

# 5h window sum (left fuel)
sum(increase(claude_code_token_usage_tokens_total[5h]))

# 7d window sum (right fuel)
sum(increase(claude_code_token_usage_tokens_total[7d]))

# Cache hit rate (optional sub-gauge)
sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m]))
  /
sum(rate(claude_code_token_usage_tokens_total{type=~"input|cacheRead|cacheCreation"}[5m]))

# Last model (approximation via max-sample lookup)
topk(1, claude_code_token_usage_tokens_total{type="output"})

# Cost estimates
sum(increase(claude_code_cost_usage_USD_total[5h]))
sum(increase(claude_code_cost_usage_USD_total[7d]))

# Stall detection (no tokens in last N minutes)
absent(rate(claude_code_token_usage_tokens_total[2m]) > 0)
```

## Daemon (A)

Thin Python service. Queries Prometheus, transforms to `/usage` payload for the firmware.

```
src/claude_gauge/
  __init__.py
  daemon_prom.py     FastAPI app, PromQL queries, /usage endpoint
  config.py          Prometheus URL, ceilings, stall threshold
  windows.py         PromQL builders and result parsing
  calibration.py     Maps raw values to firmware-friendly 0-1000 scales
```

```python
# daemon_prom.py sketch
import math
import os

import httpx
from fastapi import FastAPI

PROM = os.environ.get("CLAUDE_GAUGE_PROM_URL", "http://localhost:9090")
CEIL_5H = int(os.environ.get("CLAUDE_GAUGE_5H_CEILING", 500_000))
CEIL_7D = int(os.environ.get("CLAUDE_GAUGE_7D_CEILING", 3_000_000))
RED = int(os.environ.get("CLAUDE_GAUGE_TACH_REDLINE", 8000))  # tokens/min

app = FastAPI()
client = httpx.AsyncClient(timeout=5.0)


async def prom_result(q: str) -> list:
    """Run an instant query and return the raw result vector."""
    r = await client.get(f"{PROM}/api/v1/query", params={"query": q})
    r.raise_for_status()
    return r.json()["data"]["result"]


async def prom(q: str) -> float:
    """First sample of an instant query as a float; 0.0 when empty or NaN."""
    data = await prom_result(q)
    value = float(data[0]["value"][1]) if data else 0.0
    # 0/0 in PromQL yields NaN, which the JSON response encoder rejects.
    return value if math.isfinite(value) else 0.0


async def last_model():
    """Max-sample approximation from the PromQL section, not true recency."""
    data = await prom_result(
        'topk(1, claude_code_token_usage_tokens_total{type="output"})'
    )
    return data[0]["metric"].get("model") if data else None


@app.get("/usage")
async def usage():
    rate_1m = await prom("sum(rate(claude_code_token_usage_tokens_total[1m])) * 60")
    win_5h = await prom("sum(increase(claude_code_token_usage_tokens_total[5h]))")
    win_7d = await prom("sum(increase(claude_code_token_usage_tokens_total[7d]))")
    cache = await prom(
        'sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m])) / '
        'sum(rate(claude_code_token_usage_tokens_total{type=~"input|cacheRead|cacheCreation"}[5m]))'
    )
    stalled = (await prom(
        'sum(rate(claude_code_token_usage_tokens_total[2m]))'
    )) == 0.0
    return {
        "rate_1m": rate_1m,
        "window_5h_tokens": win_5h,
        "window_5h_pct": min(1.0, win_5h / CEIL_5H),
        "window_7d_tokens": win_7d,
        "window_7d_pct": min(1.0, win_7d / CEIL_7D),
        "cache_hit_rate": cache,
        "hot": rate_1m > RED,
        "warn": (win_5h / CEIL_5H) > 0.8 or (win_7d / CEIL_7D) > 0.8,
        "stall": stalled,
        "idle": True,  # the daemon answered, so it is reachable
        "last_model": await last_model(),
    }
```

The `last_model` helper in the sketch reuses the max-sample approximation, which returns the model with the highest cumulative output count rather than the latest one. A stricter version needs one extra query that picks the `model` label of the most recently incremented output-token series. Implementation detail; simplest is to run a small query loop on metric labels.

## Dependencies (A)

```toml
# pyproject.toml additions
dependencies = [
    "fastapi>=0.136.0",
    "uvicorn[standard]>=0.44.0",
    "httpx>=0.28.1",
]
```

No SQLite, no watchdog, no ORM. Prometheus is the database.

## Retention considerations

* Collector `metric_expiration: 192h` keeps a metric visible for 8d after its last sample, so 7d `increase()` queries work even on intermittent sessions.
* Prometheus `--storage.tsdb.retention.time=8d` keeps the samples long enough for the same 7d queries.
* Grafana dashboard 25052 pulls from the same Prometheus.

## Pros and cons of A

Pros:

* Uses the platform feature Anthropic ships.
* Grafana dashboard is free.
* Metric schema is documented and stable.
* Plays cleanly with any other homelab metrics already in Prometheus.
* Architecture translates without changes when other machines run Claude Code too: point their OTLP endpoint at the same collector.

Cons:

* Prometheus scrape interval caps tach responsiveness at ~15s.
* Three containers to run.
* Requires env-var changes on every Claude Code launch surface.

## Tach responsiveness mitigation (A)

If the 15s cap bothers you, the daemon can keep a tiny JSONL-tail fallback just for the tach. Same code shape as architecture B's tach component; described below. Pulling the fuel gauges and everything else from Prometheus while the tach comes from a direct file tail is a clean hybrid. Only activate if Phase C shows the needle feels sluggish.

---

# Architecture B: ccusage-sourced

## Stack

One process: `ccusage` as a long-lived subprocess or periodic shell call. No collector, no Prometheus, no Grafana. A hybrid watchdog tail handles the sub-second tach that ccusage's aggregate API can't.

```
[ Claude Code ] -> ~/.claude/projects/**/*.jsonl
                            |
                +-----------+-----------+
                |                       |
                v                       v
        [ watchdog tail ]     [ ccusage CLI / MCP ]
      (short-window tach)     (5h blocks, 7d daily)
                |                       |
                +-----------+-----------+
                            v
                 [ claude-gauge daemon ]
                       GET /usage
                            |
                            v
                     ESP32 firmware
```

## ccusage integration options

Two shapes work. Pick one, not both.

### Option B1: periodic CLI subprocess (simplest)

```bash
npx ccusage@latest blocks --json   # current 5h block
npx ccusage@latest daily --json    # per-day aggregates for 7d sum
```

Run every ~10s from the daemon. Parse JSON, fill the fuel gauges.

### Option B2: ccusage MCP HTTP server (persistent)

```bash
bunx @ccusage/mcp@latest --type http --port 8080
```

Exposes a Hono app at `POST /` handling MCP StreamableHTTP requests. Four registered tools:

| Tool | Description |
|---|---|
| `daily` | Usage grouped by date |
| `monthly` | Usage grouped by month |
| `session` | Usage grouped by conversation session |
| `blocks` | Usage grouped by 5-hour session billing blocks |

Each tool accepts `since`, `until`, `mode`, `timezone`, `locale` and returns JSON in an MCP text content block. Invoke as an MCP client from the daemon (`mcp` Python SDK) or as raw JSON-RPC to `POST /`.

### Recommendation

**B1**.
The CLI path is simpler, has fewer moving parts, and the performance hit of a subprocess call every 10s is negligible. Switch to B2 only if you also want the MCP surface exposed to other local agents (Claude Code can already consume ccusage's MCP).

## Short-window tach via watchdog

ccusage aggregates are too coarse for the tach. The daemon keeps its own 60-second ring buffer by tailing JSONL directly.

```python
# tail.py sketch
import json
import time
from collections import deque
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

# Log root taken from the diagram above.
LOG_ROOT = Path.home() / ".claude" / "projects"


class JsonlTail(FileSystemEventHandler):
    """Feeds token counts from newly appended JSONL lines into a RateBus."""

    def __init__(self, bus):
        self.bus = bus
        self.offsets: dict[Path, int] = {}

    def on_modified(self, event):
        p = Path(event.src_path)
        if p.suffix != ".jsonl":
            return
        off = self.offsets.get(p, 0)
        with p.open() as f:
            f.seek(off)
            for line in f:
                try:
                    d = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if d.get("type") == "assistant":
                    u = d.get("message", {}).get("usage", {})
                    tokens = sum(u.get(k, 0) for k in (
                        "input_tokens", "output_tokens",
                        "cache_read_input_tokens", "cache_creation_input_tokens",
                    ))
                    model = d.get("message", {}).get("model", "")
                    self.bus.push(time.time(), tokens, model)
            # Remember where we stopped so the next event reads only new lines.
            self.offsets[p] = f.tell()


class RateBus:
    """Ring buffer of (timestamp, tokens, model) over the last window_s seconds."""

    def __init__(self, window_s=60):
        self.window_s = window_s
        self.buf: deque[tuple[float, int, str]] = deque()

    def push(self, ts, tokens, model):
        self.buf.append((ts, tokens, model))
        self._evict()

    def _evict(self):
        cutoff = time.time() - self.window_s
        while self.buf and self.buf[0][0] < cutoff:
            self.buf.popleft()

    def rate_per_min(self):
        self._evict()
        # Scale to a per-minute figure so windows other than 60s stay correct.
        return sum(t for _, t, _ in self.buf) * 60 / self.window_s

    def last_model(self):
        return self.buf[-1][2] if self.buf else None


def start_watcher(bus, root=LOG_ROOT):
    """Start a background observer that tails every .jsonl file under root."""
    handler = JsonlTail(bus)
    # Seed offsets so history already on disk is not replayed as fresh traffic.
    for p in root.rglob("*.jsonl"):
        handler.offsets[p] = p.stat().st_size
    observer = Observer()
    observer.schedule(handler, str(root), recursive=True)
    observer.start()
    return observer
```

## Daemon (B)

```
src/claude_gauge/
  __init__.py
  daemon_ccusage.py   FastAPI app, ccusage subprocess calls, /usage
  tail.py             watchdog + RateBus for tach
  config.py
  calibration.py
```

```python
# daemon_ccusage.py sketch
import asyncio
import json
import os
import time

from fastapi import FastAPI

from .tail import RateBus, start_watcher

CEIL_5H = int(os.environ.get("CLAUDE_GAUGE_5H_CEILING", 500_000))
CEIL_7D = int(os.environ.get("CLAUDE_GAUGE_7D_CEILING", 3_000_000))
RED = int(os.environ.get("CLAUDE_GAUGE_TACH_REDLINE", 8000))
TTL_S = 10.0  # how long a ccusage result is reused

bus = RateBus(window_s=60)
start_watcher(bus)  # background thread
app = FastAPI()
_cache: dict[str, tuple[float, dict]] = {}


async def ccusage(cmd: str) -> dict:
    """Run `ccusage <cmd> --json`, reusing the last result for TTL_S seconds."""
    hit = _cache.get(cmd)
    if hit and time.monotonic() - hit[0] < TTL_S:
        return hit[1]
    proc = await asyncio.create_subprocess_exec(
        "npx", "ccusage@latest", cmd, "--json",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    data = json.loads(out)
    _cache[cmd] = (time.monotonic(), data)
    return data


async def current_5h_tokens() -> int:
    blocks = await ccusage("blocks")
    cur = next((b for b in blocks.get("blocks", []) if b.get("isActive")), None)
    return cur["totalTokens"] if cur else 0


async def trailing_7d_tokens() -> int:
    daily = await ccusage("daily")
    # Sum the last 7 daily buckets. If days without usage have no bucket,
    # this can reach back further than 7 calendar days.
    rows = daily.get("daily", [])[-7:]
    return sum(r["totalTokens"] for r in rows)


@app.get("/usage")
async def usage():
    rate = bus.rate_per_min()
    w5, w7 = await asyncio.gather(current_5h_tokens(), trailing_7d_tokens())
    return {
        "rate_1m": rate,
        "window_5h_tokens": w5,
        "window_5h_pct": min(1.0, w5 / CEIL_5H),
        "window_7d_tokens": w7,
        "window_7d_pct": min(1.0, w7 / CEIL_7D),
        "hot": rate > RED,
        "warn": (w5 / CEIL_5H) > 0.8 or (w7 / CEIL_7D) > 0.8,
        "stall": rate == 0 and not bus.buf,
        "idle": True,  # the daemon answered, so it is reachable
        "last_model": bus.last_model(),
    }
```

The sketch caches `ccusage blocks/daily` output with a 10s TTL so the `/usage` endpoint stays cheap when the firmware polls at 1 Hz.

## Dependencies (B)

```toml
dependencies = [
    "fastapi>=0.136.0",
    "uvicorn[standard]>=0.44.0",
    "watchdog>=5.0.0",
]
```

Node needs to be on the PATH for `npx ccusage@latest`. Pin a version in config rather than using `@latest` once the daemon is past Phase A.

## Pros and cons of B

Pros:

* Single process, one dependency tree.
* Sub-second tach works out of the box via the watchdog tail.
* No service stack, no Docker, no collector.
* ccusage is actively maintained and has already solved the edge cases in JSONL parsing (missing fields, renamed formats, cache token math, cost per model).

Cons:

* No free Grafana dashboard. If you want deep stats, either run `ccusage` interactively or build something.
* Node on the runtime path.
* JSONL format is an implementation detail; upstream changes could break parsing. ccusage tracks these but there's a lag window.
* Does not generalise if other machines also run Claude Code; each one needs its own daemon.

---

# Hardware (shared by A and B)

## Movement

**Switec X27.168** automotive stepper motor. 315-degree sweep in 945 steps, 1/3 degree per step. ~$8 each. Used in car dashboards, so enclosures and bezels exist off the shelf.

Related cousins: X25, VID28, VID29, BKA30D-R5. The library supports all of them, but X27.168 has the longest sweep and the most available tutorials.

## Driver

`SwitecX25` Arduino library (`clearwater/SwitecX25` on GitHub). Works for X27.168 despite the name. Drives 4 GPIO pins per motor. No external driver IC required for short wiring runs; use small transistor arrays (ULN2003A) if you want cleaner current handling.

No maintained MicroPython port exists. **Firmware is Arduino C++** rather than MicroPython. Not the original plan, but the right trade.

## Board

**ESP32 DevKit** (generic). WiFi, enough GPIO for 3 steppers (12 pins) plus 7 annunciator LEDs and a reset button. ~$8.

Alternative: Raspberry Pi Pico W. Less toolchain overhead if you prefer CircuitPython, but you'd still be hand-rolling the stepper driver.
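The movement's sweep and step count set how finely a needle can resolve a percentage. A minimal sketch of the mapping that `calibration.py` or the firmware's gauge wrapper would need, assuming a full sweep of 945 steps (315 degrees at 1/3 degree per step, the convention the SwitecX25 library's examples use). The function names are hypothetical, and `total_steps` is a parameter so a different measured sweep can be substituted.

```python
def degrees_per_step(sweep_deg: float, total_steps: int) -> float:
    """Angular resolution of one step for a given sweep."""
    return sweep_deg / total_steps


def fraction_to_step(fraction: float, total_steps: int) -> int:
    """Map a 0..1 gauge fraction to a step target, clamped to the sweep."""
    clamped = min(1.0, max(0.0, fraction))
    # Round half up so the mapping is monotonic and easy to reason about.
    return int(clamped * total_steps + 0.5)


ASSUMED_STEPS = 945  # assumption: 315 degrees * 3 steps per degree

print(round(degrees_per_step(315, ASSUMED_STEPS), 4))
for pct in (0.0, 0.5, 0.8, 1.0, 1.7):
    print(pct, fraction_to_step(pct, ASSUMED_STEPS))
```

Clamping here matters because the daemon caps the fuel percentages at 1.0 but the tach can exceed redline; an unclamped target would drive the needle into the end stop.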
## Wiring sketch

```
ESP32 DevKit
  GPIO 13,14,27,26   --> X27.168 #1 (left fuel)
  GPIO 25,33,32,23   --> X27.168 #2 (tach)
  GPIO 22,21,19,18   --> X27.168 #3 (right fuel)
  GPIO 2             --> OPUS LED   (red)
  GPIO 12            --> SONNET LED (amber)
  GPIO 15            --> HAIKU LED  (green)
  GPIO 5             --> HOT LED    (red, PWM for flashing)
  GPIO 17            --> WARN LED   (amber)
  GPIO 16            --> STALL LED  (blue)
  GPIO 4             --> IDLE LED   (green, pulses while daemon reachable)
  GPIO 34            --> tactile reset button (external pull-up)
```

GPIO 34, 35, 36, and 39 are input-only on the ESP32, so they cannot drive a motor coil. The layout above keeps all three motors on ordinary output pins, puts three LEDs on the boot-strapping pins (2, 12, 15), and moves the button to GPIO 34 with an external pull-up, since the input-only pins have no internal pull-ups. Treat it as one workable assignment rather than a tested one; check it against your board's pinout.

220R resistors per LED. Use a separate 5V rail for the steppers if you see brownouts when all three move at once; ESP32's 3V3 rail is fine for signals but the motors pull more than the onboard regulator likes.

## Firmware structure

```
firmware/
  platformio.ini
  src/
    main.cpp          setup() + loop()
    wifi.cpp          connect + reconnect
    gauge.cpp         wraps SwitecX25; map pct 0..1 to 0..steps
    annunciator.cpp   LED state machine
    poll.cpp          HTTP GET /usage every 1s
    config.h          daemon URL, redline, thresholds
```

Poll loop:

1. Every 1000ms, GET `http://:8080/usage`.
2. Parse JSON (ArduinoJson).
3. Set gauge targets: `tach.setTargetStep(constrain(map(rate_1m, 0, redline, 0, STEPS), 0, STEPS))`, likewise for fuels. `STEPS` is the movement's full-sweep step count; the clamp keeps an over-redline rate from driving the needle into the end stop.
4. Update LED states from `hot/warn/stall/idle/last_model`.
5. `gauge.update()` runs the stepper every loop tick until it hits target.

## Enclosure

* Cream faces, hairline burgundy redline zone (matches quartermaster palette if you want the house look).
* Brushed aluminium bezel; 3D-print + spray-paint is fine for V1.
* Annunciator row behind smoked acrylic so the LEDs only show when lit.
* Desk-size footprint: roughly 180mm wide x 90mm tall for the cluster.

---

# Phasing

One phase per issue. No scope bleed between phases.

| Phase | Deliverable | Architecture-agnostic? |
|---|---|---|
| A | Daemon prints five window values to stdout | No (A or B chosen before start) |
| B | `/usage` HTTP endpoint; reachable by curl or browser from another box | No |
| C | ESP32 firmware driving ONE needle (tach) from the daemon | Yes |
| D | Three needles plus annunciator row | Yes |
| E | Calibration period: tune ceilings and redline against real use | Yes |
| F | Enclosure V1 (printed), cabling, permanent install | Yes |
| G | (If A) Grafana dashboard wired in; (if B) pick a deep-stats path or decline | Diverges |
| H | Character metrics and cross-system correlations (em-dash counter, git correlation, quartermaster correlation) | Yes |

Do not attempt Phase D before Phase C. Hardware integration is where surprises land; start with one axis.

---

# Recommendation (for Jeff's homelab)

Architecture A.

The homelab-as-enterprise framing is the deciding factor. OTEL is the platform feature, Prometheus is already the right tool, Grafana dashboard 25052 is a free deep-stats surface, and the architecture generalises if other machines start running Claude Code. The 15s scrape interval is the only real concession; if the tach feels sluggish after Phase E, bolt the JSONL tail from B on top for the tach path only. Hybrid.

If you don't already run Prometheus in the homelab, B gets you to a working needle sooner (Phase A ships same day). Migrate to A later if OTEL becomes useful for other things.

Either way, the firmware and cluster are identical. The architecture choice is only about what the daemon reads.

# Metrics brainstorm (for later phases)

All derivable from OTEL (A) or from the JSONL directly (B). Not wired into the primary cluster; land in Phase G or a future Grafana panel.

### Cost and tokens

* Cache hit rate and cache-savings dollar value.
* Cost per session at published pricing.
* Projected monthly spend.
* Opus / Sonnet / Haiku token split.
* Server tool use (web search / web fetch) counts.
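Two of the items above reduce to one-line arithmetic. A minimal sketch, assuming the cache hit rate uses the same ratio as the PromQL query (cache reads over input plus cache reads plus cache creations) and that projected spend is a plain linear extrapolation of the trailing 7-day cost; both function names and the 30-day month are illustrative choices.

```python
def cache_hit_rate(input_tokens: float, cache_read: float, cache_creation: float) -> float:
    """Share of prompt-side tokens served from cache; 0.0 when there is no traffic."""
    denom = input_tokens + cache_read + cache_creation
    return cache_read / denom if denom else 0.0


def projected_monthly_spend(cost_7d_usd: float, days_in_month: int = 30) -> float:
    """Linear extrapolation: average daily cost over 7 days times month length."""
    return cost_7d_usd / 7 * days_in_month


# Made-up numbers, only to show the shape of the calculation.
print(cache_hit_rate(100, 800, 100))
print(projected_monthly_spend(14.0))
```

The zero-traffic guard mirrors the problem the daemon has with an empty window, where the equivalent PromQL division produces NaN.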
### Time and rhythm

* Session count, duration distribution, time-of-day heatmap.
* Think-time (user idle) vs work-time (assistant active).
* Streak tracking; all-nighter detector.

### Work shape

* Thinking-to-output ratio as a "cogitation index" gauge.
* Stop-reason distribution (watch rising `max_tokens`).
* Tool calls per assistant response (parallelism indicator).

### Tool usage

* Top tools by count. Bash-command root-executable distribution.
* File reads vs edits vs writes per session.
* Hottest files across all sessions.
* Agent / subagent counts (`isSidechain=true`).

### Project and context

* Tokens per project, last-active timestamp, dormant-project detector.

### Friction and quality

* Permission denial frequency.
* File-history-snapshot count per session.

### Character

* **Em-dash violation counter** against the CLAUDE.md rule.
* Most-used phrase by Claude vs by the user.
* Thank-you rate, "Dude, chill" detector.

### Cross-system

* Git correlation: commits produced, lines changed per token.
* Quartermaster correlation: budget-editing days vs Claude load.

---

# Next steps

1. Decide A or B. Default: A.
2. File Phase A as the first issue on `archeious/claude-gauge`.
3. If A: stand up the Compose stack, point Claude Code at it, and verify metrics reach Prometheus through its web UI on port 9090 or the `/api/v1/query` HTTP API.
4. If B: install `ccusage`, run `blocks --json` and `daily --json` by hand, paste the outputs somewhere durable for reference.
5. Ship Phase A. See the numbers tick in a terminal.
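For step 5, a rough sketch of what "numbers ticking in a terminal" could look like. It assumes the `/usage` payload keys used in the daemon sketches (`rate_1m`, `window_5h_pct`, `window_7d_pct`, `hot`, `warn`, `stall`, `last_model`); the function name `format_usage_line` and the sample payload are hypothetical.

```python
def format_usage_line(u: dict) -> str:
    """Render one /usage payload as a single status line."""
    # Collect the lamps that are currently lit.
    lamps = [name.upper() for name in ("hot", "warn", "stall") if u.get(name)]
    model = u.get("last_model") or "-"
    return (
        f"tach {u['rate_1m']:7.0f} tok/min | "
        f"5h {u['window_5h_pct'] * 100:5.1f}% | "
        f"7d {u['window_7d_pct'] * 100:5.1f}% | "
        f"model {model} | lamps {' '.join(lamps) or 'none'}"
    )


# Made-up payload in the shape the daemon sketches return.
sample = {
    "rate_1m": 4200.0,
    "window_5h_pct": 0.85,
    "window_7d_pct": 0.32,
    "hot": False,
    "warn": True,
    "stall": False,
    "last_model": "model-a",
}

print(format_usage_line(sample))
```

Calling this in a loop against the daemon gives a cheap check on ceilings and redline before any hardware is attached.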