diff --git a/PLAN.md b/PLAN.md
index 633695d..9e21da0 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -1,30 +1,31 @@
 # claude-gauge
 
-Hardware instrument cluster plus companion web dashboard for Claude
-Code session telemetry. Primary surface is a physical cluster on the
-desk (fighter-jet / race-car aesthetic). Secondary surface is a
-local web dashboard for the deep stats.
+Hardware instrument cluster displaying Claude Code session telemetry.
+Three analog needle gauges plus an annunciator row, driven by an ESP32
+that polls a local daemon; the daemon reads Claude Code's own
+OpenTelemetry feed or `ccusage`. Fighter-jet / race-car aesthetic.
+Physical-first.
 
-## Problem
+## Why
 
-There is no official programmatic API for Claude Max plan usage
-today. The `/usage` and `/status` slash commands in Claude Code are
-interactive-only. Open feature requests (claude-code issues #40395,
-#13585, #27217, #33978) track this but nothing has shipped as of
-2026-04-17.
+Watching tokens burn against the Max-plan windows is useful, but the
+same data also tells you when Claude is grinding, which model just
+ran, and how warm your cache is. A dial on the desk makes that
+ambient: a glance instead of a tab switch.
 
-Rather than wait, drive everything from local Claude Code state by
-watching the session transcript JSONL files in
-`~/.claude/projects/`. Swap the data source later when Anthropic
-ships a real endpoint; the hardware, firmware, daemon API, and
-dashboard all stay the same.
+## Prior art and the decision it implies
 
-## Instrument cluster (primary surface)
+Software side is crowded. `ccusage`, Claude-Code-Usage-Monitor,
+haasonsaas/claude-usage-tracker, phuryn/claude-usage, multiple
+Grafana dashboards (Grafana Labs 25052 and 24993), and Anthropic's
+own `claude-code-monitoring-guide` repo all do the JSONL parsing and
+rolling-window math already. Claude Code ships with native
+OpenTelemetry support. We found no prior art for the physical-gauge
+angle.
 
-Physical cluster on the desk. Three analog gauges across the top,
-annunciator row below. Backlit. Brushed aluminium bezel, black face,
-cream needles, burgundy redline zone (optional: match the
-quartermaster palette for a house aesthetic).
+Implication: do not rebuild the telemetry layer. Consume it. Spend
+the love on the hardware and the adapter that bridges it.
+
+## Instrument cluster (same in both architectures)
 
 ```
     +------------+   +--------------+   +------------+
@@ -34,309 +35,724 @@ quartermaster palette for a house aesthetic).
     [OPUS] [SONNET] [HAIKU]   [HOT] [WARN] [STALL] [IDLE]
 ```
 
-### Primary gauges
-
-| Gauge | Input | Feel |
-|---|---|---|
-| Center tach | tokens/min, short rolling window | Jumpy, fun, shows when Claude is cooking |
-| Left fuel | % of 5h plan window used | Slow, steady, tells you when to worry |
-| Right fuel | % of 7d plan window used | Slowest, long-arc view of the week |
-
-### Annunciator row
+| Gauge | Metric |
+|---|---|
+| Center tach | Tokens/min, rolling short window |
+| Left fuel | % of 5h plan window used |
+| Right fuel | % of 7d plan window used |
 
 | Lamp | Condition |
 |---|---|
-| OPUS / SONNET / HAIKU | lights the model that wrote the most recent tokens |
-| HOT | flashes when tach crosses redline |
-| WARN | solid when either fuel gauge is above 80% |
-| STALL | lights after N minutes of JSONL silence (no activity) |
-| IDLE | green "power on, daemon reachable" indicator |
+| OPUS / SONNET / HAIKU | lights the model that emitted the most recent tokens (Opus red, Sonnet amber, Haiku green) |
+| HOT | tach above redline |
+| WARN | either fuel gauge above 80% |
+| STALL | no telemetry in last N minutes |
+| IDLE | daemon reachable, no activity |
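+
+Below is a minimal Python sketch of that truth table. It is useful as a reference when testing the
+firmware's `annunciator.cpp` logic. It assumes the `/usage` field names
+used by the daemon sketches below. It also assumes that model IDs contain
+`opus`, `sonnet` or `haiku`, which is a guess about naming rather than a
+documented contract.
+
+```python
+def lamp_states(usage: dict) -> dict[str, bool]:
+    """Map a /usage payload to the seven annunciator lamps (True = lit)."""
+    model = (usage.get("last_model") or "").lower()
+    return {
+        # At most one model lamp, chosen by substring of the model ID (assumed naming).
+        "OPUS": "opus" in model,
+        "SONNET": "sonnet" in model,
+        "HAIKU": "haiku" in model,
+        # The status lamps mirror the daemon's derived flags directly.
+        "HOT": bool(usage.get("hot")),
+        "WARN": bool(usage.get("warn")),
+        "STALL": bool(usage.get("stall")),
+        "IDLE": bool(usage.get("idle")),
+    }
+```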
 
-Model lamps are colour-coded: Opus deep red, Sonnet amber, Haiku
-green. Visual cue for why the tach just spiked.
+## Two architectures
 
-### Optional fourth gauge
+Pick one. Both feed the same firmware and cluster.
 
-If physical real estate allows, a "temp" or "boost" sub-gauge:
+| | A. OTEL-native | B. ccusage-sourced |
+|---|---|---|
+| Data source | Claude Code OTLP -> collector -> Prometheus | Local JSONL via `ccusage` CLI |
+| External deps | Docker Compose stack (collector, Prometheus, Grafana) | Node + `npx ccusage` |
+| Deep-stats dashboard | Grafana dashboard 25052 for free | Build nothing, ccusage has a TUI |
+| Short-window tach | Lags by the OTLP export interval (10s) plus the Prometheus scrape interval (15s) | Hybrid JSONL tail gives sub-second |
+| Operational weight | Moderate (3 services) | Tiny (one subprocess) |
+| Homelab-enterprise fit | Strong | Weak |
+| Time to first needle | Day 2 | Day 1 |
+| Survivability through Claude Code updates | High (OTEL metrics are documented) | Medium (JSONL layout is an implementation detail) |
 
-* **Cache hit rate** as a boost gauge. High cache read = cheap
-  inference. Bragging-rights needle.
-* **Thinking-to-output ratio** as a temp gauge. High = Claude is
-  grinding; low = cruising. Tells you when a task is hard.
+Both share the firmware, cluster, and enclosure. The daemon is the
+only thing that differs. The `/usage` HTTP shape is the same across
+A and B (A additionally reports `cache_hit_rate`), so the firmware
+never knows which backend is wired up.
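+
+The shared contract can be pinned down as a `TypedDict`. In this sketch,
+the field list is read off the two daemon sketches below. `REQUIRED`
+lists the fields the firmware cannot do without, and that choice is an assumption.
+
+```python
+from typing import Optional, TypedDict
+
+class Usage(TypedDict, total=False):
+    """Fields served by GET /usage (shared by architectures A and B)."""
+    rate_1m: float            # tokens/min
+    window_5h_tokens: float
+    window_5h_pct: float      # 0..1, clamped against the 5h ceiling
+    window_7d_tokens: float
+    window_7d_pct: float      # 0..1, clamped against the 7d ceiling
+    cache_hit_rate: float     # architecture A only
+    hot: bool
+    warn: bool
+    stall: bool
+    idle: bool
+    last_model: Optional[str]
+
+# Assumed minimum the firmware needs to drive needles and lamps.
+REQUIRED = ("rate_1m", "window_5h_pct", "window_7d_pct",
+            "hot", "warn", "stall", "idle", "last_model")
+
+def missing_fields(payload: dict) -> list[str]:
+    """Required fields absent from a payload; an empty list means usable."""
+    return [k for k in REQUIRED if k not in payload]
+```
+
+A daemon test can then assert `missing_fields(payload) == []` before the
+firmware ever sees the payload.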
 
-## Companion web dashboard (secondary surface)
+---
 
-Same daemon serves a browser dashboard at `http://<host>:<port>/`
-with the deep stats. Claude Code is already a terminal tool, so the
-dashboard lives alongside as a geek-out surface when you want to
-drill in past the three-needle summary.
+# Architecture A: OTEL-native
 
-### Dashboard sections
+## Stack
 
-1. **Overview**: the three gauges rendered in the browser, plus the
-   annunciator row. Same data as the cluster, so a quick sanity
-   check from any device on the LAN.
-2. **Rates and windows**: line charts for tokens/min over last 1h,
-   6h, 24h. 5h and 7d window sums over the past month.
-3. **Cost**: dollar estimates derived from token counts and
-   published per-model pricing. Today, week-to-date, month-to-date,
-   projected month.
-4. **Models**: stacked time series by model (Opus / Sonnet / Haiku).
-   Token split pie for the current 7d window.
-5. **Projects**: tokens per project, time per project, last-active
-   timestamp. Sortable table.
-6. **Tools**: top tools by call count, success rate per tool,
-   Bash command distribution (top commands by root executable).
-7. **Files**: hottest files by Read + Edit count, edit-to-read
-   ratio per file.
-8. **Rhythm**: time-of-day heatmap, day-of-week heatmap, session
-   duration distribution.
-9. **Raw events**: streaming tail of the latest parsed JSONL rows,
-   for debugging and to watch the data land.
+Mirror Anthropic's reference stack (`claude-code-monitoring-guide`).
+Three containers.
 
-Dashboard is read-only. Nothing the user does here mutates state.
+```yaml
+# docker-compose.yml
+services:
+  otel-collector:
+    image: otel/opentelemetry-collector-contrib:latest
+    command: ["--config=/etc/otel-collector-config.yaml"]
+    volumes:
+      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
+    ports:
+      - "4317:4317"   # OTLP gRPC
+      - "4318:4318"   # OTLP HTTP
+      - "8889:8889"   # Prometheus scrape
+    depends_on:
+      - prometheus
 
-## Architecture
+  prometheus:
+    image: prom/prometheus:latest
+    ports:
+      - "9090:9090"
+    volumes:
+      - ./prometheus.yml:/etc/prometheus/prometheus.yml
+      - prometheus_data:/prometheus
+    command:
+      - '--config.file=/etc/prometheus/prometheus.yml'
+      - '--storage.tsdb.path=/prometheus'
+      - '--storage.tsdb.retention.time=8d'   # > 7d so increase() works
+      - '--web.enable-lifecycle'
 
-```
-~/.claude/projects/**/*.jsonl
-          |
-          v (watchdog / inotify, line-append events)
-          |
-  [ gauge daemon ]  <--- local, Python, long-running
-          |
-          +-- parse new line -> structured event
-          |     { ts, session_id, project, model, role,
-          |       input_tokens, output_tokens,
-          |       cache_read, cache_creation,
-          |       thinking_tokens, stop_reason,
-          |       tool_use: [...], is_sidechain, request_id }
-          |
-          +-- in-memory ring buffer (last ~60 min of events)
-          +-- periodic flush of events + rolling aggregates to SQLite
-          |
-          +-- computed windows (primary, cluster):
-          |     rate_instant    last 30s, tokens/min
-          |     rate_1m         rolling 60s, tokens/min
-          |     rate_5m         rolling 5m, tokens/min
-          |     window_5h       rolling 5h sum
-          |     window_7d       rolling 7d sum
-          |
-          +-- computed aggregates (secondary, dashboard):
-          |     cache hit rate, thinking ratio, cost estimate,
-          |     per-model token split, per-project totals,
-          |     tool call counts, file touch counts, rhythm grids
-          |
-          +-- derived flags:
-          |     hot, warn, stall, idle, last_model, last_project
-          |
-          +-- HTTP endpoints:
-          |     GET  /usage        primary cluster payload
-          |     GET  /stats        deep aggregates for the dashboard
-          |     GET  /stream       SSE of parsed events (raw tail)
-          |     GET  /             dashboard UI
-          |
-          v (poll every ~1s)
-     [ ESP32 / Pi Pico, on LAN ]
-          |
-          +-- drives three needles via PWM + low-pass filter
-              (or servos if full-swing dials are easier)
-          +-- drives annunciator LEDs over GPIO
-          +-- optional small OLED behind the cluster for raw numbers
+  grafana:
+    image: grafana/grafana:latest
+    ports:
+      - "3000:3000"
+    environment:
+      - GF_SECURITY_ADMIN_PASSWORD=admin
+    volumes:
+      - grafana_data:/var/lib/grafana
+      - ./grafana/provisioning:/etc/grafana/provisioning
+      - ./grafana/dashboards:/var/lib/grafana/dashboards
+    depends_on:
+      - prometheus
+
+volumes:
+  prometheus_data:
+  grafana_data:
 ```
 
-## Real-time honesty
+```yaml
+# otel-collector-config.yaml
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317
+      http:
+        endpoint: 0.0.0.0:4318
 
-Claude Code writes a JSONL line **after** each assistant message
-completes, not during streaming. The cluster therefore updates
-per-message, not per-token:
+processors:
+  batch:
+    timeout: 1s
+    send_batch_size: 1024
+  memory_limiter:
+    check_interval: 1s
+    limit_mib: 512
 
-* Rapid back-and-forth with small messages: tach ticks steadily,
-  feels alive.
-* Opus cranking a 40-second response: tach reads zero, zero, zero,
-  then SPIKES at the end with the full message's token count.
+exporters:
+  prometheus:
+    endpoint: "0.0.0.0:8889"
+    send_timestamps: true
+    metric_expiration: 192h    # 8 days, covers 7d window
+    enable_open_metrics: true
 
-Still useful. The spike itself is informative ("that burn just hit
-8k tokens in 40 seconds"). STALL lamp covers the "is it still
-working or is the session idle" ambiguity.
-
-If true stream-time is wanted later, a Claude Code hook
-(`SessionStart` / `Stop` / `PostToolUse`) can push heartbeats into
-the daemon. Dramatically more work for modest gain. Park as v2.
-
-## Calibration
-
-The daemon does not know Anthropic's real per-plan caps. User
-configures a local ceiling:
-
-```
-CLAUDE_GAUGE_5H_CEILING=<tokens>
-CLAUDE_GAUGE_7D_CEILING=<tokens>
-CLAUDE_GAUGE_TACH_REDLINE=<tokens/min>
+service:
+  pipelines:
+    metrics:
+      receivers: [otlp]
+      processors: [memory_limiter, batch]
+      exporters: [prometheus]
 ```
 
-Gauges read against those ceilings. Estimate from experience: run
-for a week at normal usage, note where `/usage` says you are vs
-where the dials read, adjust the config. Ship sensible defaults
-tunable per user.
+```yaml
+# prometheus.yml
+global:
+  scrape_interval: 15s
+  evaluation_interval: 15s
 
-## Data source migration path
+scrape_configs:
+  - job_name: 'otel-collector'
+    static_configs:
+      - targets: ['otel-collector:8889']
+```
 
-When Anthropic ships an official usage endpoint, the daemon replaces
-its input with that endpoint and drops the JSONL tailing. The HTTP
-shape it serves to the cluster stays the same. The dashboard gains
-accuracy but does not change structure. Zero hardware change, zero
-firmware change.
+Import Grafana Labs dashboard **25052** ("Claude Code") against the
+Prometheus data source. That is the deep-stats dashboard; no custom
+web UI needed.
 
-## Stack (proposed, not decided)
+## Claude Code configuration
 
-**Daemon**: Python 3.12+, `uv`, `watchdog` for file tail, FastAPI
-for HTTP, SQLite for durable state, Jinja2 for dashboard templates,
-Alpine.js or HTMX for dashboard interactivity. Matches the house
-style (see quartermaster).
+Set in the environment Claude Code runs in (user profile, systemd
+unit, or the `env` block of `~/.claude/settings.json`):
 
-**Firmware**: MicroPython on ESP32 or Pi Pico W for the wifi stack.
-C/Rust if MicroPython HTTP polling turns out wobbly at the required
-update rate.
+```bash
+export CLAUDE_CODE_ENABLE_TELEMETRY=1
+export OTEL_METRICS_EXPORTER=otlp
+# export OTEL_LOGS_EXPORTER=otlp   # only once the collector has a logs pipeline
+export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
+export OTEL_METRIC_EXPORT_INTERVAL=10000   # 10s for gauge responsiveness
+export OTEL_METRICS_INCLUDE_SESSION_ID=false   # bound cardinality
+```
 
-**Hardware**: three analog voltmeter movements (0-5V or similar)
-for the gauges, driven from PWM pins through low-pass filters, or
-tiny servos with printed needles. Annunciator row is discrete LEDs
-behind a smoked acrylic face.
+## Metrics Claude Code emits (via OTEL, surfaced in Prometheus)
 
-## Metrics brainstorm (for later inspection)
+All prefixed `claude_code_` after the OTEL-to-Prom conversion.
 
-Everything on this list is parseable from the JSONL today. The MVP
-daemon only needs the primary-window stats; everything else lands
-under later issues. Capturing them here so they don't get forgotten.
+| Prometheus metric | Labels |
+|---|---|
+| `claude_code_token_usage_tokens_total` | `type` (`input`/`output`/`cacheRead`/`cacheCreation`), `model` |
+| `claude_code_cost_usage_USD_total` | `model` |
+| `claude_code_session_count_total` | |
+| `claude_code_active_time_total_seconds_total` | `type` (`user`/`cli`) |
+| `claude_code_lines_of_code_count_total` | `type` (`added`/`removed`) |
+| `claude_code_commit_count_total` | |
+| `claude_code_pull_request_count_total` | |
+| `claude_code_code_edit_tool_decision_count_total` | `tool_name`, `decision`, `language` |
+
+Events (via OTEL logs) carry richer per-request context including
+`prompt.id`, `duration_ms`, `speed` (fast/normal), etc. Not needed
+for the primary gauges.
+
+## PromQL the daemon runs
+
+```promql
+# Tokens/min, short rolling window (tach)
+sum(rate(claude_code_token_usage_tokens_total[1m])) * 60
+
+# 5h window sum (left fuel)
+sum(increase(claude_code_token_usage_tokens_total[5h]))
+
+# 7d window sum (right fuel)
+sum(increase(claude_code_token_usage_tokens_total[7d]))
+
+# Cache hit rate (optional sub-gauge)
+  sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m]))
+/ sum(rate(claude_code_token_usage_tokens_total{type=~"input|cacheRead|cacheCreation"}[5m]))
+
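+# Last model: the model with the most output tokens in the last 5m
+# (a proxy for "most recent"; empty when nothing ran)
+topk(1, sum by (model) (increase(claude_code_token_usage_tokens_total{type="output"}[5m])) > 0)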
+
+# Cost estimates
+sum(increase(claude_code_cost_usage_USD_total[5h]))
+sum(increase(claude_code_cost_usage_USD_total[7d]))
+
+# Stall detection (no tokens in last N minutes)
+absent(rate(claude_code_token_usage_tokens_total[2m]) > 0)
+```
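+
+The daemon runs these queries through Prometheus's instant-query endpoint
+(`GET /api/v1/query`). That endpoint returns a vector of
+`{"metric": {...}, "value": [timestamp, "string"]}` entries. For the
+`sum by (model) (...)` style of query, a small helper that flattens the
+vector into a dict is enough. The following is a sketch, and the helper name is
+hypothetical.
+
+```python
+import math
+
+def vector_to_map(resp: dict, label: str) -> dict[str, float]:
+    """Flatten an /api/v1/query vector result into {label_value: sample}.
+
+    Non-finite samples (e.g. a 0/0 cache ratio while idle comes back as
+    "NaN") become 0.0 so the result can be served as strict JSON.
+    """
+    if resp.get("status") != "success":
+        raise ValueError(f"query failed: {resp.get('error')}")
+    out: dict[str, float] = {}
+    for series in resp["data"]["result"]:
+        v = float(series["value"][1])   # sample values arrive as strings
+        out[series["metric"].get(label, "")] = v if math.isfinite(v) else 0.0
+    return out
+```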
+
+## Daemon (A)
+
+Thin Python service. Queries Prometheus, transforms to `/usage`
+payload for the firmware.
+
+```
+src/claude_gauge/
+  __init__.py
+  daemon_prom.py       FastAPI app, PromQL queries, /usage endpoint
+  config.py            Prometheus URL, ceilings, stall threshold
+  windows.py           PromQL builders and result parsing
+  calibration.py       Maps raw values to firmware-friendly 0-1000 scales
+```
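+
+`calibration.py` is described only as mapping raw values onto 0-1000.
+The following sketch shows one possible approach. It assumes a linear scale. It also assumes a tach that reads
+full scale at 1.25x redline, so that the redline sits inside the dial rather
+than on the stop. The 1.25 factor is an invented default, not part of the plan.
+
+```python
+def to_scale(value: float, full_scale: float, steps: int = 1000) -> int:
+    """Linearly map value in [0, full_scale] to an int in [0, steps], clamped."""
+    if full_scale <= 0:
+        return 0
+    frac = min(max(value / full_scale, 0.0), 1.0)
+    return round(frac * steps)
+
+def calibrate(rate_1m: float, win_5h: float, win_7d: float,
+              ceil_5h: float, ceil_7d: float, redline: float,
+              tach_headroom: float = 1.25) -> dict[str, int]:
+    """Needle targets on a 0-1000 scale for the three gauges."""
+    return {
+        "tach": to_scale(rate_1m, redline * tach_headroom),
+        "fuel_5h": to_scale(win_5h, ceil_5h),
+        "fuel_7d": to_scale(win_7d, ceil_7d),
+    }
+```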
+
+```python
+# daemon_prom.py sketch
+import asyncio
+import math
+import os
+
+import httpx
+from fastapi import FastAPI
+
+PROM = os.environ.get("CLAUDE_GAUGE_PROM_URL", "http://localhost:9090")
+CEIL_5H = int(os.environ.get("CLAUDE_GAUGE_5H_CEILING", 500_000))
+CEIL_7D = int(os.environ.get("CLAUDE_GAUGE_7D_CEILING", 3_000_000))
+RED = int(os.environ.get("CLAUDE_GAUGE_TACH_REDLINE", 8000))  # tokens/min
+TOK = "claude_code_token_usage_tokens_total"
+
+app = FastAPI()
+client = httpx.AsyncClient(timeout=5.0)
+
+async def prom_vector(q: str) -> list[dict]:
+    """Instant query; returns the raw result vector."""
+    r = await client.get(f"{PROM}/api/v1/query", params={"query": q})
+    r.raise_for_status()
+    return r.json()["data"]["result"]
+
+async def prom(q: str) -> float:
+    """First sample as a float; 0.0 if empty or non-finite (NaN breaks JSON)."""
+    data = await prom_vector(q)
+    v = float(data[0]["value"][1]) if data else 0.0
+    return v if math.isfinite(v) else 0.0
+
+async def last_model() -> str | None:
+    """Model with the most output tokens in the last 5m (proxy for most recent)."""
+    data = await prom_vector(
+        f'topk(1, sum by (model) (increase({TOK}{{type="output"}}[5m])) > 0)'
+    )
+    return data[0]["metric"].get("model") if data else None
+
+@app.get("/usage")
+async def usage():
+    # Queries are independent, so run them concurrently.
+    rate_1m, win_5h, win_7d, cache, recent, model = await asyncio.gather(
+        prom(f"sum(rate({TOK}[1m])) * 60"),
+        prom(f"sum(increase({TOK}[5h]))"),
+        prom(f"sum(increase({TOK}[7d]))"),
+        prom(
+            f'sum(rate({TOK}{{type="cacheRead"}}[5m])) / '
+            f'sum(rate({TOK}{{type=~"input|cacheRead|cacheCreation"}}[5m]))'
+        ),
+        prom(f"sum(rate({TOK}[2m]))"),
+        last_model(),
+    )
+    pct_5h, pct_7d = win_5h / CEIL_5H, win_7d / CEIL_7D
+    return {
+        "rate_1m": rate_1m,
+        "window_5h_tokens": win_5h,
+        "window_5h_pct": min(1.0, pct_5h),
+        "window_7d_tokens": win_7d,
+        "window_7d_pct": min(1.0, pct_7d),
+        "cache_hit_rate": cache,
+        "hot": rate_1m > RED,
+        "warn": pct_5h > 0.8 or pct_7d > 0.8,
+        "stall": recent == 0.0,   # no tokens in the last 2m
+        "idle": True,             # answering at all means the daemon is reachable
+        "last_model": model,
+    }
+```
+
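+`last_model` needs one extra query that returns a label rather than a
+number. Group output tokens by `model` over the last few minutes and
+take `topk(1, ...)` of the result. That yields the model with the most recent
+output, which is a close proxy for "last model". Read the `model` label off the
+single returned series; an empty result means nothing ran.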
+
+## Dependencies (A)
+
+```toml
+# pyproject.toml additions
+dependencies = [
+    "fastapi>=0.136.0",
+    "uvicorn[standard]>=0.44.0",
+    "httpx>=0.28.1",
+]
+```
+
+No SQLite, no watchdog, no ORM. Prometheus is the database.
+
+## Retention considerations
+
+* Collector `metric_expiration: 192h` keeps a metric visible for 8d
+  after its last sample, so 7d `increase()` queries work even on
+  intermittent sessions.
+* Prometheus `--storage.tsdb.retention.time=8d` keeps the samples
+  long enough for the same 7d queries.
+* Grafana dashboard 25052 pulls from the same Prometheus.
+
+## Pros and cons of A
+
+Pros:
+* Uses the platform feature Anthropic ships.
+* Grafana dashboard is free.
+* Metric schema is documented and stable.
+* Plays cleanly with any other homelab metrics already in Prometheus.
+* Architecture translates without changes when other machines run
+  Claude Code too: point their OTLP endpoint at the same collector.
+
+Cons:
+* Tach lags by up to ~25s: the 10s OTLP export interval plus the 15s
+  Prometheus scrape interval.
+* Three containers to run.
+* Requires env-var changes on every Claude Code launch surface.
+
+## Tach responsiveness mitigation (A)
+
+If the tach lag bothers you, the daemon can keep a tiny JSONL-tail
+fallback just for the tach. Same code shape as architecture B's tach
+component; described below. Pulling the fuel gauges and everything
+else from Prometheus, tach from direct file tail, is a clean hybrid.
+Only activate if Phase C shows the needle feels sluggish.
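+
+The hybrid reduces to choosing a tach source on each request. The following is a sketch of
+one policy. It is an assumption, not a settled design: trust the tail
+whenever it has data, and fall back to PromQL otherwise.
+
+```python
+def pick_tach(tail_rate: float, prom_rate: float) -> float:
+    """Choose the tach reading (tokens/min) for the hybrid daemon.
+
+    The JSONL tail sees a finished message within a second; Prometheus only
+    after the export and scrape intervals. When the tail reads zero but
+    Prometheus does not, either a burst just ended or the tail is broken
+    (e.g. a JSONL layout change); falling back keeps the needle from
+    freezing at zero in the second case.
+    """
+    return tail_rate if tail_rate > 0 else prom_rate
+```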
+
+---
+
+# Architecture B: ccusage-sourced
+
+## Stack
+
+One daemon process that shells out to `ccusage` (periodically, or via
+its long-lived MCP server). No collector, no Prometheus, no Grafana.
+A hybrid watchdog tail handles the sub-second tach that ccusage's
+aggregates can't.
+
+```
+[ Claude Code ]  ->  ~/.claude/projects/**/*.jsonl
+                            |
+                            +---+----------------+
+                            |                    |
+                            v                    v
+                [ watchdog tail ]       [ ccusage CLI / MCP ]
+                (short-window tach)     (5h blocks, 7d daily)
+                            |                    |
+                            +----------+---------+
+                                       v
+                            [ claude-gauge daemon ]
+                                 GET /usage
+                                       |
+                                       v
+                               ESP32 firmware
+```
+
+## ccusage integration options
+
+Two shapes work. Pick one, not both.
+
+### Option B1: periodic CLI subprocess (simplest)
+
+```bash
+npx ccusage@latest blocks --json        # current 5h block
+npx ccusage@latest daily  --json        # per-day aggregates for 7d sum
+```
+
+Run every ~10s from the daemon. Parse JSON, fill the fuel gauges.
+
+### Option B2: ccusage MCP HTTP server (persistent)
+
+```bash
+bunx @ccusage/mcp@latest --type http --port 8081   # 8080 is the gauge daemon's port
+```
+
+Exposes a Hono app at `POST /` handling MCP StreamableHTTP
+requests. Four registered tools:
+
+| Tool | Description |
+|---|---|
+| `daily` | Usage grouped by date |
+| `monthly` | Usage grouped by month |
+| `session` | Usage grouped by conversation session |
+| `blocks` | Usage grouped by 5-hour session billing blocks |
+
+Each tool accepts `since`, `until`, `mode`, `timezone`, `locale` and
+returns JSON in an MCP text content block.
+
+Invoke as an MCP client from the daemon (`mcp` Python SDK) or as
+raw JSON-RPC to `POST /`.
+
+### Recommendation
+
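+**B1**. The CLI path is simpler and has fewer moving parts. A
+subprocess every 10s costs a Node startup plus a JSONL scan, which is
+small on a desk machine. Pinning a version (see below) also avoids
+resolving `@latest` against the npm registry on every call. Switch to
+B2 only if you also want the MCP surface exposed to other local
+agents (Claude Code can already consume ccusage's MCP).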
+
+## Short-window tach via watchdog
+
+ccusage aggregates are too coarse for the tach. The daemon keeps its
+own 60-second ring buffer by tailing JSONL directly.
+
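+```python
+# tail.py sketch
+import json
+import threading
+import time
+from collections import deque
+from pathlib import Path
+
+from watchdog.events import FileSystemEventHandler
+from watchdog.observers import Observer
+
+USAGE_KEYS = (
+    "input_tokens", "output_tokens",
+    "cache_read_input_tokens", "cache_creation_input_tokens",
+)
+
+class RateBus:
+    """Thread-safe ring buffer of (ts, tokens, model) covering the last window_s."""
+
+    def __init__(self, window_s=60):
+        self.window_s = window_s
+        self.buf: deque[tuple[float, int, str]] = deque()
+        self.lock = threading.Lock()   # watchdog thread writes, web loop reads
+
+    def push(self, ts, tokens, model):
+        with self.lock:
+            self.buf.append((ts, tokens, model))
+            self._evict()
+
+    def _evict(self):   # caller holds self.lock
+        cutoff = time.time() - self.window_s
+        while self.buf and self.buf[0][0] < cutoff:
+            self.buf.popleft()
+
+    def rate_per_min(self):
+        with self.lock:
+            self._evict()
+            # Scale the window sum to tokens/min so window_s != 60 still works.
+            return sum(t for _, t, _ in self.buf) * 60 / self.window_s
+
+    def last_model(self):
+        with self.lock:
+            return self.buf[-1][2] if self.buf else None
+
+class JsonlTail(FileSystemEventHandler):
+    def __init__(self, bus, root: Path):
+        self.bus = bus
+        self.seen: set[tuple] = set()
+        # Start existing transcripts at EOF so history is not replayed as a spike.
+        self.offsets: dict[Path, int] = {
+            p: p.stat().st_size for p in root.rglob("*.jsonl")
+        }
+
+    def on_modified(self, event):
+        p = Path(event.src_path)
+        if event.is_directory or p.suffix != ".jsonl":
+            return
+        with p.open("rb") as f:
+            f.seek(self.offsets.get(p, 0))   # files created after startup begin at 0
+            while line := f.readline():
+                if not line.endswith(b"\n"):
+                    break   # half-written line: leave it for the next event
+                self.offsets[p] = f.tell()
+                self._handle(line)
+
+    def _handle(self, line: bytes):
+        try:
+            d = json.loads(line)
+        except ValueError:   # covers JSONDecodeError and bad UTF-8
+            return
+        if d.get("type") != "assistant":
+            return
+        msg = d.get("message", {})
+        key = (msg.get("id"), d.get("requestId"))
+        if key != (None, None):
+            if key in self.seen:
+                return   # same API response written on several lines: count once
+            self.seen.add(key)
+        u = msg.get("usage", {})
+        tokens = sum(u.get(k, 0) for k in USAGE_KEYS)
+        self.bus.push(time.time(), tokens, msg.get("model", ""))
+
+def start_watcher(bus, root: Path = Path.home() / ".claude" / "projects"):
+    """Start a daemon-thread observer on the transcript tree and return it."""
+    observer = Observer()
+    observer.schedule(JsonlTail(bus, root), str(root), recursive=True)
+    observer.daemon = True
+    observer.start()
+    return observer
+```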
+
+## Daemon (B)
+
+```
+src/claude_gauge/
+  __init__.py
+  daemon_ccusage.py    FastAPI app, ccusage subprocess calls, /usage
+  tail.py              watchdog + RateBus for tach
+  config.py
+  calibration.py
+```
+
+```python
+# daemon_ccusage.py sketch
+import asyncio
+import json
+import os
+
+from fastapi import FastAPI
+
+from .tail import RateBus, start_watcher
+
+CEIL_5H = int(os.environ.get("CLAUDE_GAUGE_5H_CEILING", 500_000))
+CEIL_7D = int(os.environ.get("CLAUDE_GAUGE_7D_CEILING", 3_000_000))
+RED = int(os.environ.get("CLAUDE_GAUGE_TACH_REDLINE", 8000))  # tokens/min
+
+bus = RateBus(window_s=60)
+start_watcher(bus)   # background thread
+
+app = FastAPI()
+
+async def ccusage(cmd: str) -> dict:
+    """Run `npx ccusage@latest <cmd> --json` and parse its stdout."""
+    proc = await asyncio.create_subprocess_exec(
+        "npx", "ccusage@latest", cmd, "--json",
+        stdout=asyncio.subprocess.PIPE,
+    )
+    out, _ = await proc.communicate()
+    if proc.returncode != 0:
+        raise RuntimeError(f"ccusage {cmd} exited with {proc.returncode}")
+    return json.loads(out)
+
+async def current_5h_tokens() -> int:
+    blocks = await ccusage("blocks")
+    cur = next((b for b in blocks.get("blocks", []) if b.get("isActive")), None)
+    return cur["totalTokens"] if cur else 0
+
+async def trailing_7d_tokens() -> int:
+    daily = await ccusage("daily")
+    # Last 7 daily buckets. Days without usage may have no bucket and today
+    # is partial, so this approximates rather than equals a rolling 7x24h sum.
+    rows = daily.get("daily", [])[-7:]
+    return sum(r["totalTokens"] for r in rows)
+
+@app.get("/usage")
+async def usage():
+    rate = bus.rate_per_min()
+    w5, w7 = await asyncio.gather(current_5h_tokens(), trailing_7d_tokens())
+    return {
+        "rate_1m": rate,
+        "window_5h_tokens": w5,
+        "window_5h_pct": min(1.0, w5 / CEIL_5H),
+        "window_7d_tokens": w7,
+        "window_7d_pct": min(1.0, w7 / CEIL_7D),
+        "hot": rate > RED,
+        "warn": (w5 / CEIL_5H) > 0.8 or (w7 / CEIL_7D) > 0.8,
+        "stall": rate == 0,   # nothing in the 60s ring buffer
+        "idle": True,         # answering at all means the daemon is reachable
+        "last_model": bus.last_model(),
+    }
+```
+
+Cache `ccusage blocks/daily` output with a 10s TTL so the `/usage`
+endpoint stays cheap when the firmware polls at 1 Hz.
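+
+The following sketch shows that cache: an async TTL wrapper keyed on the ccusage
+subcommand. Here `fetch` would be the `ccusage()` helper above. The
+per-key lock is a design choice. It ensures that concurrent `/usage` requests
+arriving just as an entry expires spawn one subprocess, not several.
+
+```python
+import asyncio
+import time
+from typing import Awaitable, Callable
+
+class TTLCache:
+    """Cache the result of an async fetch(key) for ttl_s seconds per key."""
+
+    def __init__(self, fetch: Callable[[str], Awaitable[dict]], ttl_s: float = 10.0):
+        self.fetch = fetch
+        self.ttl_s = ttl_s
+        self._entries: dict[str, tuple[float, dict]] = {}
+        self._locks: dict[str, asyncio.Lock] = {}
+
+    async def get(self, key: str) -> dict:
+        lock = self._locks.setdefault(key, asyncio.Lock())
+        async with lock:   # one refresh per key at a time
+            hit = self._entries.get(key)
+            if hit and time.monotonic() - hit[0] < self.ttl_s:
+                return hit[1]
+            value = await self.fetch(key)
+            self._entries[key] = (time.monotonic(), value)
+            return value
+```
+
+Usage in the daemon would be `cache = TTLCache(ccusage)` and then
+`await cache.get("blocks")` wherever `await ccusage("blocks")` appears.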
+
+## Dependencies (B)
+
+```toml
+dependencies = [
+    "fastapi>=0.136.0",
+    "uvicorn[standard]>=0.44.0",
+    "watchdog>=5.0.0",
+]
+```
+
+Node needs to be on the PATH for `npx ccusage@latest`. Pin a version
+in config rather than using `@latest` once the daemon is past Phase A.
+
+## Pros and cons of B
+
+Pros:
+* One long-running service (plus short-lived `ccusage` subprocesses);
+  no Docker, no collector.
+* Sub-second tach works out of the box via the watchdog tail.
+* ccusage is actively maintained and has already solved the edge
+  cases in JSONL parsing (missing fields, renamed formats, cache
+  token math, cost per model).
+
+Cons:
+* No free Grafana dashboard. If you want deep stats, either run
+  `ccusage` interactively or build something.
+* Node on the runtime path.
+* JSONL format is an implementation detail; upstream changes could
+  break parsing. ccusage tracks these but there's a lag window.
+* Does not generalise if other machines also run Claude Code; each
+  one needs its own daemon.
+
+---
+
+# Hardware (shared by A and B)
+
+## Movement
+
+**Switec X27.168** automotive stepper motor. 315-degree sweep at 1/3
+degree per step, so 945 steps end to end. ~$8 each. Used in car
+dashboards, so enclosures and bezels exist off the shelf.
+
+Related cousins: X25, VID28, VID29, BKA30D-R5. The library supports
+all of them, but X27.168 has the longest sweep and the most
+available tutorials.
+
+## Driver
+
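+`SwitecX25` Arduino library (`clearwater/SwitecX25` on GitHub). Works
+for X27.168 despite the name. Drives 4 GPIO pins per motor directly
+from the MCU; no external driver IC is required for short wiring runs.
+ULN2003A-style arrays only sink current, so they cannot reverse it
+through the coils. If you want a driver IC, use a quad gauge driver
+such as the AX1201728SG (VID6606), which takes step/dir inputs
+instead.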
+
+We found no maintained MicroPython port. **Firmware is Arduino C++**
+rather than MicroPython. Not the original plan, but the right trade.
+
+## Board
+
+**ESP32 DevKit** (generic). WiFi, enough GPIO for 3 steppers (12
+pins) plus 7 annunciator LEDs and a reset button. ~$8.
+
+Alternative: Raspberry Pi Pico W. Less toolchain overhead if you
+prefer CircuitPython, but you'd still be hand-rolling the stepper
+driver.
+
+## Wiring sketch
+
+```
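+ESP32 DevKit
+  GPIO 13,14,27,26  -->  X27.168 #1 (left fuel)
+  GPIO 25,33,32,23  -->  X27.168 #2 (tach)
+  GPIO 22,21,19,18  -->  X27.168 #3 (right fuel)
+  GPIO 17  -->  OPUS LED   (red)
+  GPIO 16  -->  SONNET LED (amber)
+  GPIO 4   -->  HAIKU LED  (green)
+  GPIO 5   -->  HOT LED    (red, PWM for flashing)
+  GPIO 2   -->  WARN LED   (amber)
+  GPIO 15  -->  STALL LED  (blue)
+  GPIO 12  -->  IDLE LED   (green, pulses while daemon reachable)
+  GPIO 34  -->  tactile reset button (external 10k pull-up)
+```
+
+GPIO 34-39 are input-only on the ESP32 and have no internal pull-ups.
+That is why the steppers avoid them and why the button on 34 needs an external
+pull-up. GPIO 2, 5, 12 and 15 are strapping pins. LEDs wired to
+ground keep them low at boot, which a DevKit booting from flash
+tolerates, but check your module before reusing them.
+
+220R resistors per LED. With direct GPIO drive, the coils draw their
+current from the ESP32's 3V3 rail through the pins. If all three
+needles moving at once causes brownouts, move the motors to a driver
+IC on a separate 5V rail.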
+
+## Firmware structure
+
+```
+firmware/
+  platformio.ini
+  src/
+    main.cpp            setup() + loop()
+    wifi.cpp            connect + reconnect
+    gauge.cpp           wraps SwitecX25; map pct 0..1 to 0..steps
+    annunciator.cpp     LED state machine
+    poll.cpp            HTTP GET /usage every 1s
+    config.h            daemon URL, redline, thresholds
+```
+
+Poll loop:
+
+1. Every 1000ms, GET `http://<daemon>:8080/usage`.
+2. Parse JSON (ArduinoJson).
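+3. Set gauge targets with SwitecX25's `setPosition()`, clamped to the dial: `tach.setPosition(constrain(map(rate_1m, 0, redline, 0, STEPS - 1), 0, STEPS - 1))`, where `STEPS` is the motor's full-sweep step count; fuels use `pct * (STEPS - 1)`.
+4. Update LED states from `hot/warn/stall/idle/last_model`.
+5. `gauge.update()` runs the stepper every loop tick until it hits target.
+
+Call `zero()` on each gauge once in `setup()` to drive the needle back
+against its end stop and establish step 0.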
+
+## Enclosure
+
+* Cream faces, hairline burgundy redline zone (matches quartermaster
+  palette if you want the house look).
+* Brushed aluminium bezel; 3D-print + spray-paint is fine for V1.
+* Annunciator row behind smoked acrylic so the LEDs only show when
+  lit.
+* Desk-size footprint: roughly 180mm wide x 90mm tall for the cluster.
+
+---
+
+# Phasing
+
+One phase per issue. No scope bleed between phases.
+
+| Phase | Deliverable | Architecture-agnostic? |
+|---|---|---|
+| A | Daemon prints the window values (tach rate, 5h and 7d sums) to stdout | No (architecture A or B chosen before start) |
+| B | `/usage` HTTP endpoint; curl from browser or another box | No |
+| C | ESP32 firmware driving ONE needle (tach) from the daemon | Yes |
+| D | Three needles plus annunciator row | Yes |
+| E | Calibration period: tune ceilings and redline against real use | Yes |
+| F | Enclosure V1 (printed), cabling, permanent install | Yes |
+| G | (If A) Grafana dashboard wired in; (if B) pick a deep-stats path or decline | Diverges |
+| H | Character metrics and cross-system correlations (em-dash counter, git correlation, quartermaster correlation) | Yes |
+
+Do not attempt Phase D before Phase C. Hardware integration is
+where surprises land; start with one axis.
+
+---
+
+# Recommendation (for Jeff's homelab)
+
+Architecture A.
+
+The homelab-as-enterprise framing is the deciding factor. OTEL is
+the platform feature, Prometheus is already the right tool, Grafana
+dashboard 25052 is a free deep-stats surface, and the architecture
+generalises if other machines start running Claude Code. The 15s
+scrape interval is the only real concession; if the tach feels
+sluggish after Phase E, bolt the JSONL tail from B on top for the
+tach path only. Hybrid.
+
+If you don't already run Prometheus in the homelab, B gets you to
+a working needle sooner (Phase A ships same day). Migrate to A
+later if OTEL becomes useful for other things.
+
+Either way, the firmware and cluster are identical. The architecture
+choice is only about what the daemon reads.
+
+---
+
+# Metrics brainstorm (for later phases)
+
+Derivable from the JSONL directly (B); some also from OTEL metrics or
+log events (A). Not wired into the primary cluster; land in Phase G
+or a future Grafana panel.
 
 ### Cost and tokens
-
-* Total tokens by window (already primary)
-* Breakdown: `input` / `output` / `cache_read` / `cache_creation`
-* **Cache hit rate** = cache_read / (cache_read + cache_creation + input)
-* **Cache savings** in dollars (cache reads are 10% of normal input cost)
-* Cost per session at published per-model pricing
-* Cost per day / week / month; projected month
-* Opus / Sonnet / Haiku token split
-* Server tool use: `web_search_requests`, `web_fetch_requests` per day
-* Service tier distribution (standard vs priority)
-* Ephemeral-1h vs ephemeral-5m cache split
+* Cache hit rate and cache-savings dollar value.
+* Cost per session at published pricing.
+* Projected monthly spend.
+* Opus / Sonnet / Haiku token split.
+* Server tool use (web search / web fetch) counts.
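
The first two items reduce to simple arithmetic over the per-message token counts. Below is a minimal sketch. It uses the hit-rate definition cache_read / (cache_read + cache_creation + input) and treats cache reads as billed at 10% of the base input price. The per-million-token price is a parameter you supply, and the $3.00 figure is only illustrative. The savings figure is gross: it ignores the premium paid on cache writes.

```python
def cache_hit_rate(input_tokens: int, cache_read: int, cache_creation: int) -> float:
    """cache_read / (cache_read + cache_creation + input); 0.0 when nothing was sent."""
    denom = cache_read + cache_creation + input_tokens
    return cache_read / denom if denom else 0.0


def cache_savings_usd(cache_read: int, input_price_per_mtok: float,
                      read_multiplier: float = 0.10) -> float:
    """Dollars saved because cache reads bill at a fraction of the base input price.

    Gross figure: does not subtract the extra cost of cache writes.
    """
    full_price = cache_read * input_price_per_mtok / 1_000_000
    return full_price * (1.0 - read_multiplier)


print(round(cache_hit_rate(2_000, 90_000, 8_000), 3))  # 0.9
print(round(cache_savings_usd(90_000, 3.00), 4))        # 0.243
```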
 
 ### Time and rhythm
-
-* Sessions per day / per week, rolling average
-* Session duration distribution
-* Time-of-day heatmap (circadian work pattern)
-* Day-of-week heatmap
-* Longest continuous session on record
-* Your think-time: gap between assistant-end and next user message
-* Claude's work-time: user message to assistant complete
-* Active vs idle ratio within sessions
-* Streak tracking (consecutive days used)
-* All-nighter detector (session crossing 2am local)
+* Session count, duration distribution, time-of-day heatmap.
+* Think-time (user idle) vs work-time (assistant active).
+* Streak tracking; all-nighter detector.
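
Streak tracking and the all-nighter detector, defined as a session that is running at 2am local time, are both small date calculations. The sketch below assumes the daemon has already reduced transcripts to a set of active local dates and to per-session start and end times as naive local datetimes (for example, the first and last record timestamps). The function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def current_streak(active_days: set, today: date) -> int:
    """Consecutive days with at least one session, counting back from today."""
    streak = 0
    day = today
    while day in active_days:
        streak += 1
        day -= timedelta(days=1)
    return streak


def is_all_nighter(start: datetime, end: datetime, cutoff: time = time(2, 0)) -> bool:
    """True if the session is running at the cutoff time (default 02:00) on any night."""
    # First cutoff instant at or after the session start.
    candidate = datetime.combine(start.date(), cutoff)
    if candidate < start:
        candidate += timedelta(days=1)
    return candidate <= end


days = {date(2026, 4, 17), date(2026, 4, 16), date(2026, 4, 15), date(2026, 4, 13)}
print(current_streak(days, date(2026, 4, 17)))                                  # 3
print(is_all_nighter(datetime(2026, 4, 16, 23, 10), datetime(2026, 4, 17, 2, 45)))  # True
print(is_all_nighter(datetime(2026, 4, 17, 9, 0), datetime(2026, 4, 17, 17, 30)))   # False
```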
 
 ### Work shape
-
-* **Thinking tokens** counted from content blocks of type `thinking`
-* Thinking-to-output ratio ("cogitation index")
-* Stop-reason distribution (`end_turn` / `tool_use` / `max_tokens`);
-  watch for rising `max_tokens` (responses getting cut off)
-* Messages per session
-* Tool calls per assistant response (parallelism indicator)
-* User interrupt rate (sessions ending on cancel)
-* Iteration count per task (assistant messages between two
-  consecutive user prompts as a proxy)
+* Thinking-to-output ratio as a "cogitation index" gauge.
+* Stop-reason distribution (watch rising `max_tokens`).
+* Tool calls per assistant response (parallelism indicator).
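
The stop-reason distribution and the parallelism indicator both come from a single pass over a transcript. The sketch below assumes an Anthropic Messages API-like record shape: `{"type": "assistant", "message": {"id", "stop_reason", "content": [...]}}`, with `tool_use` content blocks. It also assumes that records sharing a message `id` belong to one response, because a single response may be spread over several lines. Check both assumptions against real transcripts. `work_shape` is a hypothetical name.

```python
import json
from collections import Counter, defaultdict
from statistics import mean


def work_shape(jsonl_lines):
    """Return (stop-reason counts, mean tool calls per assistant response)."""
    stop_reason = {}
    tool_calls = defaultdict(int)
    for n, line in enumerate(jsonl_lines):
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("type") != "assistant":
            continue
        msg = rec.get("message", {})
        # Records sharing a message id are treated as one response (assumption).
        key = msg.get("id", f"line-{n}")
        content = msg.get("content", [])
        if isinstance(content, list):
            tool_calls[key] += sum(1 for b in content
                                   if isinstance(b, dict) and b.get("type") == "tool_use")
        else:
            tool_calls[key] += 0  # plain-string content has no tool calls
        # Keep the last non-null stop reason seen for each response.
        if msg.get("stop_reason") is not None:
            stop_reason[key] = msg["stop_reason"]
    reasons = Counter(stop_reason.values())
    avg_tools = mean(tool_calls.values()) if tool_calls else 0.0
    return reasons, avg_tools


sample = [
    json.dumps({"type": "user", "message": {"content": "fix the test"}}),
    json.dumps({"type": "assistant", "message": {"id": "m1", "stop_reason": "tool_use",
               "content": [{"type": "tool_use", "name": "Bash"},
                           {"type": "tool_use", "name": "Read"}]}}),
    json.dumps({"type": "assistant", "message": {"id": "m2", "stop_reason": "end_turn",
               "content": [{"type": "text", "text": "Done."}]}}),
]
reasons, avg = work_shape(sample)
print(reasons, avg)
```

A rising `reasons["max_tokens"]` share over a week is the cut-off-response signal worth surfacing.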
 
 ### Tool usage
-
-* Top tools by count (Bash, Edit, Read, Grep, ...)
-* Tool success vs failure rate
-* Bash command distribution, parsed by root executable (git, python,
-  uv, ls, ...)
-* File reads / edits / writes per session
-* Hottest files by touch count across all sessions
-* Agent / subagent counts (`isSidechain=true`), subagent depth
-* Web search / web fetch counts
+* Top tools by count.
+* Bash commands grouped by root executable (git, python, uv, ...).
+* File reads vs edits vs writes per session.
+* Hottest files across all sessions.
+* Agent / subagent counts (`isSidechain=true`).
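
Grouping Bash calls by root executable mostly comes down to tokenising the command line and skipping leading `VAR=value` assignments. The sketch below assumes tool calls have already been extracted as `(tool_name, tool_input)` pairs and that the Bash tool's input carries the command string under a `command` key. Both are assumptions about the transcript format. Only the first program in a pipeline or `&&` chain is counted.

```python
import os
import shlex
from collections import Counter


def root_executable(command: str) -> str:
    """First real program in a shell command, e.g. 'git' for 'GIT_PAGER=cat git log'."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: fall back to a plain whitespace split
        tokens = command.split()
    for tok in tokens:
        # Skip leading VAR=value environment assignments.
        name, sep, _ = tok.partition("=")
        if sep and name.isidentifier():
            continue
        return os.path.basename(tok)
    return ""


def tool_stats(tool_uses):
    """tool_uses: iterable of (tool_name, tool_input) pairs from tool_use blocks."""
    tools = Counter()
    bash_roots = Counter()
    for name, tool_input in tool_uses:
        tools[name] += 1
        if name == "Bash":
            bash_roots[root_executable(tool_input.get("command", ""))] += 1
    return tools, bash_roots


uses = [
    ("Bash", {"command": "git status"}),
    ("Read", {"file_path": "daemon.py"}),
    ("Bash", {"command": "GIT_PAGER=cat git log -3"}),
    ("Bash", {"command": "uv run pytest | tee out.txt"}),
]
tools, bash_roots = tool_stats(uses)
print(tools.most_common(), bash_roots.most_common())
```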
 
 ### Project and context
+* Tokens per project, last-active timestamp, dormant-project detector.
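
The last-active timestamp and the dormant-project detector can come straight from the filesystem. The sketch below assumes the `~/.claude/projects/<encoded-project>/<session>.jsonl` layout, one directory per project, and uses file modification time as a proxy for last activity. The function names and the 30-day default are hypothetical.

```python
import time
from pathlib import Path
from typing import Dict, List, Optional


def project_last_active(projects_dir: Path) -> Dict[str, float]:
    """Map each project directory name to the newest mtime among its transcripts."""
    last: Dict[str, float] = {}
    for transcript in projects_dir.glob("*/*.jsonl"):
        project = transcript.parent.name
        mtime = transcript.stat().st_mtime
        last[project] = max(mtime, last.get(project, 0.0))
    return last


def dormant_projects(projects_dir: Path, days: float = 30.0,
                     now: Optional[float] = None) -> List[str]:
    """Projects with no transcript activity in the last `days` days, sorted by name."""
    now = time.time() if now is None else now
    cutoff = now - days * 86_400
    return sorted(p for p, ts in project_last_active(projects_dir).items() if ts < cutoff)


projects = Path.home() / ".claude" / "projects"
if projects.is_dir():
    print(dormant_projects(projects, days=30))
```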
 
-* Tokens per project (JSONL path encodes the project)
-* Time per project
-* Project switching rate within a session
-* Dormant project detector (no activity in N days)
-* Languages touched, derived from file extensions
-* Last file edited per project (resume-where-you-left-off)
+### Friction and quality
+* Permission denial frequency.
+* File-history-snapshot count per session.
 
-### Friction and quality signals
+### Character
+* **Em-dash violation counter** against the CLAUDE.md rule.
+* Most-used phrase by Claude vs by the user.
+* Thank-you rate; "Dude, chill" detector (count of explicit user pushback against Claude).
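
The em-dash counter is a character count over assistant text. The sketch below makes the same record-shape assumption as any JSONL reader here: assistant records carry `message.content` as either a string or a list of typed blocks. It counts only `text` blocks, so thinking blocks and tool output are ignored. The em dash is written as the escape `\u2014` in the source.

```python
import json

EM_DASH = "\u2014"


def em_dash_violations(jsonl_lines) -> int:
    """Count em dashes in assistant text across transcript lines."""
    count = 0
    for line in jsonl_lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("type") != "assistant":
            continue  # user text does not count against the rule
        content = rec.get("message", {}).get("content", [])
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text":
                count += block.get("text", "").count(EM_DASH)
    return count


sample = [
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "text", "text": "Fast \u2014 and wrong \u2014 again."}]}}),
    json.dumps({"type": "user", "message": {"content": "no em dashes \u2014 please"}}),
]
print(em_dash_violations(sample))  # 2
```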
 
-* User message length distribution (one-word directives vs prose)
-* Rough correction reflex count: user messages starting with "no",
-  "wrong", "stop", "actually"
-* Permission denial frequency
-* Retry / regenerate patterns
-* File-history-snapshot count per session (checkpoint density)
+### Cross-system
+* Git correlation: commits produced, lines changed per token.
+* Quartermaster correlation: budget-editing days vs Claude load.
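
For the git side, `git log --shortstat` already prints per-commit insertion and deletion counts, so lines changed per token is a parse and a division. The sketch below assumes the token total for the same period comes from the daemon. `git_lines_changed`, `lines_changed`, and `lines_per_token` are hypothetical helper names. The subprocess helper is defined but not called here, since it needs a real repository.

```python
import re
import subprocess

# Matches "10 insertions(+)" / "1 insertion(+)" and "2 deletions(-)" / "1 deletion(-)".
SHORTSTAT = re.compile(r"(\d+) insertions?\(\+\)|(\d+) deletions?\(-\)")


def lines_changed(shortstat_output: str) -> int:
    """Sum insertions and deletions across `git log --shortstat` output."""
    total = 0
    for ins, dels in SHORTSTAT.findall(shortstat_output):
        total += int(ins or 0) + int(dels or 0)
    return total


def git_lines_changed(repo: str, since: str) -> int:
    # --format= suppresses the commit headers, leaving only the shortstat lines.
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--shortstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    return lines_changed(out)


def lines_per_token(lines: int, tokens: int) -> float:
    return lines / tokens if tokens else 0.0


sample = " 3 files changed, 10 insertions(+), 2 deletions(-)\n 1 file changed, 1 insertion(+)\n"
print(lines_changed(sample), lines_per_token(lines_changed(sample), 26_000))
```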
 
-### Character and fun
+---
 
-* Em dash violation count in assistant text (per the CLAUDE.md rule,
-  a needle that reads "rule-break events this week")
-* Emoji leakage count
-* Most-used phrase by Claude in your transcripts
-* Most-used phrase by you (captures your actual directive vocabulary)
-* "Dude, chill" detector: count of explicit pushback against Claude
-* Thank-you rate per session
-* Silent sessions (ended without `/compact` or `/clear`)
+# Next steps
 
-### Cross-system correlations (bigger, later)
-
-* Git: commits produced per session, lines changed per token spent
-* PRs / issues closed per session
-* Quartermaster: correlate long Claude sessions with budget-editing
-  days (just for fun)
-
-## Phasing
-
-### Phase A — daemon MVP
-
-One issue. Tail JSONL, parse into structured events, maintain the
-five primary windows, print them to stdout every second. No HTTP,
-no hardware, no dashboard. Prove the numbers land correctly against
-Claude Code's own `/usage` output.
-
-### Phase B — HTTP + dashboard overview
-
-Stand up FastAPI, expose `GET /usage` for the cluster and `GET /`
-for a dashboard overview page (three gauges rendered in SVG, same
-data as the cluster). No deep stats yet. First thing you can load
-in a browser.
-
-### Phase C — firmware + single needle
-
-Minimal ESP32 / Pico firmware that polls `/usage` and drives one
-needle (the tach). Prove the hardware path end to end.
-
-### Phase D — full cluster
-
-Two more needles + annunciator row. Enclosure prototype.
-
-### Phase E — dashboard deep stats
-
-Cost panel, model panel, project panel, rhythm heatmaps, tool
-panel, file panel, raw-event tail. Pulls from the aggregate store
-the daemon has been building since Phase A.
-
-### Phase F — character and cross-system
-
-Em-dash detector, phrase extractor, git correlation, quartermaster
-correlation. Lowest priority, highest amusement.
-
-## Next steps
-
-1. Forgejo repo at `archeious/claude-gauge` (created alongside
-   this plan).
-2. File Phase A as the first issue.
-3. Ship Phase A. See the five numbers tick up in a terminal while
-   typing into Claude Code. First dopamine hit.
-4. File follow-up issues one phase at a time. No scope bleed
-   between phases.
+1. Decide A or B. Default: A.
+2. File Phase A as the first issue on `archeious/claude-gauge`.
+3. If A: stand up the Compose stack, point Claude Code at it,
+   verify metrics reach Prometheus with an instant query against
+   the `/api/v1/query` HTTP API or in the Prometheus web UI.
+4. If B: install `ccusage`, run `blocks --json` and `daily --json`
+   by hand, paste the outputs somewhere durable for reference.
+5. Ship Phase A. See the numbers tick in a terminal.
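
The step 3 check can be scripted against Prometheus's instant-query endpoint, which returns `{"status": ..., "data": {"resultType": "vector", "result": [...]}}`. Each result carries a `value` pair of timestamp and string-encoded sample. Below is a minimal sketch using only the standard library. The metric name in the comment is an assumption about how the collector translates Claude Code's token-usage metric, so confirm the exact name in the Prometheus UI first. The canned response and its `type` label are illustrative.

```python
import json
import urllib.parse
import urllib.request


def parse_vector(body: dict) -> list:
    """Turn an instant-query response into (labels, value) pairs."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    return [(r["metric"], float(r["value"][1])) for r in body["data"]["result"]]


def prom_query(base_url: str, expr: str) -> list:
    """Run an instant query against base_url/api/v1/query."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_vector(json.load(resp))


# Against a live stack (assumed metric name; check what your collector exports):
#   prom_query("http://localhost:9090", "sum(claude_code_token_usage_tokens_total)")

sample = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"type": "output"}, "value": [1713350000.0, "4521"]}]}}
print(parse_vector(sample))
```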
diff --git a/README.md b/README.md
index ef8ebed..bcbf198 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,31 @@
 # claude-gauge
 
-Hardware instrument cluster plus companion web dashboard for Claude
-Code session telemetry.
+Hardware instrument cluster displaying Claude Code session telemetry.
 
-A local Python daemon tails `~/.claude/projects/**/*.jsonl`, parses
-structured events into rolling windows and aggregate stats, and
-exposes them over HTTP. An ESP32 or Pi Pico drives a three-gauge
-analog cluster on the desk (tokens/min tach, 5h fuel, 7d fuel) with
-an annunciator row of model and warning lamps. A browser dashboard
-on the same port surfaces the deep stats.
+Three analog needle gauges (5h fuel, tokens/min tach, 7d fuel) plus
+an annunciator row of model-indicator and warning lamps, driven by
+an ESP32 polling a local Python daemon. The daemon reads either
+Claude Code's native OpenTelemetry feed through a Prometheus stack
+(architecture A) or `ccusage` CLI aggregates with a direct JSONL
+tail for the tach (architecture B). Firmware and cluster are
+identical across both.
 
-See [PLAN.md](PLAN.md) for architecture, the instrument cluster
-layout, the dashboard sections, the full metrics brainstorm, and
-the phasing plan.
+Fighter-jet / race-car aesthetic. Physical-first: all the deep
+stats live in Grafana (A) or ccusage's own surfaces (B). The dial
+on the desk is the ambient summary.
+
+See [PLAN.md](PLAN.md) for:
+
+* Instrument cluster layout and annunciator semantics
+* Architecture A: Docker Compose stack, Claude Code env config,
+  PromQL queries, daemon sketch
+* Architecture B: ccusage subprocess integration, watchdog tail,
+  daemon sketch
+* Hardware: X27.168 steppers, SwitecX25 library, ESP32 wiring,
+  enclosure notes
+* Eight-phase plan (A through H), from daemon MVP through enclosure V1 and on to character metrics
+* Full metrics brainstorm for later phases
 
 ## Status
 
-Scaffolded. Phase A (daemon MVP) is the first issue.
+Scaffolded. Phase A pending architecture decision and first issue.