claude-gauge/PLAN.md
Jeff Smith 08eaa63300 Flesh out PLAN.md with two-architecture implementation detail
Expand the planning document to implementation-ready detail. Both
viable data paths are specified independently so either can ship:

- Architecture A (OTEL-native): full docker-compose stack mirroring
  Anthropic's claude-code-monitoring-guide (otel-collector on 4317
  / 4318 / 8889, Prometheus on 9090, Grafana on 3000), Claude Code
  env vars, PromQL queries for each gauge, Python daemon sketch
  that queries Prometheus and serves /usage.

- Architecture B (ccusage-sourced): subprocess invocation of
  ccusage CLI for 5h blocks and 7d daily aggregates, watchdog
  JSONL tail for sub-second tach responsiveness, Python daemon
  sketch with a 60s RateBus ring buffer.

Hardware specified: X27.168 stepper motors driven by the Arduino
SwitecX25 library on ESP32 (Arduino C++ since no MicroPython port
exists), concrete GPIO pin assignments, ULN2003A driver notes,
annunciator LED wiring, enclosure notes.

Also captured: metric schema from Claude Code's OTEL docs,
prior-art review (ccusage / Claude-Code-Usage-Monitor / Grafana
Labs dashboards 25052 and 24993 / Anthropic's monitoring guide),
six-phase delivery plan, comparison table of A vs B, recommendation
(A for homelab-as-enterprise framing), and a metrics brainstorm
for later phases.

README updated to summarise both architectures and point at the
expanded plan.

Refs #1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:18:59 -06:00


# claude-gauge
Hardware instrument cluster displaying Claude Code session telemetry.
Three analog needle gauges plus an annunciator row, driven by an ESP32
that polls a local daemon. The daemon is fed by Claude Code's own
OpenTelemetry feed or by `ccusage`. Fighter-jet / race-car aesthetic.
Physical-first.
## Why
Watching tokens burn against the Max-plan windows is useful, but the
same data also tells you when Claude is grinding, which model just
ran, and how warm your cache is. A dial on the desk makes that
ambient instead of tab-switching.
## Prior art and the decision it implies
Software side is crowded. `ccusage`, Claude-Code-Usage-Monitor,
haasonsaas/claude-usage-tracker, phuryn/claude-usage, multiple
Grafana dashboards (Grafana Labs 25052 and 24993), and Anthropic's
own `claude-code-monitoring-guide` repo all do the JSONL parsing and
rolling-window math already. Claude Code ships with native
OpenTelemetry support. The physical-gauge angle has no extant
prior art.
Implication: do not rebuild the telemetry layer. Consume it. Spend
the love on the hardware and the adapter that bridges it.
## Instrument cluster (same in all architectures)
```
+------------+  +--------------+  +------------+
|  5h FUEL   |  |  TOKENS/MIN  |  |  7d FUEL   |
|  0 - 100%  |  | 0 - redline  |  |  0 - 100%  |
+------------+  +--------------+  +------------+
 [OPUS] [SONNET] [HAIKU] [HOT] [WARN] [STALL] [IDLE]
```
| Gauge | Metric |
|---|---|
| Center tach | Tokens/min, rolling short window |
| Left fuel | % of 5h plan window used |
| Right fuel | % of 7d plan window used |

| Lamp | Condition |
|---|---|
| OPUS / SONNET / HAIKU | colour-coded model that emitted the most recent tokens |
| HOT | tach above redline |
| WARN | either fuel gauge above 80% |
| STALL | no telemetry in last N minutes |
| IDLE | daemon reachable, no activity |
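
The lamp table can be read straight off the daemon's `/usage` payload. A minimal sketch, assuming the payload fields used by the daemon sketches in this plan (`hot`, `warn`, `stall`, `idle`, `last_model`); the helper name `lamps_for` and the substring match on the model name are hypothetical choices, not part of any library.

```python
# Hypothetical helper: derive the lit annunciator lamps from one /usage payload.
MODEL_LAMPS = ("opus", "sonnet", "haiku")


def lamps_for(payload: dict) -> set[str]:
    """Return the names of the lamps that should be lit."""
    lit: set[str] = set()
    # Assumption: the model family appears as a substring of the model id.
    model = (payload.get("last_model") or "").lower()
    for name in MODEL_LAMPS:
        if name in model:
            lit.add(name.upper())
            break  # at most one model lamp at a time
    for flag in ("hot", "warn", "stall", "idle"):
        if payload.get(flag):
            lit.add(flag.upper())
    return lit
```

The firmware would do the equivalent in C++; the Python form is only meant to pin down the mapping before any hardware exists.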
## Two architectures
Pick one. Both feed the same firmware and cluster.
| | A. OTEL-native | B. ccusage-sourced |
|---|---|---|
| Data source | Claude Code OTLP -> collector -> Prometheus | Local JSONL via `ccusage` CLI |
| External deps | Docker Compose stack (collector, Prometheus, Grafana) | Node + `npx ccusage` |
| Deep-stats dashboard | Grafana dashboard 25052 for free | Build nothing, ccusage has a TUI |
| Short-window tach | Limited by Prometheus scrape interval (15s) | Hybrid JSONL tail gives sub-second |
| Operational weight | Moderate (3 services) | Tiny (one subprocess) |
| Homelab-enterprise fit | Strong | Weak |
| Time to first needle | Day 2 | Day 1 |
| Survivability through Claude Code updates | High (OTEL schema is stable and documented) | Medium (JSONL layout is an implementation detail) |
Both share the firmware, cluster, and enclosure. The daemon is the
only thing that differs. The `/usage` HTTP shape is identical across
A and B so the firmware never knows which backend is wired up.
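
One way to hold both daemons to the same contract is to write the shared payload down as a type. This is a sketch, not a settled schema: the field names are taken from the two daemon sketches in this plan, `cache_hit_rate` is left out of the required set because only the A sketch returns it, and `missing_fields` is a hypothetical helper.

```python
from typing import Optional, TypedDict


class Usage(TypedDict):
    """Fields both daemon sketches return from GET /usage."""
    rate_1m: float            # tokens/min for the tach
    window_5h_tokens: float
    window_5h_pct: float      # 0.0 .. 1.0
    window_7d_tokens: float
    window_7d_pct: float      # 0.0 .. 1.0
    hot: bool
    warn: bool
    stall: bool
    idle: bool
    last_model: Optional[str]


def missing_fields(payload: dict) -> set[str]:
    """Names of required fields absent from a payload (empty set means OK)."""
    return set(Usage.__required_keys__) - set(payload)
```

A test that calls each daemon's `/usage` and asserts `missing_fields(...) == set()` would catch the two backends drifting apart.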
---
# Architecture A: OTEL-native
## Stack
Mirror Anthropic's reference stack (`claude-code-monitoring-guide`).
Three containers.
```yaml
# docker-compose.yml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus scrape
    depends_on:
      - prometheus
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=8d'   # > 7d so increase() works
      - '--web.enable-lifecycle'
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
      - ./grafana/dashboards:/var/lib/grafana/dashboards
    depends_on:
      - prometheus
volumes:
  prometheus_data:
  grafana_data:
```
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    send_timestamps: true
    metric_expiration: 192h   # 8 days, covers 7d window
    enable_open_metrics: true
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
```
Import Grafana Labs dashboard **25052** ("Claude Code") against the
Prometheus data source. That is the deep-stats dashboard; no custom
web UI needed.
## Claude Code configuration
Set in the shell Claude Code runs in (user profile, systemd unit,
or the `env` block of `~/.claude/settings.json`):
```bash
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp # events; needs a logs pipeline in the collector, which the config above does not define
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_METRIC_EXPORT_INTERVAL=10000 # 10s for gauge responsiveness
export OTEL_METRICS_INCLUDE_SESSION_ID=false # bound cardinality
```
## Metrics Claude Code emits (via OTEL, surfaced in Prometheus)
All prefixed `claude_code_` after the OTEL-to-Prom conversion.
| Prometheus metric | Labels |
|---|---|
| `claude_code_token_usage_tokens_total` | `type` (`input`/`output`/`cacheRead`/`cacheCreation`), `model` |
| `claude_code_cost_usage_USD_total` | `model` |
| `claude_code_session_count_total` | |
| `claude_code_active_time_total_seconds_total` | `type` (`user`/`cli`) |
| `claude_code_lines_of_code_count_total` | `type` (`added`/`removed`) |
| `claude_code_commit_count_total` | |
| `claude_code_pull_request_count_total` | |
| `claude_code_code_edit_tool_decision_count_total` | `tool_name`, `decision`, `language` |
Events (via OTEL logs) carry richer per-request context including
`prompt.id`, `duration_ms`, `speed` (fast/normal), etc. Not needed
for the primary gauges.
## PromQL the daemon runs
```promql
# Tokens/min, short rolling window (tach)
sum(rate(claude_code_token_usage_tokens_total[1m])) * 60
# 5h window sum (left fuel)
sum(increase(claude_code_token_usage_tokens_total[5h]))
# 7d window sum (right fuel)
sum(increase(claude_code_token_usage_tokens_total[7d]))
# Cache hit rate (optional sub-gauge)
sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m]))
/ sum(rate(claude_code_token_usage_tokens_total{type=~"input|cacheRead|cacheCreation"}[5m]))
# Last model (approximation via max-sample lookup)
topk(1, claude_code_token_usage_tokens_total{type="output"})
# Cost estimates
sum(increase(claude_code_cost_usage_USD_total[5h]))
sum(increase(claude_code_cost_usage_USD_total[7d]))
# Stall detection (no tokens in last N minutes)
absent(rate(claude_code_token_usage_tokens_total[2m]) > 0)
```
## Daemon (A)
Thin Python service. Queries Prometheus, transforms to `/usage`
payload for the firmware.
```
src/claude_gauge/
  __init__.py
  daemon_prom.py    FastAPI app, PromQL queries, /usage endpoint
  config.py         Prometheus URL, ceilings, stall threshold
  windows.py        PromQL builders and result parsing
  calibration.py    Maps raw values to firmware-friendly 0-1000 scales
```
```python
# daemon_prom.py sketch
import math
import os

import httpx
from fastapi import FastAPI

PROM = os.environ.get("CLAUDE_GAUGE_PROM_URL", "http://localhost:9090")
CEIL_5H = int(os.environ.get("CLAUDE_GAUGE_5H_CEILING", 500_000))
CEIL_7D = int(os.environ.get("CLAUDE_GAUGE_7D_CEILING", 3_000_000))
RED = int(os.environ.get("CLAUDE_GAUGE_TACH_REDLINE", 8000))  # tokens/min

app = FastAPI()
client = httpx.AsyncClient(timeout=5.0)


async def prom(q: str) -> float:
    """Run an instant query; return the first sample, or 0.0 if empty."""
    r = await client.get(f"{PROM}/api/v1/query", params={"query": q})
    r.raise_for_status()
    data = r.json()["data"]["result"]
    value = float(data[0]["value"][1]) if data else 0.0
    # 0/0 in PromQL yields NaN, which the JSON response cannot encode.
    return value if math.isfinite(value) else 0.0


@app.get("/usage")
async def usage():
    rate_1m = await prom("sum(rate(claude_code_token_usage_tokens_total[1m])) * 60")
    win_5h = await prom("sum(increase(claude_code_token_usage_tokens_total[5h]))")
    win_7d = await prom("sum(increase(claude_code_token_usage_tokens_total[7d]))")
    cache = await prom(
        'sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m])) / '
        'sum(rate(claude_code_token_usage_tokens_total{type=~"input|cacheRead|cacheCreation"}[5m]))'
    )
    # Stalled when no tokens at all were counted over the last 2 minutes.
    stalled = (await prom(
        'sum(rate(claude_code_token_usage_tokens_total[2m]))'
    )) == 0.0
    return {
        "rate_1m": rate_1m,
        "window_5h_tokens": win_5h,
        "window_5h_pct": min(1.0, win_5h / CEIL_5H),
        "window_7d_tokens": win_7d,
        "window_7d_pct": min(1.0, win_7d / CEIL_7D),
        "cache_hit_rate": cache,
        "hot": rate_1m > RED,
        "warn": (win_5h / CEIL_5H) > 0.8 or (win_7d / CEIL_7D) > 0.8,
        "stall": stalled,
        "idle": True,  # placeholder: the daemon answered, so it is reachable
        "last_model": await last_model(),  # helper not shown in this sketch
    }
```
`last_model` needs one extra query that picks the `model` label of
the most recently incremented output-token series. Implementation
detail; simplest is to run a small query loop on metric labels.
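
One possible shape for that helper, as a sketch only: ask Prometheus for the per-model increase in output tokens over a short lookback and take the model with the largest increase. The query string, the 5m lookback, and the name `pick_last_model` are assumptions; the parsing follows the instant-query result layout already used by `prom()` (a list of `{"metric": {...}, "value": [ts, "number"]}` entries). Note that this picks the most active model in the lookback, which approximates "most recent" rather than guaranteeing it.

```python
from typing import Optional

# Assumed query: per-model output-token growth over the last 5 minutes.
LAST_MODEL_QUERY = (
    "sum by (model) "
    '(increase(claude_code_token_usage_tokens_total{type="output"}[5m]))'
)


def pick_last_model(result: list[dict]) -> Optional[str]:
    """Pick the `model` label with the largest positive value, if any."""
    best_model: Optional[str] = None
    best_value = 0.0
    for series in result:
        value = float(series["value"][1])
        # NaN compares False and zero is skipped, so idle series never win.
        if value > best_value:
            best_model = series["metric"].get("model")
            best_value = value
    return best_model
```

`last_model()` would then run `LAST_MODEL_QUERY` through the same `/api/v1/query` call and hand `data["result"]` to `pick_last_model`.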
## Dependencies (A)
```toml
# pyproject.toml additions
dependencies = [
"fastapi>=0.136.0",
"uvicorn[standard]>=0.44.0",
"httpx>=0.28.1",
]
```
No SQLite, no watchdog, no ORM. Prometheus is the database.
## Retention considerations
* Collector `metric_expiration: 192h` keeps a metric visible for 8d
after its last sample, so 7d `increase()` queries work even on
intermittent sessions.
* Prometheus `--storage.tsdb.retention.time=8d` keeps the samples
long enough for the same 7d queries.
* Grafana dashboard 25052 pulls from the same Prometheus.
## Pros and cons of A
Pros:
* Uses the platform feature Anthropic ships.
* Grafana dashboard is free.
* Metric schema is documented and stable.
* Plays cleanly with any other homelab metrics already in Prometheus.
* Architecture translates without changes when other machines run
Claude Code too: point their OTLP endpoint at the same collector.
Cons:
* Prometheus scrape interval caps tach responsiveness at ~15s.
* Three containers to run.
* Requires env-var changes on every Claude Code launch surface.
## Tach responsiveness mitigation (A)
If the 15s cap bothers you, the daemon can keep a tiny JSONL-tail
fallback just for the tach. Same code shape as architecture B's tach
component; described below. Pulling the fuel gauges and everything
else from Prometheus, tach from direct file tail, is a clean hybrid.
Only activate if Phase C shows the needle feels sluggish.
---
# Architecture B: ccusage-sourced
## Stack
One process: `ccusage` as a long-lived subprocess or periodic shell
call. No collector, no Prometheus, no Grafana. A hybrid watchdog
tail handles the sub-second tach that ccusage's aggregate API can't.
```
[ Claude Code ] -> ~/.claude/projects/**/*.jsonl
                              |
                  +-----------+-----------+
                  |                       |
                  v                       v
          [ watchdog tail ]     [ ccusage CLI / MCP ]
        (short-window tach)     (5h blocks, 7d daily)
                  |                       |
                  +-----------+-----------+
                              v
                   [ claude-gauge daemon ]
                         GET /usage
                              |
                              v
                       ESP32 firmware
## ccusage integration options
Two shapes work. Pick one, not both.
### Option B1: periodic CLI subprocess (simplest)
```bash
npx ccusage@latest blocks --json # current 5h block
npx ccusage@latest daily --json # per-day aggregates for 7d sum
```
Run every ~10s from the daemon. Parse JSON, fill the fuel gauges.
### Option B2: ccusage MCP HTTP server (persistent)
```bash
bunx @ccusage/mcp@latest --type http --port 8080
```
Exposes a Hono app at `POST /` handling MCP StreamableHTTP
requests. Four registered tools:
| Tool | Description |
|---|---|
| `daily` | Usage grouped by date |
| `monthly` | Usage grouped by month |
| `session` | Usage grouped by conversation session |
| `blocks` | Usage grouped by 5-hour session billing blocks |
Each tool accepts `since`, `until`, `mode`, `timezone`, `locale` and
returns JSON in an MCP text content block.
Invoke as an MCP client from the daemon (`mcp` Python SDK) or as
raw JSON-RPC to `POST /`.
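
For the raw JSON-RPC route, the request body is an MCP `tools/call` message. A minimal sketch of building one, with assumptions flagged: `tool_call` and `MCP_HEADERS` are hypothetical names, the `Accept` header follows the MCP Streamable HTTP convention, and a real client may first need the `initialize` handshake that the `mcp` SDK performs for you.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids only need to be unique per connection

# Assumed headers for a Streamable HTTP endpoint.
MCP_HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}


def tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request that invokes one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Body for POST / asking for the 5-hour blocks.
body = json.dumps(tool_call("blocks"))
```

The result arrives as a text content block whose text is itself JSON, so the daemon has to decode twice: once for the JSON-RPC envelope, once for the tool output.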
### Recommendation
**B1**. The CLI path is simpler, has fewer moving parts, and the
performance hit of a subprocess call every 10s is negligible.
Switch to B2 only if you also want the MCP surface exposed to other
local agents (Claude Code can already consume ccusage's MCP).
## Short-window tach via watchdog
ccusage aggregates are too coarse for the tach. The daemon keeps its
own 60-second ring buffer by tailing JSONL directly.
```python
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from collections import deque
from pathlib import Path
import json, time


class JsonlTail(FileSystemEventHandler):
    """Feed token counts from newly appended JSONL lines into a RateBus."""

    def __init__(self, bus):
        self.bus = bus
        self.offsets: dict[Path, int] = {}  # per-file read position

    def on_modified(self, event):
        p = Path(event.src_path)
        if p.suffix != ".jsonl":
            return
        off = self.offsets.get(p, 0)
        with p.open() as f:
            f.seek(off)
            for line in f:
                try:
                    d = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines
                if d.get("type") == "assistant":
                    u = d.get("message", {}).get("usage", {})
                    tokens = sum(u.get(k) or 0 for k in (
                        "input_tokens", "output_tokens",
                        "cache_read_input_tokens",
                        "cache_creation_input_tokens",
                    ))
                    model = d.get("message", {}).get("model", "")
                    self.bus.push(time.time(), tokens, model)
            self.offsets[p] = f.tell()


class RateBus:
    """Ring buffer of (timestamp, tokens, model) over the last window_s seconds."""

    def __init__(self, window_s=60):
        self.window_s = window_s
        self.buf: deque[tuple[float, int, str]] = deque()

    def push(self, ts, tokens, model):
        self.buf.append((ts, tokens, model))
        self._evict()

    def _evict(self):
        cutoff = time.time() - self.window_s
        while self.buf and self.buf[0][0] < cutoff:
            self.buf.popleft()

    def rate_per_min(self):
        self._evict()
        # Scale to tokens/min so the result stays correct if window_s != 60.
        return sum(t for _, t, _ in self.buf) * 60 / self.window_s

    def last_model(self):
        return self.buf[-1][2] if self.buf else None
```
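
One gap in the tail above: a line caught half-written fails to parse, gets skipped, and the offset still moves past it, so its tokens are lost. A possible refinement, sketched under the assumption that every JSONL record ends with `\n`, is to read bytes and consume only complete lines. `read_complete_lines` is a hypothetical helper, not part of watchdog.

```python
from pathlib import Path


def read_complete_lines(path: Path, offset: int) -> tuple[list[bytes], int]:
    """Read whole newline-terminated lines starting at byte `offset`.

    Returns the lines (without the trailing newline) and the offset to
    resume from. A trailing partial line is left for the next call.
    """
    with path.open("rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n")
    if end == -1:
        return [], offset  # nothing complete yet
    complete = chunk[: end + 1]
    # split() leaves one empty item after the final newline; drop it.
    return complete.split(b"\n")[:-1], offset + len(complete)
```

`on_modified` would call this with the stored offset, pass each returned line to `json.loads`, and store the new offset. Byte offsets also avoid the text-mode `tell()` subtleties.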
## Daemon (B)
```
src/claude_gauge/
  __init__.py
  daemon_ccusage.py   FastAPI app, ccusage subprocess calls, /usage
  tail.py             watchdog + RateBus for tach
  config.py
  calibration.py
```
```python
# daemon_ccusage.py sketch
import asyncio, json, os

from fastapi import FastAPI

from .tail import RateBus, start_watcher

CEIL_5H = int(os.environ.get("CLAUDE_GAUGE_5H_CEILING", 500_000))
CEIL_7D = int(os.environ.get("CLAUDE_GAUGE_7D_CEILING", 3_000_000))
RED = int(os.environ.get("CLAUDE_GAUGE_TACH_REDLINE", 8000))

bus = RateBus(window_s=60)
start_watcher(bus)  # background thread
app = FastAPI()


async def ccusage(cmd: str) -> dict:
    """Run `npx ccusage@latest <cmd> --json` and parse its stdout."""
    proc = await asyncio.create_subprocess_exec(
        "npx", "ccusage@latest", cmd, "--json",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return json.loads(out)


async def current_5h_tokens() -> int:
    blocks = await ccusage("blocks")
    cur = next((b for b in blocks.get("blocks", []) if b.get("isActive")), None)
    return cur["totalTokens"] if cur else 0


async def trailing_7d_tokens() -> int:
    daily = await ccusage("daily")
    # Sum the last 7 daily rows (rows, not necessarily calendar days).
    rows = daily.get("daily", [])[-7:]
    return sum(r["totalTokens"] for r in rows)


@app.get("/usage")
async def usage():
    rate = bus.rate_per_min()
    w5, w7 = await asyncio.gather(current_5h_tokens(), trailing_7d_tokens())
    return {
        "rate_1m": rate,
        "window_5h_tokens": w5,
        "window_5h_pct": min(1.0, w5 / CEIL_5H),
        "window_7d_tokens": w7,
        "window_7d_pct": min(1.0, w7 / CEIL_7D),
        "hot": rate > RED,
        "warn": (w5 / CEIL_5H) > 0.8 or (w7 / CEIL_7D) > 0.8,
        "stall": rate == 0 and not bus.buf,
        "idle": True,  # placeholder: the daemon answered, so it is reachable
        "last_model": bus.last_model(),
    }
```
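
Taking the last seven rows of the daily output only equals a 7-day window if every day has a row. If ccusage omits days with no usage (worth checking against real output), the slice reaches further back than intended. A date-based sum is one way around that. This sketch assumes each row carries a `date` field in ISO `YYYY-MM-DD` form next to the `totalTokens` field used above; both the field format and the helper name `trailing_tokens` are assumptions to verify.

```python
from datetime import date, timedelta


def trailing_tokens(rows: list[dict], today: date, days: int = 7) -> int:
    """Sum totalTokens for rows dated within the last `days` calendar days."""
    cutoff = today - timedelta(days=days - 1)  # window includes today
    total = 0
    for row in rows:
        day = date.fromisoformat(row["date"])  # assumed ISO date string
        if cutoff <= day <= today:
            total += row["totalTokens"]
    return total
```

This is still a calendar-day approximation of the rolling 7d window, so the right fuel needle can jump at midnight.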
Cache `ccusage blocks/daily` output with a 10s TTL so the `/usage`
endpoint stays cheap when the firmware polls at 1 Hz.
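
The 10s TTL can be a few lines of asyncio. A minimal sketch; `TTLCache` is a hypothetical helper, and the injectable `clock` exists only so the expiry logic can be tested without sleeping.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


class TTLCache:
    """Cache async fetch results per key for ttl_s seconds."""

    def __init__(self, ttl_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    async def get(self, key: str, fetch: Callable[[], Awaitable[Any]]) -> Any:
        # One lock per key so concurrent polls share a single subprocess run.
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            hit = self._entries.get(key)
            if hit is not None and self.clock() - hit[0] < self.ttl_s:
                return hit[1]
            value = await fetch()
            self._entries[key] = (self.clock(), value)
            return value
```

In the daemon this would wrap the subprocess helper, for example `await cache.get("blocks", lambda: ccusage("blocks"))`, so a 1 Hz poll triggers at most one `npx` run per key every 10 seconds.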
## Dependencies (B)
```toml
dependencies = [
"fastapi>=0.136.0",
"uvicorn[standard]>=0.44.0",
"watchdog>=5.0.0",
]
```
Node needs to be on the PATH for `npx ccusage@latest`. Pin a version
in config rather than using `@latest` once the daemon is past Phase A.
## Pros and cons of B
Pros:
* Single process, one dependency tree.
* Sub-second tach works out of the box via the watchdog tail.
* No service stack, no Docker, no collector.
* ccusage is actively maintained and has already solved the edge
cases in JSONL parsing (missing fields, renamed formats, cache
token math, cost per model).
Cons:
* No free Grafana dashboard. If you want deep stats, either run
`ccusage` interactively or build something.
* Node on the runtime path.
* JSONL format is an implementation detail; upstream changes could
break parsing. ccusage tracks these but there's a lag window.
* Does not generalise if other machines also run Claude Code; each
one needs its own daemon.
---
# Hardware (shared by A and B)
## Movement
**Switec X27.168** automotive stepper motor. 315-degree sweep in
1/3-degree steps, so 945 steps end to end. ~$8 each. Used in car dashboards,
so enclosures and bezels exist off the shelf.
Related cousins: X25, VID28, VID29, BKA30D-R5. The library supports
all of them, but X27.168 has the longest sweep and the most
available tutorials.
## Driver
`SwitecX25` Arduino library (`clearwater/SwitecX25` on GitHub). Works
for X27.168 despite the name. Uses 4 GPIO pins per motor. No
external driver IC required for short wiring runs; use small
transistor arrays (ULN2003A) if you want cleaner current handling.
No maintained MicroPython port exists. **Firmware is Arduino C++**
rather than MicroPython. Not the original plan, but the right trade.
## Board
**ESP32 DevKit** (generic). WiFi, enough GPIO for 3 steppers (12
pins) plus 7 annunciator LEDs and a reset button. ~$8.
Alternative: Raspberry Pi Pico W. Less toolchain overhead if you
prefer CircuitPython, but you'd still be hand-rolling the stepper
driver.
## Wiring sketch
```
ESP32 DevKit
  GPIO 13,14,27,26 --> X27.168 #1 (left fuel)
  GPIO 25,33,32,23 --> X27.168 #2 (tach)
  GPIO 22,4,16,17  --> X27.168 #3 (right fuel)
  GPIO 21          --> OPUS LED (red)
  GPIO 19          --> SONNET LED (amber)
  GPIO 18          --> HAIKU LED (green)
  GPIO 5           --> HOT LED (red, PWM for flashing)
  GPIO 2           --> WARN LED (amber)
  GPIO 12          --> STALL LED (blue)
  GPIO 15          --> IDLE LED (green, pulses while daemon reachable)
  GPIO 34          --> tactile reset button (external pull-up)
  (GPIO 34-39 are input-only on the ESP32, so they cannot drive motors or LEDs.)
```
220R resistors per LED. Use a separate 5V rail for the steppers if
you see brownouts when all three move at once; ESP32's 3V3 rail is
fine for signals but the motors pull more than the onboard regulator
likes.
## Firmware structure
```
firmware/
  platformio.ini
  src/
    main.cpp          setup() + loop()
    wifi.cpp          connect + reconnect
    gauge.cpp         wraps SwitecX25; map pct 0..1 to 0..steps
    annunciator.cpp   LED state machine
    poll.cpp          HTTP GET /usage every 1s
    config.h          daemon URL, redline, thresholds
```
Poll loop:
1. Every 1000ms, GET `http://<daemon>:8080/usage`.
2. Parse JSON (ArduinoJson).
3. Set gauge targets: `tach.setTargetStep(map(rate_1m, 0, redline, 0, STEPS))`, where `STEPS` is the motor's full-sweep step count; likewise for fuels.
4. Update LED states from `hot/warn/stall/idle/last_model`.
5. `gauge.update()` runs the stepper every loop tick until it hits target.
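
One detail in step 3 is worth pinning down: Arduino's `map()` does not constrain its output, so a rate above redline would ask for a position past the end of the sweep. The mapping the firmware needs is a clamped one, sketched here in Python to match the daemon code. `to_step` is a hypothetical name, and `full_sweep` is passed in rather than hard-coded so it can follow whatever step count the motor and library actually use.

```python
def to_step(value: float, ceiling: float, full_sweep: int) -> int:
    """Map value in [0, ceiling] to a step in [0, full_sweep], clamped."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    fraction = min(1.0, max(0.0, value / ceiling))
    return round(fraction * full_sweep)
```

In the firmware the same effect comes from wrapping the `map()` call in `constrain()`. Doing the clamp in the daemon's `calibration.py` instead would keep the firmware simpler; either placement works as long as exactly one side owns it.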
## Enclosure
* Cream faces, hairline burgundy redline zone (matches quartermaster
palette if you want the house look).
* Brushed aluminium bezel; 3D-print + spray-paint is fine for V1.
* Annunciator row behind smoked acrylic so the LEDs only show when
lit.
* Desk-size footprint: roughly 180mm wide x 90mm tall for the cluster.
---
# Phasing
One phase per issue. No scope bleed between phases.
| Phase | Deliverable | Architecture-agnostic? |
|---|---|---|
| A | Daemon prints five window values to stdout | No (A or B chosen before start) |
| B | `/usage` HTTP endpoint; curl from browser or another box | No |
| C | ESP32 firmware driving ONE needle (tach) from the daemon | Yes |
| D | Three needles plus annunciator row | Yes |
| E | Calibration period: tune ceilings and redline against real use | Yes |
| F | Enclosure V1 (printed), cabling, permanent install | Yes |
| G | (If A) Grafana dashboard wired in; (if B) pick a deep-stats path or decline | Diverges |
| H | Character metrics and cross-system correlations (em-dash counter, git correlation, quartermaster correlation) | Yes |
Do not attempt Phase D before Phase C. Hardware integration is
where surprises land; start with one axis.
---
# Recommendation (for Jeff's homelab)
Architecture A.
The homelab-as-enterprise framing is the deciding factor. OTEL is
the platform feature, Prometheus is already the right tool, Grafana
dashboard 25052 is a free deep-stats surface, and the architecture
generalises if other machines start running Claude Code. The 15s
scrape interval is the only real concession; if the tach feels
sluggish after Phase E, bolt the JSONL tail from B on top for the
tach path only. Hybrid.
If you don't already run Prometheus in the homelab, B gets you to
a working needle sooner (Phase A ships same day). Migrate to A
later if OTEL becomes useful for other things.
Either way, the firmware and cluster are identical. The architecture
choice is only about what the daemon reads.
# Metrics brainstorm (for later phases)
All derivable from OTEL (A) or from the JSONL directly (B). Not
wired into the primary cluster; land in Phase G or a future Grafana
panel.
### Cost and tokens
* Cache hit rate and cache-savings dollar value.
* Cost per session at published pricing.
* Projected monthly spend.
* Opus / Sonnet / Haiku token split.
* Server tool use (web search / web fetch) counts.
### Time and rhythm
* Session count, duration distribution, time-of-day heatmap.
* Think-time (user idle) vs work-time (assistant active).
* Streak tracking; all-nighter detector.
### Work shape
* Thinking-to-output ratio as a "cogitation index" gauge.
* Stop-reason distribution (watch rising `max_tokens`).
* Tool calls per assistant response (parallelism indicator).
### Tool usage
* Top tools by count. Bash-command root-executable distribution.
* File reads vs edits vs writes per session.
* Hottest files across all sessions.
* Agent / subagent counts (`isSidechain=true`).
### Project and context
* Tokens per project, last-active timestamp, dormant-project detector.
### Friction and quality
* Permission denial frequency.
* File-history-snapshot count per session.
### Character
* **Em-dash violation counter** against the CLAUDE.md rule.
* Most-used phrase by Claude vs by the user.
* Thank-you rate, "Dude, chill" detector.
### Cross-system
* Git correlation: commits produced, lines changed per token.
* Quartermaster correlation: budget-editing days vs Claude load.
---
# Next steps
1. Decide A or B. Default: A.
2. File Phase A as the first issue on `archeious/claude-gauge`.
3. If A: stand up the Compose stack, point Claude Code at it,
verify metrics reach Prometheus via the `/api/v1/query` browser
interface.
4. If B: install `ccusage`, run `blocks --json` and `daily --json`
by hand, paste the outputs somewhere durable for reference.
5. Ship Phase A. See the numbers tick in a terminal.