Data Sources
Two architectures, implemented side by side. Pick one; both produce
the same /usage payload for the firmware.
A. OTEL-native
Uses Claude Code's built-in OpenTelemetry support. Mirrors the
reference stack published in anthropics/claude-code-monitoring-guide.
Docker Compose
Three containers: OTEL collector, Prometheus, Grafana.
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus scrape
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=8d'
      - '--web.enable-lifecycle'
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      # Mount the declared volume so dashboards survive container recreation
      - grafana_data:/var/lib/grafana
volumes:
  prometheus_data:
  grafana_data:
OTEL collector config
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
processors:
  batch: { timeout: 1s, send_batch_size: 1024 }
  memory_limiter: { check_interval: 1s, limit_mib: 512 }
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    send_timestamps: true
    metric_expiration: 192h   # 8 days; needed for 7d queries
    enable_open_metrics: true
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
Prometheus scrape config
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
Claude Code environment
Set these in the shell that launches Claude Code. A shell profile
works, and so does the env block of ~/.claude/settings.json or of a
managed settings file.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_METRIC_EXPORT_INTERVAL=10000 # 10s for responsiveness
export OTEL_METRICS_INCLUDE_SESSION_ID=false
Claude Code metrics (surfaced in Prometheus)
After OTEL-to-Prom conversion:
| Prometheus metric | Labels |
|---|---|
| claude_code_token_usage_tokens_total | type (input, output, cacheRead, cacheCreation), model |
| claude_code_cost_usage_USD_total | model |
| claude_code_session_count_total | |
| claude_code_active_time_total_seconds_total | type (user, cli) |
| claude_code_lines_of_code_count_total | type (added, removed) |
| claude_code_commit_count_total | |
| claude_code_pull_request_count_total | |
| claude_code_code_edit_tool_decision_count_total | tool_name, decision, language |
Full event schema is in the Claude Code monitoring docs.
PromQL for each gauge
# Tokens/min (tach)
sum(rate(claude_code_token_usage_tokens_total[1m])) * 60
# 5h fuel
sum(increase(claude_code_token_usage_tokens_total[5h]))
# Thinking/output ratio (temp gauge)
# Note: OTEL does not emit thinking tokens as a dedicated metric;
# derive via event stream (claude_code.api_request + prompt.id
# correlation) or fall back to a constant 0 until instrumented.
# Cache hit rate (boost gauge)
sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m]))
/ sum(rate(claude_code_token_usage_tokens_total{type=~"input|cacheRead|cacheCreation"}[5m]))
# Last model (approximation)
topk(1, claude_code_token_usage_tokens_total{type="output"})
Daemon (A)
src/claude_gauge/
  daemon_prom.py    FastAPI, PromQL queries, /usage
  config.py         ceilings, URLs, thresholds
  windows.py        query builders and result parsing
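async def prom(q: str) -> float:
    # Instant query against Prometheus; an empty result vector counts as 0.
    # `client` is assumed to be an httpx.AsyncClient (r.json() is synchronous).
    r = await client.get(f"{PROM}/api/v1/query", params={"query": q})
    r.raise_for_status()
    data = r.json()["data"]["result"]
    return float(data[0]["value"][1]) if data else 0.0

@app.get("/usage")
async def usage():
    rate_1m = await prom(Q_TACH)
    win_5h = await prom(Q_5H)
    cache = await prom(Q_CACHE)
    ...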
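windows.py is listed as holding the query builders and result parsing. The following is a minimal sketch of what such a module might contain, assuming the PromQL from the section above. The function names (tach_query, window_query, cache_query, parse_scalar) are hypothetical, not the project's actual API. The parser also maps NaN to the default, since the cache-hit-rate division returns NaN when the denominator is zero.

```python
import math
from typing import Any

# Metric name as it appears after OTEL-to-Prometheus conversion.
METRIC = "claude_code_token_usage_tokens_total"


def tach_query(window: str = "1m") -> str:
    """Tokens per minute over `window`."""
    return f"sum(rate({METRIC}[{window}])) * 60"


def window_query(window: str = "5h") -> str:
    """Total tokens consumed over `window` (the fuel gauge)."""
    return f"sum(increase({METRIC}[{window}]))"


def cache_query(window: str = "5m") -> str:
    """Share of input-side tokens served from cache over `window`."""
    reads = f'sum(rate({METRIC}{{type="cacheRead"}}[{window}]))'
    total = f'sum(rate({METRIC}{{type=~"input|cacheRead|cacheCreation"}}[{window}]))'
    return f"{reads} / {total}"


def parse_scalar(body: dict[str, Any], default: float = 0.0) -> float:
    """Extract one float from a /api/v1/query response body.

    Falls back to `default` on a failed query, an empty result vector,
    or a non-finite value such as NaN from a 0/0 division.
    """
    if body.get("status") != "success":
        return default
    result = body.get("data", {}).get("result", [])
    if not result:
        return default
    # Each vector sample is [unix_timestamp, "value-as-string"].
    value = float(result[0]["value"][1])
    return value if math.isfinite(value) else default
```

Keeping the window as a parameter makes it cheap to try a longer rate window if the 1m tach turns out to be jittery at a 15s scrape interval.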
Deep-stats dashboard
Import Grafana Labs dashboard 25052 ("Claude Code") against the Prometheus data source. That is the deep-stats surface. No custom web UI needed.
Pros and cons
Pros:
- Uses the platform feature Anthropic ships
- Grafana dashboard is free
- Metric schema is documented and stable
- Generalises cleanly if other hosts also run Claude Code
Cons:
- Scrape interval caps tach responsiveness at ~15s
- Three containers to run
- Requires env-var changes on every Claude Code launch surface
If the 15s cap is annoying, bolt a JSONL tail from B on top for the tach path only. Hybrid. See Hardware for the firmware contract that stays identical either way.
B. ccusage-sourced
One process. No collector, no Prometheus, no Grafana. The daemon
subprocess-calls ccusage for the fuel gauges and tails JSONL
directly for the tach.
ccusage integration
Two shapes, pick one.
B1 (simplest): periodic CLI subprocess.
npx ccusage@latest blocks --json # current 5h block
npx ccusage@latest daily --json # per-day aggregates
B2 (persistent): ccusage MCP HTTP server. Exposes four tools
(daily, monthly, session, blocks) at POST / over
StreamableHTTP MCP transport.
bunx @ccusage/mcp@latest --type http --port 8080
B1 is the default. Switch to B2 only if you also want the MCP surface available to other local agents.
Short-window tach via watchdog
ccusage aggregates are too coarse for a responsive tach. The
daemon keeps a 60-second ring buffer by tailing
~/.claude/projects/**/*.jsonl directly.
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from collections import deque
from pathlib import Path
import json, time

class JsonlTail(FileSystemEventHandler):
    def __init__(self, bus):
        self.bus = bus
        self.offsets = {}  # per-file read offset, so each line is parsed once

    def on_modified(self, event):
        p = Path(event.src_path)
        if p.suffix != ".jsonl":
            return
        off = self.offsets.get(p, 0)
        with p.open() as f:
            f.seek(off)
            for line in f:
                try:
                    d = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if d.get("type") == "assistant":
                    u = d["message"].get("usage", {})
                    # Sum every token class the message reports
                    tokens = sum(u.get(k, 0) for k in (
                        "input_tokens", "output_tokens",
                        "cache_read_input_tokens",
                        "cache_creation_input_tokens",
                    ))
                    thinking = extract_thinking_tokens(d)
                    model = d["message"].get("model", "")
                    self.bus.push(time.time(), tokens, thinking, model)
            # Remember where we stopped for the next modification event
            self.offsets[p] = f.tell()
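The handler above pushes samples into a bus, and the daemon reads rate_per_min(), thinking_ratio(), last_model() and silent_for_minutes() back from it. The following is a minimal sketch of such a 60-second ring buffer, under stated assumptions: the class name RateBus comes from the file listing, but its internals are not shown in this document, so everything here is illustrative. The thinking ratio is computed against total tokens because push() as shown carries no separate output count, and cache_hit_rate() is left out for the same reason (no per-type counts reach the bus). A lock is included on the assumption that watchdog delivers events on its own thread.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple

Sample = Tuple[float, int, int, str]  # (timestamp, tokens, thinking, model)


class RateBus:
    """Rolling window of token samples (hypothetical sketch)."""

    def __init__(self, window_s: float = 60.0,
                 clock: Callable[[], float] = time.time):
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._samples: Deque[Sample] = deque()
        self._last_seen: Optional[float] = None
        self._last_model = ""
        self._lock = threading.Lock()

    def _evict(self, now: float) -> None:
        # Drop samples that have aged out of the window.
        cutoff = now - self.window_s
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def push(self, ts: float, tokens: int, thinking: int, model: str) -> None:
        with self._lock:
            self._samples.append((ts, tokens, thinking, model))
            self._last_seen = ts
            if model:
                self._last_model = model
            self._evict(ts)

    def rate_per_min(self) -> float:
        with self._lock:
            self._evict(self.clock())
            total = sum(s[1] for s in self._samples)
        # Scale the window total to a per-minute figure.
        return total * 60.0 / self.window_s

    def thinking_ratio(self) -> float:
        with self._lock:
            self._evict(self.clock())
            total = sum(s[1] for s in self._samples)
            thinking = sum(s[2] for s in self._samples)
        return thinking / total if total else 0.0

    def last_model(self) -> str:
        with self._lock:
            return self._last_model

    def silent_for_minutes(self) -> float:
        with self._lock:
            if self._last_seen is None:
                return 0.0  # assumption: nothing seen yet is not a stall
            return (self.clock() - self._last_seen) / 60.0
```

Eviction happens on both write and read, so the rate decays to zero even when no new JSONL lines arrive.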
Daemon (B)
src/claude_gauge/
  daemon_ccusage.py   FastAPI, ccusage subprocess, /usage
  tail.py             watchdog + RateBus
  config.py
async def ccusage(cmd):
    # Run `npx ccusage@latest <cmd> --json` and parse its stdout.
    proc = await asyncio.create_subprocess_exec(
        "npx", "ccusage@latest", cmd, "--json",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"ccusage {cmd} exited with {proc.returncode}")
    return json.loads(out)

@app.get("/usage")
async def usage():
    rate = bus.rate_per_min()
    blocks = await ccusage("blocks")
    # The active block is the current 5h window; none means no usage yet.
    cur = next((b for b in blocks["blocks"] if b.get("isActive")), None)
    w5 = cur["totalTokens"] if cur else 0
    pct = w5 / CEIL_5H
    return {
        "rate_1m": rate,
        "window_5h_tokens": w5,
        "window_5h_pct": min(1.0, pct),
        "thinking_ratio": bus.thinking_ratio(),
        "cache_hit_rate": bus.cache_hit_rate(),
        "last_model": bus.last_model(),
        "hot": rate > RED,
        "warn": pct > 0.8,
        "stall": bus.silent_for_minutes() > STALL_MIN,
        # Was hard-coded to True; assumed here to mean "no tokens in the rate window"
        "idle": rate == 0,
    }
Cache ccusage output with a 10s TTL so 1 Hz firmware polling does
not spawn a subprocess every request.
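The 10s TTL could be sketched as a small async wrapper around whatever coroutine fetches the ccusage output. The class name TTLCache and its parameters are hypothetical; the only thing taken from the text is the 10-second lifetime. The lock ensures that concurrent requests arriving on an expired cache share one refresh instead of each spawning a subprocess.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class TTLCache:
    """Cache the result of an async loader for `ttl_s` seconds (sketch)."""

    def __init__(self, loader: Callable[[], Awaitable[Any]],
                 ttl_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so tests can control time
        self._value: Any = None
        self._expires_at = float("-inf")  # forces a load on first use
        self._lock: Optional[asyncio.Lock] = None

    async def get(self) -> Any:
        # Create the lock lazily so it belongs to the running event loop.
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._clock() >= self._expires_at:
                self._value = await self._loader()
                self._expires_at = self._clock() + self._ttl_s
            return self._value
```

In the daemon this might wrap the blocks call, for example `blocks_cache = TTLCache(lambda: ccusage("blocks"))`, with the /usage handler awaiting `blocks_cache.get()` instead of calling ccusage directly.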
Pros and cons
Pros:
- Single process, one dependency tree
- Sub-second tach out of the box
- No Docker, no collector, no env vars on the Claude Code side
- ccusage has solved the JSONL edge cases already
Cons:
- No free Grafana dashboard; deep stats require the ccusage TUI or a custom surface
- Node required on the runtime path
- JSONL format is an implementation detail; upstream changes can break parsing
Recommendation
Architecture A for homelab-as-enterprise framing. OTEL is the platform feature, Prometheus integrates with the rest of the homelab stack, and Grafana dashboard 25052 is free deep-stats. Scrape interval cap is the only real concession; hybrid with the JSONL tail from B if the tach feels sluggish after calibration.
Architecture B if Prometheus is not already running and you want a working needle sooner. Ship it, migrate to A later if OTEL becomes useful for other things.
Firmware and cluster are identical either way.