archeious/claude-gauge

Table of Contents

Data Sources

A. OTEL-native

Docker Compose
OTEL collector config
Prometheus scrape config
Claude Code environment
Claude Code metrics (surfaced in Prometheus)
PromQL for each gauge
Daemon (A)
Deep-stats dashboard
Pros and cons

B. ccusage-sourced

ccusage integration
Short-window tach via watchdog
Daemon (B)
Pros and cons

Recommendation

Data Sources

Two architectures, implemented side by side. Pick one; both produce the same /usage payload for the firmware.
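
Since both daemons must emit the same payload, it can help to pin the contract down in one place. A minimal sketch, assuming the field names used by the Daemon (B) handler further down; the name `UsagePayload`, the helper `missing_fields`, and the type annotations are illustrative assumptions, not part of the repo.

```python
from typing import TypedDict


class UsagePayload(TypedDict):
    """Assumed shape of the /usage response (field names from the Daemon (B) handler)."""
    rate_1m: float           # tokens per minute, drives the tach
    window_5h_tokens: int    # tokens used in the current 5h window
    window_5h_pct: float     # window_5h_tokens / ceiling, clamped to 1.0
    thinking_ratio: float
    cache_hit_rate: float
    last_model: str
    hot: bool
    warn: bool
    stall: bool
    idle: bool


def missing_fields(payload: dict) -> set[str]:
    """Return the keys the contract lists that the payload does not carry."""
    return set(UsagePayload.__annotations__) - set(payload)
```

Running `missing_fields` against the output of either daemon in a test is one cheap way to catch drift between A and B before the firmware does.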

A. OTEL-native

Uses Claude Code's built-in OpenTelemetry support. Mirrors the reference stack published in anthropics/claude-code-monitoring-guide.

Docker Compose

Three containers: OTEL collector, Prometheus, Grafana.

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus scrape

  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=8d'
      - '--web.enable-lifecycle'

  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana   # mount the volume declared below so dashboards persist

volumes:
  prometheus_data:
  grafana_data:

OTEL collector config

receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }

processors:
  batch: { timeout: 1s, send_batch_size: 1024 }
  memory_limiter: { check_interval: 1s, limit_mib: 512 }

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    send_timestamps: true
    metric_expiration: 192h   # 8 days; needed for 7d queries
    enable_open_metrics: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]

Prometheus scrape config

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']

Claude Code environment

Set these in the environment Claude Code launches in. Either a shell profile or the env block in ~/.claude/settings.json works.

export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_METRIC_EXPORT_INTERVAL=10000   # 10s for responsiveness
export OTEL_METRICS_INCLUDE_SESSION_ID=false

Claude Code metrics (surfaced in Prometheus)

After OTEL-to-Prom conversion:

| Prometheus metric | Labels |
| --- | --- |
| `claude_code_token_usage_tokens_total` | `type` (input, output, cacheRead, cacheCreation), `model` |
| `claude_code_cost_usage_USD_total` | `model` |
| `claude_code_session_count_total` | |
| `claude_code_active_time_total_seconds_total` | `type` (user, cli) |
| `claude_code_lines_of_code_count_total` | `type` (added, removed) |
| `claude_code_commit_count_total` | |
| `claude_code_pull_request_count_total` | |
| `claude_code_code_edit_tool_decision_count_total` | `tool_name`, `decision`, `language` |

Full event schema is in the Claude Code monitoring docs.

PromQL for each gauge

# Tokens/min (tach)
sum(rate(claude_code_token_usage_tokens_total[1m])) * 60

# 5h fuel
sum(increase(claude_code_token_usage_tokens_total[5h]))

# Thinking/output ratio (temp gauge)
# Note: OTEL does not emit thinking tokens as a dedicated metric;
# derive via event stream (claude_code.api_request + prompt.id
# correlation) or fall back to a constant 0 until instrumented.

# Cache hit rate (boost gauge)
  sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m]))
/ sum(rate(claude_code_token_usage_tokens_total{type=~"input|cacheRead|cacheCreation"}[5m]))

# Last model (approximation)
topk(1, claude_code_token_usage_tokens_total{type="output"})

Daemon (A)

src/claude_gauge/
  daemon_prom.py     FastAPI, PromQL queries, /usage
  config.py          ceilings, URLs, thresholds
  windows.py         query builders and result parsing

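import httpx

PROM = "http://localhost:9090"            # Prometheus base URL (assumed; belongs in config.py)
client = httpx.AsyncClient(timeout=5.0)   # assumes httpx as the async HTTP client

async def prom(q: str) -> float:
    """Run an instant PromQL query; return the first sample, or 0.0 if the result is empty."""
    r = await client.get(f"{PROM}/api/v1/query", params={"query": q})
    r.raise_for_status()
    data = r.json()["data"]["result"]
    return float(data[0]["value"][1]) if data else 0.0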

@app.get("/usage")
async def usage():
    rate_1m = await prom(Q_TACH)
    win_5h  = await prom(Q_5H)
    cache   = await prom(Q_CACHE)
    ...
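
The handler references Q_TACH, Q_5H and Q_CACHE without showing them. A minimal sketch of how they could be defined, assuming they simply wrap the PromQL from the previous section. The `finite` helper is a hypothetical addition: the cache ratio divides two rates, which evaluates to NaN when the denominator is zero, and NaN is not valid JSON.

```python
import math

METRIC = "claude_code_token_usage_tokens_total"

# Query strings copied from the "PromQL for each gauge" section.
Q_TACH = f"sum(rate({METRIC}[1m])) * 60"
Q_5H = f"sum(increase({METRIC}[5h]))"
Q_CACHE = (
    f'sum(rate({METRIC}{{type="cacheRead"}}[5m]))'
    f' / sum(rate({METRIC}{{type=~"input|cacheRead|cacheCreation"}}[5m]))'
)


def finite(x: float, default: float = 0.0) -> float:
    """Replace NaN or infinity with a default so the payload stays JSON-safe."""
    return x if math.isfinite(x) else default
```

In the handler this would read `cache = finite(await prom(Q_CACHE))`.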

Deep-stats dashboard

Import Grafana Labs dashboard 25052 ("Claude Code") against the Prometheus data source. That is the deep-stats surface. No custom web UI needed.

Pros and cons

Pros:

Uses the platform feature Anthropic ships
Grafana dashboard is free
Metric schema is documented and stable
Generalises cleanly if other hosts also run Claude Code

Cons:

Scrape interval caps tach responsiveness at ~15s
Three containers to run
Requires env-var changes on every Claude Code launch surface

If the 15s cap is annoying, bolt a JSONL tail from B on top for the tach path only. Hybrid. See Hardware for the firmware contract that stays identical either way.

B. ccusage-sourced

One process. No collector, no Prometheus, no Grafana. The daemon subprocess-calls ccusage for the fuel gauges and tails JSONL directly for the tach.

ccusage integration

Two shapes, pick one.

B1 (simplest): periodic CLI subprocess.

npx ccusage@latest blocks --json    # current 5h block
npx ccusage@latest daily  --json    # per-day aggregates

B2 (persistent): ccusage MCP HTTP server. Exposes four tools (daily, monthly, session, blocks) at POST / over StreamableHTTP MCP transport.

bunx @ccusage/mcp@latest --type http --port 8080

B1 is the default. Switch to B2 only if you also want the MCP surface available to other local agents.

Short-window tach via watchdog

ccusage aggregates are too coarse for a responsive tach. The daemon keeps a 60-second ring buffer by tailing ~/.claude/projects/**/*.jsonl directly.

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from collections import deque
from pathlib import Path
import json, time

class JsonlTail(FileSystemEventHandler):
    def __init__(self, bus):
        self.bus = bus
        self.offsets = {}

    def on_modified(self, event):
        p = Path(event.src_path)
        if p.suffix != ".jsonl":
            return
        off = self.offsets.get(p, 0)
        with p.open("rb") as f:          # binary mode keeps byte offsets exact
            f.seek(off)
            for raw in f:
                if not raw.endswith(b"\n"):
                    break                # half-written line; re-read it on the next event
                off += len(raw)
                try:
                    d = json.loads(raw)
                except json.JSONDecodeError:
                    continue
                if d.get("type") == "assistant":
                    u = d["message"].get("usage", {})
                    tokens = sum(u.get(k, 0) for k in (
                        "input_tokens", "output_tokens",
                        "cache_read_input_tokens",
                        "cache_creation_input_tokens",
                    ))
                    thinking = extract_thinking_tokens(d)
                    model = d["message"].get("model", "")
                    self.bus.push(time.time(), tokens, thinking, model)
        self.offsets[p] = off            # only advance past complete lines
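
The handler pushes into a bus that is not shown here. A minimal sketch of what the 60-second ring buffer could look like, assuming the `push(ts, tokens, thinking, model)` signature used above and the method names the Daemon (B) handler calls. The class body, the injectable `clock`, and the choice to compute the thinking ratio against total tokens are all assumptions. `cache_hit_rate()` is left out because `push` does not receive per-type token counts.

```python
import threading
import time
from collections import deque


class RateBus:
    """Hypothetical rolling window of (timestamp, tokens, thinking) samples."""

    def __init__(self, window_s: float = 60.0, clock=time.time):
        self.window_s = window_s
        self.clock = clock                # injectable so tests can control time
        self.samples = deque()
        self.last_ts = None
        self.model = ""
        self._lock = threading.Lock()     # watchdog pushes from its own thread

    def push(self, ts, tokens, thinking, model):
        with self._lock:
            self.samples.append((ts, tokens, thinking))
            self.last_ts = ts
            if model:
                self.model = model

    def _trim(self):
        # Caller must hold the lock. Drops samples older than the window.
        cutoff = self.clock() - self.window_s
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def rate_per_min(self) -> float:
        with self._lock:
            self._trim()
            total = sum(tokens for _, tokens, _ in self.samples)
        return total * 60.0 / self.window_s

    def thinking_ratio(self) -> float:
        with self._lock:
            self._trim()
            total = sum(tokens for _, tokens, _ in self.samples)
            thinking = sum(th for _, _, th in self.samples)
        return thinking / total if total else 0.0

    def last_model(self) -> str:
        with self._lock:
            return self.model

    def silent_for_minutes(self) -> float:
        with self._lock:
            if self.last_ts is None:
                return float("inf")       # nothing seen since startup
            return (self.clock() - self.last_ts) / 60.0
```

The lock matters because the watchdog observer thread writes while the FastAPI event loop reads; iterating a deque while another thread mutates it raises a RuntimeError.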

Daemon (B)

src/claude_gauge/
  daemon_ccusage.py   FastAPI, ccusage subprocess, /usage
  tail.py             watchdog + RateBus
  config.py

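import asyncio
import json

async def ccusage(cmd: str) -> dict:
    """Run `npx ccusage@latest <cmd> --json` and return the parsed output."""
    proc = await asyncio.create_subprocess_exec(
        "npx", "ccusage@latest", cmd, "--json",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"ccusage {cmd} exited with {proc.returncode}")
    return json.loads(out)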

@app.get("/usage")
async def usage():
    rate = bus.rate_per_min()
    blocks = await ccusage("blocks")
    cur = next((b for b in blocks["blocks"] if b.get("isActive")), None)
    w5 = cur["totalTokens"] if cur else 0
    pct_5h = w5 / CEIL_5H
    return {
        "rate_1m": rate,
        "window_5h_tokens": w5,
        "window_5h_pct": min(1.0, pct_5h),
        "thinking_ratio": bus.thinking_ratio(),
        "cache_hit_rate": bus.cache_hit_rate(),
        "last_model": bus.last_model(),
        "hot": rate > RED,
        "warn": pct_5h > 0.8,
        "stall": bus.silent_for_minutes() > STALL_MIN,
        "idle": rate == 0,   # assumption: idle means no tokens in the last minute
    }

Cache ccusage output with a 10s TTL so 1 Hz firmware polling does not spawn a subprocess every request.
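
One way to get that 10s TTL is a small async wrapper around the fetch coroutine. A minimal sketch; the `TTLCache` name, its constructor arguments, and the injectable `clock` are illustrative assumptions rather than code from the repo.

```python
import asyncio
import time


class TTLCache:
    """Cache the result of an async fetch for ttl_s seconds."""

    def __init__(self, fetch, ttl_s: float = 10.0, clock=time.monotonic):
        self.fetch = fetch                  # zero-argument callable returning an awaitable
        self.ttl_s = ttl_s
        self.clock = clock                  # injectable so tests can control time
        self._value = None
        self._expires = float("-inf")       # forces a fetch on first use
        self._lock = asyncio.Lock()

    async def get(self):
        async with self._lock:              # concurrent callers share one refresh
            if self.clock() >= self._expires:
                self._value = await self.fetch()
                self._expires = self.clock() + self.ttl_s
            return self._value
```

In the daemon this could be wired as `blocks_cache = TTLCache(lambda: ccusage("blocks"))`, with the handler calling `await blocks_cache.get()` instead of `ccusage("blocks")` directly. Creating the cache inside the running event loop (for example in a startup hook) avoids loop-binding surprises on older Python versions.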

Pros and cons

Pros:

Single process, one dependency tree
Sub-second tach out of the box
No Docker, no collector, no env vars on the Claude Code side
ccusage has solved the JSONL edge cases already

Cons:

No free Grafana dashboard; deep stats require the ccusage TUI or a custom surface
Node required on the runtime path
JSONL format is an implementation detail; upstream changes can break parsing

Recommendation

Architecture A for homelab-as-enterprise framing. OTEL is the platform feature, Prometheus integrates with the rest of the homelab stack, and Grafana dashboard 25052 is free deep-stats. Scrape interval cap is the only real concession; hybrid with the JSONL tail from B if the tach feels sluggish after calibration.

Architecture B if Prometheus is not already running and you want a working needle sooner. Ship it, migrate to A later if OTEL becomes useful for other things.

Firmware and cluster are identical either way.