From 88e68ea2f99e0752163bf45fc1a7f2efe399a7ad Mon Sep 17 00:00:00 2001
From: Jeff Smith
Date: Sun, 19 Apr 2026 11:15:40 -0600
Subject: [PATCH] docs: design spec for /healthz and structured JSON logs (#26, #27)

Part of the platform-contract intake (#25). Covers both pieces of work
that must land before first deploy to home-ctr-onyx.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 ...2026-04-19-healthz-and-json-logs-design.md | 174 ++++++++++++++++++
 1 file changed, 174 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-04-19-healthz-and-json-logs-design.md

diff --git a/docs/superpowers/specs/2026-04-19-healthz-and-json-logs-design.md b/docs/superpowers/specs/2026-04-19-healthz-and-json-logs-design.md
new file mode 100644
index 0000000..b00fbe2
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-19-healthz-and-json-logs-design.md
@@ -0,0 +1,174 @@
+# Healthcheck endpoint and structured JSON logs
+
+**Date:** 2026-04-19
+**Issues:** #26 (healthz), #27 (JSON logs), part of #25 (platform contract intake)
+
+## Background
+
+The homelab platform contract for Quartermaster (#25) requires two
+things the codebase does not have today:
+
+1. A Docker `HEALTHCHECK` so `container_health_status` is visible to
+   cAdvisor/Prometheus, which in turn drives the container-down alert
+   planned at launch. That requires an in-app endpoint to target.
+2. Structured JSON logs on stdout with `level` and `event` fields so
+   Promtail indexes them as Loki labels.
+
+Both block the first deploy to `home-ctr-onyx`. This spec covers both
+so the work can land as one coherent change.
+
+## /healthz
+
+### Endpoint
+
+`GET /healthz`, unauthenticated.
+
+- Success: `200 {"status": "ok"}`.
+- Failure: `503 {"status": "error", "detail": "<exception class name>"}`.
+  The class name goes in so operators can tell from the response body
+  what tripped the check; no traceback or message is leaked.
+
+The check opens a session via the standard `SessionLocal` factory,
+runs `SELECT 1`, and closes. Any exception surfaces as a 503.
+
+### Placement
+
+New module `src/quartermaster/routes_health.py` with its own
+`APIRouter`, included from `main.create_app()` alongside the existing
+routers. Keeping it on a dedicated router means any future middleware
+(basic-auth, rate-limit bypass) applied to the main routers can leave
+`/healthz` alone: the Docker healthcheck runs inside the container
+and must not need credentials.
+
+### Tests
+
+`tests/test_health.py`:
+
+- Success: FastAPI `TestClient` hits `/healthz`, asserts 200 and
+  `{"status": "ok"}`.
+- Failure: monkey-patch the session factory to raise on `.execute()`,
+  assert 503 and `{"status": "error", "detail": "<exception class name>"}`.
+
+## Structured JSON logs
+
+### Dependency
+
+Add `python-json-logger` to `[project].dependencies` in
+`pyproject.toml`. One small, single-purpose dep; no transitive
+surprises. `structlog` is explicitly out of scope (#27).
+
+### Config module
+
+New `src/quartermaster/logging_config.py` exposing `LOG_CONFIG`, a
+`logging.config.dictConfig`-compatible dict:
+
+- One formatter using `pythonjsonlogger.jsonlogger.JsonFormatter`
+  emitting `timestamp` (ISO-8601 UTC), `level`, `event`, `logger`,
+  `message`. `extra={...}` kwargs passed to logger calls flatten into
+  the JSON body.
+- One handler writing to `sys.stdout`.
+- Loggers: the root app logger and `uvicorn.access` both route
+  through the JSON handler. `uvicorn.error` also gets the handler so
+  startup / shutdown lines are captured in the same format.
+
+A Python dict (rather than YAML) is the source of truth because
+tests can import it and apply `dictConfig` in-process. The uvicorn
+CLI consumes it via a small `logconfig.yaml` shim at repo root that
+references the dict module.
+
+### Access log filter
+
+Uvicorn's access logger emits a record whose message is the raw
+access line; the fields we care about live on the record's positional
+args. A small `logging.Filter` subclass in `logging_config.py` unpacks
+those args and sets:
+
+- `event = "http_request"`
+- `method`, `path`, `status`, `client_ip`
+- `duration_ms` (uvicorn doesn't expose this natively; computed via
+  the `extra` injected by a small middleware if straightforward,
+  otherwise deferred; the filter already gives Loki status + path,
+  which is the main thing)
+
+If the duration cannot be obtained cheaply from uvicorn's access
+record, landing the rest is still a win; the `duration_ms` field can
+come in a follow-up without changing the log schema (it's an extra
+field, not a label).
+
+### Seed application events
+
+Five events added as single-line `logger.info(..., extra={"event":
+"..."})` calls at the matching code paths (names aligned with the
+existing function names):
+
+| Event | Site |
+|---|---|
+| `month_created` | `month_service.create_month` |
+| `month_closed` | `month_service.close_month` |
+| `template_entry_updated` | `service.update_entry` |
+| `posting_added` | `month_service.add_posting` |
+| `posting_deleted` | `month_service.delete_posting` |
+
+One module-scoped logger at the top of each file that touches these
+paths. No broader instrumentation in this change.
+
+### Tests
+
+`tests/test_logging.py`:
+
+- Apply `LOG_CONFIG` via `logging.config.dictConfig`, emit a record
+  with `extra={"event": "smoke"}`, capture stdout via `capsys`,
+  `json.loads` the captured line, assert `level` / `event` /
+  `logger` / `message` / `timestamp` all present and correct.
+- Feed a synthetic uvicorn access record through the filter, assert
+  resulting fields include `event="http_request"`, `method`, `path`,
+  `status`.
+
+No end-to-end uvicorn-subprocess test. Formatter and filter
+correctness at the handler level is enough for the launch contract.
+
+### Dev flow
+
+`uv run uvicorn quartermaster.main:app --log-config logconfig.yaml
+--reload` (`--reload` keeps working). README gets a short "Logs"
+section with two LogQL examples mirroring the Archon contract style.
+
+## File additions / changes
+
+New:
+- `src/quartermaster/routes_health.py`
+- `src/quartermaster/logging_config.py`
+- `logconfig.yaml` (YAML shim for uvicorn CLI)
+- `tests/test_health.py`
+- `tests/test_logging.py`
+
+Changed:
+- `pyproject.toml`: add `python-json-logger`
+- `src/quartermaster/main.py`: include the health router
+- `src/quartermaster/service.py`: add one `logger.info` seed call
+  in `update_entry`
+- `src/quartermaster/month_service.py`: add four `logger.info` seed
+  calls in `create_month`, `close_month`, `add_posting`,
+  `delete_posting`
+- `README.md`: add the "Logs" section and mention `--log-config` in
+  the Run block
+
+Not touched:
+- Dockerfile / Compose: owned by later issues under #25.
+- Alembic / DB layer: the healthcheck uses the existing session
+  factory; no migration.
+
+## Order of work
+
+Logging before healthz. Once `LOG_CONFIG` exists the healthz handler
+can emit `event="healthz_check"` for free; the reverse order doesn't
+give logging anything useful. Not load-bearing.
+
+## Out of scope
+
+- `/readyz` vs. `/livez` split: one endpoint covers this
+  single-container app.
+- `/metrics` or any Prometheus exposition (5.2 in #25 is "not needed").
+- Adding `structlog` (#27 explicitly excludes).
+- Log-shipping configuration: Promtail on the host handles it.
+- Broad app instrumentation beyond the five seed events.