Healthcheck endpoint and structured JSON logs
Date: 2026-04-19 Issues: #26 (healthz), #27 (JSON logs), part of #25 (platform contract intake)
Background
The homelab platform contract for Quartermaster (#25) requires two things the codebase does not have today:
- A Docker `HEALTHCHECK` so `container_health_status` is visible to cAdvisor/Prometheus, which in turn drives the container-down alert planned at launch. That requires an in-app endpoint to target.
- Structured JSON logs on stdout with `level` and `event` fields so Promtail indexes them as Loki labels.
Both block the first deploy to home-ctr-onyx. This spec covers both
so the work can land as one coherent change.
/healthz
Endpoint
GET /healthz, unauthenticated.
- Success: `200 {"status": "ok"}`.
- Failure: `503 {"status": "error", "detail": "<exception class name>"}`. The class name goes in so operators can tell from the response body what tripped the check; no traceback or message is leaked.
The check opens a session via the standard SessionLocal factory,
runs SELECT 1, and closes. Any exception surfaces as a 503.
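The check described above can be sketched independently of the web framework. This is a minimal sketch, not the actual handler: `check_health`, `probe`, and `broken_factory` are hypothetical names, and `sqlite3` stands in for the real `SessionLocal` factory only so the example runs on its own. With a SQLAlchemy session the probe would presumably be a `text("SELECT 1")` construct rather than a bare string.

```python
import sqlite3
from typing import Any, Callable


def check_health(
    session_factory: Callable[[], Any],
    probe: Any = "SELECT 1",
) -> tuple[int, dict[str, str]]:
    """Return (status_code, body) for a /healthz-style check.

    `check_health` and `probe` are hypothetical names for illustration.
    """
    try:
        session = session_factory()
        try:
            session.execute(probe)
        finally:
            # Always release the session, even when the probe fails.
            session.close()
    except Exception as exc:
        # Expose only the exception class name: no message, no traceback.
        return 503, {"status": "error", "detail": type(exc).__name__}
    return 200, {"status": "ok"}


def broken_factory() -> Any:
    """Stand-in for a session factory whose database is unreachable."""
    raise ConnectionRefusedError("db down")


# sqlite3 is only a stand-in for the real SessionLocal factory.
ok_result = check_health(lambda: sqlite3.connect(":memory:"))
error_result = check_health(broken_factory)
```

A route handler could then translate the returned pair into the HTTP status and JSON body. Keeping the check in a plain function also makes the failure path testable without a running server.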
Placement
New module src/quartermaster/routes_health.py with its own
APIRouter, included from main.create_app() alongside the existing
routers. Keeping it on a dedicated router means any future middleware
(basic-auth, rate-limit bypass) applied to the main routers can leave
/healthz alone — the Docker healthcheck runs inside the container
and must not need credentials.
Tests
tests/test_health.py:
- Success: FastAPI `TestClient` hits `/healthz`, asserts 200 and `{"status": "ok"}`.
- Failure: monkey-patch the session factory to raise on `.execute()`, assert 503 and `{"status": "error", "detail": "<class-name>"}`.
Structured JSON logs
Dependency
Add python-json-logger to [project].dependencies in
pyproject.toml. One small, single-purpose dep; no transitive
surprises. structlog is explicitly out of scope (#27).
Config module
New src/quartermaster/logging_config.py exposing LOG_CONFIG, a
logging.config.dictConfig-compatible dict:
- One formatter using `pythonjsonlogger.jsonlogger.JsonFormatter` emitting `timestamp` (ISO-8601 UTC), `level`, `event`, `logger`, `message`. `extra={...}` kwargs passed to logger calls flatten into the JSON body.
- One handler writing to `sys.stdout`.
- Loggers: the root app logger and `uvicorn.access` both route through the JSON handler. `uvicorn.error` also gets the handler so startup / shutdown lines are captured in the same format.
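One possible shape for the dict, as a rough sketch rather than the final config. Several details are assumptions not stated above: the app logger is named `quartermaster`, the formatter and handler keys (`json`, `stdout`) are arbitrary, and the `rename_fields` and `timestamp` options are assumed to be the python-json-logger settings that produce the `level`, `logger`, and `timestamp` field names. These should be checked against the installed version of the library.

```python
# Sketch of a dictConfig-compatible LOG_CONFIG. Building the dict needs no
# third-party import; applying it requires python-json-logger to be installed.
LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "json": {
            # "()" tells dictConfig to call this factory with the remaining keys.
            "()": "pythonjsonlogger.jsonlogger.JsonFormatter",
            "fmt": "%(levelname)s %(name)s %(message)s",
            # Assumed mapping from LogRecord attribute names to the spec's names.
            "rename_fields": {"levelname": "level", "name": "logger"},
            # Assumed option that adds a UTC "timestamp" field.
            "timestamp": True,
        },
    },
    "handlers": {
        "stdout": {
            "class": "logging.StreamHandler",
            "formatter": "json",
            "stream": "ext://sys.stdout",
        },
    },
    "loggers": {
        # Assumed name for the app's top-level logger.
        "quartermaster": {
            "handlers": ["stdout"],
            "level": "INFO",
            "propagate": False,
        },
        "uvicorn.access": {
            "handlers": ["stdout"],
            "level": "INFO",
            "propagate": False,
        },
        "uvicorn.error": {
            "handlers": ["stdout"],
            "level": "INFO",
            "propagate": False,
        },
    },
}
```

Applying it in-process would be a call to `logging.config.dictConfig(LOG_CONFIG)`. Setting `propagate` to `False` on each logger is a design choice in this sketch to avoid duplicate lines if a root handler is ever added.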
A Python dict (rather than YAML) is the source of truth because
tests can import it and apply dictConfig in-process. The uvicorn
CLI consumes it via a small logconfig.yaml shim at repo root that
references the dict module.
Access log filter
Uvicorn's access logger emits a record whose message is the raw
access line; the fields we care about live on the record's positional
args. A small logging.Filter subclass in logging_config.py unpacks
those args and sets:
- `event = "http_request"`
- `method`, `path`, `status`, `client_ip`
- `duration_ms` (uvicorn doesn't expose this natively; computed via the `extra` injected by a small middleware if straightforward, otherwise deferred. The filter already gives Loki status + path, which is the main thing.)
If the duration cannot be obtained cheaply from uvicorn's access
record, landing the rest is still a win; the duration_ms field can
come in a follow-up without changing the log schema (it's an extra
field, not a label).
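A minimal sketch of such a filter, leaving `duration_ms` out. `AccessLogFilter` is a hypothetical name, and the five-element args layout `(client_addr, method, full_path, http_version, status_code)` is an assumption about uvicorn's access record that should be verified against the installed uvicorn version. Dropping the query string from `path` is also a choice made for this sketch, not something the spec requires.

```python
import logging


class AccessLogFilter(logging.Filter):
    """Copy uvicorn access-log args onto the record as structured fields."""

    def filter(self, record: logging.LogRecord) -> bool:
        args = record.args
        # Assumed layout:
        # (client_addr, method, full_path, http_version, status_code)
        if isinstance(args, tuple) and len(args) == 5:
            client_addr, method, full_path, _http_version, status_code = args
            record.event = "http_request"
            record.method = str(method)
            # Sketch-level choice: keep only the path, without the query string.
            record.path = str(full_path).split("?", 1)[0]
            record.status = int(status_code)
            # client_addr is assumed to be "host:port"; keep only the host.
            record.client_ip = str(client_addr).rsplit(":", 1)[0]
        # Never drop the record; this filter only enriches it.
        return True


# Synthetic record shaped like a uvicorn access line.
record = logging.LogRecord(
    name="uvicorn.access",
    level=logging.INFO,
    pathname="",
    lineno=0,
    msg='%s - "%s %s HTTP/%s" %d',
    args=("127.0.0.1:51234", "GET", "/healthz?verbose=1", "1.1", 200),
    exc_info=None,
)
passed = AccessLogFilter().filter(record)
```

Records whose args do not match the assumed shape pass through unchanged, so a uvicorn upgrade that alters the layout would lose the extra fields but not the log line itself.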
Seed application events
Five events added as single-line logger.info(..., extra={"event": "..."}) calls at the matching code paths (names aligned with the
existing function names):
| Event | Site |
|---|---|
| `month_created` | `month_service.create_month` |
| `month_closed` | `month_service.close_month` |
| `template_entry_updated` | `service.update_entry` |
| `posting_added` | `month_service.add_posting` |
| `posting_deleted` | `month_service.delete_posting` |
One module-scoped logger at the top of each file that touches these paths. No broader instrumentation in this change.
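A seed call of this kind might look like the sketch below. The body and signature of `create_month` are hypothetical placeholders, as is the `CaptureHandler` helper, which exists only to show that the `event` key from `extra` ends up as an attribute on the log record (which is what lets a JSON formatter emit it as a field).

```python
import logging

# One module-scoped logger at the top of the file.
logger = logging.getLogger(__name__)


def create_month(year: int, month: int) -> dict[str, int]:
    """Hypothetical stand-in for month_service.create_month."""
    created = {"year": year, "month": month}  # placeholder for the real work
    # Single-line seed event; "event" becomes an attribute on the record.
    logger.info("month created", extra={"event": "month_created"})
    return created


class CaptureHandler(logging.Handler):
    """Illustration-only handler that keeps emitted records in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


capture = CaptureHandler()
logger.addHandler(capture)
logger.setLevel(logging.INFO)
create_month(2026, 4)
```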
Tests
tests/test_logging.py:
- Apply `LOG_CONFIG` via `logging.config.dictConfig`, emit a record with `extra={"event": "smoke"}`, capture stdout via `capsys`, `json.loads` the captured line, assert `level` / `event` / `logger` / `message` / `timestamp` all present and correct.
- Feed a synthetic uvicorn access record through the filter, assert resulting fields include `event="http_request"`, `method`, `path`, `status`.
No end-to-end uvicorn-subprocess test. Formatter and filter correctness at the handler level is enough for the launch contract.
Dev flow
Run with `uv run uvicorn quartermaster.main:app --log-config logconfig.yaml --reload`; `--reload` keeps working. README gets a short "Logs"
section with two LogQL examples mirroring the Archon contract style.
File additions / changes
New:
- `src/quartermaster/routes_health.py`
- `src/quartermaster/logging_config.py`
- `logconfig.yaml` (YAML shim for uvicorn CLI)
- `tests/test_health.py`
- `tests/test_logging.py`
Changed:
- `pyproject.toml`: add `python-json-logger`
- `src/quartermaster/main.py`: include the health router
- `src/quartermaster/service.py`: add one `logger.info` seed call in `update_entry`
- `src/quartermaster/month_service.py`: add four `logger.info` seed calls in `create_month`, `close_month`, `add_posting`, `delete_posting`
- `README.md`: add the "Logs" section and mention `--log-config` in the Run block
Not touched:
- Dockerfile / Compose: owned by later issues under #25.
- Alembic / DB layer: the healthcheck uses the existing session factory; no migration.
Order of work
Logging before healthz. Once LOG_CONFIG exists the healthz handler
can emit event="healthz_check" for free; the reverse order doesn't
give logging anything useful. This ordering is a convenience, not a hard dependency.
Out of scope
- `/readyz` vs. `/livez` split: one endpoint covers this single-container app.
- `/metrics` or any Prometheus exposition (5.2 in #25 is "not needed").
- Adding `structlog` (#27 explicitly excludes).
- Log-shipping configuration: Promtail on the host handles it.
- Broad app instrumentation beyond the five seed events.