Part of the platform-contract intake (#25). Covers both pieces of work that must land before the first deploy to home-ctr-onyx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Healthcheck endpoint and structured JSON logs

**Date:** 2026-04-19

**Issues:** #26 (healthz), #27 (JSON logs), part of #25 (platform contract intake)

## Background

The homelab platform contract for Quartermaster (#25) requires two
things the codebase does not have today:

1. A Docker `HEALTHCHECK`, so that `container_health_status` is
   visible to cAdvisor/Prometheus, which in turn drives the
   container-down alert planned for launch. The healthcheck needs an
   in-app endpoint to target.
2. Structured JSON logs on stdout with `level` and `event` fields, so
   that Promtail can index them as Loki labels.

Both block the first deploy to `home-ctr-onyx`. This spec covers both
so the work can land as one coherent change.

## /healthz

### Endpoint

`GET /healthz`, unauthenticated.

- Success: `200 {"status": "ok"}`.
- Failure: `503 {"status": "error", "detail": "<exception class name>"}`.
  The class name is included so operators can tell from the response
  body what tripped the check. No traceback or exception message is
  leaked.

The check opens a session via the standard `SessionLocal` factory,
runs `SELECT 1`, and closes the session. Any exception surfaces as a
503.

### Placement

New module `src/quartermaster/routes_health.py` with its own
`APIRouter`, included from `main.create_app()` alongside the existing
routers. Keeping it on a dedicated router means any future middleware
(basic-auth, rate-limit bypass) applied to the main routers can leave
`/healthz` alone: the Docker healthcheck runs inside the container
and must not need credentials.

### Tests

`tests/test_health.py`:

- Success: FastAPI `TestClient` hits `/healthz`; assert 200 and
  `{"status": "ok"}`.
- Failure: monkey-patch the session factory to raise on `.execute()`;
  assert 503 and `{"status": "error", "detail": "<class-name>"}`.

## Structured JSON logs

### Dependency

Add `python-json-logger` to `[project].dependencies` in
`pyproject.toml`. It is one small, single-purpose dependency with no
transitive surprises. `structlog` is explicitly out of scope (#27).

### Config module

New `src/quartermaster/logging_config.py` exposing `LOG_CONFIG`, a
`logging.config.dictConfig`-compatible dict:

- One formatter using `pythonjsonlogger.jsonlogger.JsonFormatter`,
  emitting `timestamp` (ISO-8601 UTC), `level`, `event`, `logger`, and
  `message`. `extra={...}` kwargs passed to logger calls flatten into
  the JSON body.
- One handler writing to `sys.stdout`.
- Loggers: the root app logger and `uvicorn.access` both route
  through the JSON handler. `uvicorn.error` also gets the handler, so
  startup and shutdown lines are captured in the same format.

A Python dict (rather than YAML) is the source of truth because tests
can import it and apply `dictConfig` in-process. The uvicorn CLI's
`--log-config` takes a file path (`.json`, `.yaml`/`.yml`, or `.ini`),
not an importable object. The CLI therefore consumes the config via a
small `logconfig.yaml` shim at repo root. The shim mirrors the dict
and points its formatter and filter entries at the classes in
`logging_config.py`. Loading a YAML config requires PyYAML, which
`uvicorn[standard]` pulls in.

### Access log filter

Uvicorn's access logger emits a record whose message is the formatted
access line (template `'%s - "%s %s HTTP/%s" %d'`). The fields we care
about live on the record's positional args. A small `logging.Filter`
subclass in `logging_config.py` unpacks those args and sets:

- `event = "http_request"`
- `method`, `path`, `status`, `client_ip`
- `duration_ms`: uvicorn doesn't expose this natively. It could be
  measured by a small middleware and handed to the filter if that turns
  out to be straightforward; otherwise it is deferred. The filter
  already gives Loki status and path, which is the main thing.

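Below is a sketch of the filter. It assumes uvicorn's current access call, which passes args `(client_addr, method, path_with_query, http_version, status)`. That tuple is a uvicorn implementation detail, so the filter leaves any record it does not recognise untouched. The class name `AccessLogFilter` is hypothetical. Dropping the query string from `path` is a choice made in this sketch; the spec does not settle it.

```python
import logging


class AccessLogFilter(logging.Filter):
    """Copy uvicorn access-log args onto the record as structured fields."""

    def filter(self, record: logging.LogRecord) -> bool:
        args = record.args
        # Only touch records shaped like uvicorn's access line; pass others through.
        if not isinstance(args, tuple) or len(args) != 5:
            return True
        client_addr, method, full_path, _http_version, status = args
        record.event = "http_request"
        record.method = method
        # Drop the query string (a choice made in this sketch) to keep `path` low-cardinality.
        record.path = str(full_path).partition("?")[0]
        record.status = status
        # client_addr is "host:port" ("" when there is no client); keep the host part.
        record.client_ip = str(client_addr).rpartition(":")[0] or None
        return True  # never drop access lines, only enrich them
```

In `LOG_CONFIG` the class would be registered under `filters` (e.g. `{"access": {"()": "quartermaster.logging_config.AccessLogFilter"}}`) and listed in the `uvicorn.access` logger's `filters`. The JSON formatter then picks up the new record attributes as extra fields.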
If the duration cannot be obtained cheaply from uvicorn's access
record, landing the rest is still a win. `duration_ms` can come in a
follow-up without changing the log schema (it is an extra field, not
a label).

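If the middleware route is taken, one possible shape is a pure ASGI middleware. It stamps the elapsed time into a `contextvars.ContextVar` just before the response-start message is forwarded to the server. This relies on an assumption worth verifying against the pinned uvicorn version: that uvicorn writes the access line from inside its `send()` for `http.response.start`, in the request's own task. If that holds, the access filter can read `request_duration_ms.get()` and copy it to `record.duration_ms`.

The value measures time to response headers, not full body transfer. `TimingMiddleware` and `request_duration_ms` are hypothetical names.

```python
import contextvars
import time

# Hypothetical context variable the access-log filter would read. Assumes
# uvicorn runs each request in its own asyncio task, so values stay per-request.
request_duration_ms = contextvars.ContextVar("request_duration_ms", default=None)


class TimingMiddleware:
    """Pure ASGI middleware that records time until the response starts."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()

        async def timed_send(message):
            if message["type"] == "http.response.start":
                # Set before handing the message to the server, whose send()
                # is assumed to emit the access log line.
                elapsed = (time.perf_counter() - start) * 1000
                request_duration_ms.set(round(elapsed, 2))
            await send(message)

        await self.app(scope, receive, timed_send)
```

With FastAPI it would be installed as `app.add_middleware(TimingMiddleware)`. A pure ASGI class is used rather than `BaseHTTPMiddleware` so that no extra task sits between the middleware and the server's `send()`.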
### Seed application events

Five events are added as single-line
`logger.info(..., extra={"event": "..."})` calls at the matching code
paths (event names aligned with the existing function names):

| Event | Site |
|---|---|
| `month_created` | `month_service.create_month` |
| `month_closed` | `month_service.close_month` |
| `template_entry_updated` | `service.update_entry` |
| `posting_added` | `month_service.add_posting` |
| `posting_deleted` | `month_service.delete_posting` |

Each file that touches these paths gets one module-scoped logger at
the top. No broader instrumentation in this change.

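Below is a sketch of what one seed call might look like in `month_service.py`. Only the event name comes from the table. The function signature, return value, and the `month` field are hypothetical placeholders.

`extra` keys become attributes on the `LogRecord`. They must not collide with built-in attribute names such as `message` or `name`, or `logging` raises `KeyError`.

```python
import logging

# Module-scoped logger; in the app, __name__ gives e.g. "quartermaster.month_service".
logger = logging.getLogger(__name__)


def create_month(month: str) -> dict:
    """Hypothetical stand-in for month_service.create_month."""
    created = {"month": month, "status": "open"}
    # One line per seed event: `event` becomes the Loki label, and other
    # extra keys flatten into the JSON body as plain fields.
    logger.info("month created", extra={"event": "month_created", "month": month})
    return created
```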
### Tests

`tests/test_logging.py`:

- Apply `LOG_CONFIG` via `logging.config.dictConfig` inside the test,
  after `capsys` has replaced `sys.stdout`, because the handler binds
  its stream when it is configured. Emit a record with
  `extra={"event": "smoke"}` and `json.loads` the captured line.
  Assert that `level`, `event`, `logger`, `message`, and `timestamp`
  are all present and correct.
- Feed a synthetic uvicorn access record through the filter and assert
  that the resulting fields include `event="http_request"`, `method`,
  `path`, and `status`.

No end-to-end uvicorn-subprocess test: formatter and filter
correctness at the handler level is enough for the launch contract.

### Dev flow

Local runs use
`uv run uvicorn quartermaster.main:app --log-config logconfig.yaml --reload`,
and `--reload` keeps working. The README gets a short "Logs" section
with two LogQL examples mirroring the Archon contract style.

## File additions / changes

New:

- `src/quartermaster/routes_health.py`
- `src/quartermaster/logging_config.py`
- `logconfig.yaml` (YAML shim for the uvicorn CLI)
- `tests/test_health.py`
- `tests/test_logging.py`

Changed:

- `pyproject.toml`: add `python-json-logger`
- `src/quartermaster/main.py`: include the health router
- `src/quartermaster/service.py`: add one `logger.info` seed call
  in `update_entry`
- `src/quartermaster/month_service.py`: add four `logger.info` seed
  calls in `create_month`, `close_month`, `add_posting`, and
  `delete_posting`
- `README.md`: add the "Logs" section and mention `--log-config` in
  the Run block

Not touched:

- Dockerfile / Compose: owned by later issues under #25.
- Alembic / DB layer: the healthcheck uses the existing session
  factory; no migration.

## Order of work

Logging before healthz. Once `LOG_CONFIG` exists, the healthz handler
can emit `event="healthz_check"` for free; the reverse order doesn't
give logging anything useful. The ordering is a preference, not
load-bearing.

## Out of scope

- `/readyz` vs. `/livez` split: one endpoint covers this
  single-container app.
- `/metrics` or any Prometheus exposition (5.2 in #25 is "not needed").
- Adding `structlog` (#27 explicitly excludes it).
- Log-shipping configuration: Promtail on the host handles it.
- Broad app instrumentation beyond the five seed events.