Part of the platform-contract intake (#25). Covers both pieces of work that must land before first deploy to home-ctr-onyx. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Healthcheck endpoint and structured JSON logs

**Date:** 2026-04-19
**Issues:** #26 (healthz), #27 (JSON logs), part of #25 (platform contract intake)

## Background

The homelab platform contract for Quartermaster (#25) requires two
things the codebase does not have today:

1. A Docker `HEALTHCHECK` so `container_health_status` is visible to
   cAdvisor/Prometheus, which in turn drives the container-down alert
   planned at launch. That requires an in-app endpoint to target.
2. Structured JSON logs on stdout with `level` and `event` fields so
   Promtail indexes them as Loki labels.

Both block the first deploy to `home-ctr-onyx`. This spec covers both
so the work can land as one coherent change.

## /healthz

### Endpoint

`GET /healthz`, unauthenticated.

- Success: `200 {"status": "ok"}`.
- Failure: `503 {"status": "error", "detail": "<exception class name>"}`.
  The class name is included so operators can tell from the response
  body what tripped the check; no traceback or message is leaked.

The check opens a session via the standard `SessionLocal` factory,
runs `SELECT 1`, and closes the session. Any exception surfaces as a 503.
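
The probe logic described above can be sketched independently of the web framework. This is a minimal sketch under stated assumptions, not the project's implementation: `check_health`, `sqlite_factory`, and `broken_factory` are hypothetical names, and the stdlib `sqlite3` connection stands in for the real `SessionLocal` factory so the example runs on its own. With SQLAlchemy the probe would likely need to be wrapped as `text("SELECT 1")`.

```python
import sqlite3
from typing import Any, Callable, Dict, Tuple


def check_health(session_factory: Callable[[], Any]) -> Tuple[int, Dict[str, str]]:
    """Open a session, run SELECT 1, close it; map any exception to a 503 body."""
    try:
        session = session_factory()
        try:
            # With SQLAlchemy this would likely be session.execute(text("SELECT 1")).
            session.execute("SELECT 1")
        finally:
            session.close()
    except Exception as exc:
        # Only the class name is exposed: no message, no traceback.
        return 503, {"status": "error", "detail": type(exc).__name__}
    return 200, {"status": "ok"}


def sqlite_factory() -> sqlite3.Connection:
    """Stand-in for SessionLocal (assumption): an in-memory SQLite connection."""
    return sqlite3.connect(":memory:")


def broken_factory() -> Any:
    """Simulates a database outage by failing before a session exists."""
    raise ConnectionError("database unreachable")


healthy = check_health(sqlite_factory)
unhealthy = check_health(broken_factory)
```

A route handler would return the dict as the JSON body with the matching status code. Keeping the probe in a plain function also lets the failure path be tested without patching the web framework.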

### Placement

New module `src/quartermaster/routes_health.py` with its own
`APIRouter`, included from `main.create_app()` alongside the existing
routers. Keeping it on a dedicated router means any future middleware
(basic-auth, rate-limit bypass) applied to the main routers can leave
`/healthz` alone: the Docker healthcheck runs inside the container
and must not need credentials.

### Tests

`tests/test_health.py`:

- Success: FastAPI `TestClient` hits `/healthz`, asserts 200 and
  `{"status": "ok"}`.
- Failure: monkey-patch the session factory so the session it returns
  raises on `.execute()`, assert 503 and
  `{"status": "error", "detail": "<class-name>"}`.

## Structured JSON logs

### Dependency

Add `python-json-logger` to `[project].dependencies` in
`pyproject.toml`. It is a small, single-purpose dependency with no
transitive surprises. `structlog` is explicitly out of scope (#27).

### Config module

New `src/quartermaster/logging_config.py` exposing `LOG_CONFIG`, a
`logging.config.dictConfig`-compatible dict:

- One formatter using `pythonjsonlogger.jsonlogger.JsonFormatter`
  emitting `timestamp` (ISO-8601 UTC), `level`, `event`, `logger`,
  `message`. `extra={...}` kwargs passed to logger calls flatten into
  the JSON body.
- One handler writing to `sys.stdout`.
- Loggers: the root app logger and `uvicorn.access` both route
  through the JSON handler. `uvicorn.error` also gets the handler so
  startup / shutdown lines are captured in the same format.

A Python dict (rather than YAML) is the source of truth because
tests can import it and apply `dictConfig` in-process. The uvicorn
CLI consumes it via a small `logconfig.yaml` shim at repo root that
references the dict module.
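
The formatter, handler, and logger wiring described above might look roughly like the dict below. Treat it as a sketch rather than the final config: the handler name `stdout`, the formatter name `json`, the log levels, and the `propagate` settings are assumptions. The `rename_fields` and `timestamp` keyword arguments are options of `python-json-logger`'s formatter as I understand them, and should be checked against the version that actually gets pinned.

```python
# Sketch of a dictConfig-compatible LOG_CONFIG (names and levels are assumptions).
LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "json": {
            # "()" makes dictConfig call the factory with the remaining keys as kwargs.
            "()": "pythonjsonlogger.jsonlogger.JsonFormatter",
            "fmt": "%(levelname)s %(name)s %(message)s",
            # Map stdlib attribute names onto the field names the contract wants.
            "rename_fields": {"levelname": "level", "name": "logger"},
            # Assumed to add a UTC "timestamp" field; verify for the pinned version.
            "timestamp": True,
        },
    },
    "handlers": {
        "stdout": {
            "class": "logging.StreamHandler",
            "formatter": "json",
            "stream": "ext://sys.stdout",
        },
    },
    "loggers": {
        "uvicorn.access": {"handlers": ["stdout"], "level": "INFO", "propagate": False},
        "uvicorn.error": {"handlers": ["stdout"], "level": "INFO", "propagate": False},
    },
    # The root logger covers the application's own module loggers.
    "root": {"handlers": ["stdout"], "level": "INFO"},
}
```

Applying it is then a call to `logging.config.dictConfig(LOG_CONFIG)`, which requires `python-json-logger` to be installed. The `event` field is not part of the format string; it only appears when a call site passes it through `extra`.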

### Access log filter

Uvicorn's access logger emits a record whose message is the raw
access line; the fields we care about live on the record's positional
args. A small `logging.Filter` subclass in `logging_config.py` unpacks
those args and sets:

- `event = "http_request"`
- `method`, `path`, `status`, `client_ip`
- `duration_ms`: uvicorn doesn't expose this natively. It is computed
  from an `extra` injected by a small middleware if that turns out to
  be straightforward, otherwise deferred. The filter already gives
  Loki status + path, which is the main thing.

If the duration cannot be obtained cheaply from uvicorn's access
record, landing the rest is still a win; the `duration_ms` field can
come in a follow-up without changing the log schema (it's an extra
field, not a label).
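
A filter along these lines could do the unpacking. It is a minimal sketch: the class name `AccessLogFilter` is hypothetical, and the five-element argument order (client address, method, path, HTTP version, status) is an assumption about uvicorn's access record that should be confirmed against the installed version. `duration_ms` is left out, in line with the deferral described above.

```python
import logging


class AccessLogFilter(logging.Filter):
    """Copy uvicorn access-record args onto named record attributes."""

    def filter(self, record: logging.LogRecord) -> bool:
        args = record.args
        # Assumed arg order: client_addr, method, path, http_version, status.
        if isinstance(args, tuple) and len(args) == 5:
            client_addr, method, path, _http_version, status = args
            record.event = "http_request"
            record.method = method
            record.path = path  # may include the query string
            record.status = int(status)
            # client_addr is typically "host:port"; keep only the host part.
            addr = str(client_addr)
            record.client_ip = addr.rpartition(":")[0] or addr
        return True  # enrich only, never drop the record


# Synthetic record shaped like a uvicorn access line (values are made up).
record = logging.LogRecord(
    name="uvicorn.access",
    level=logging.INFO,
    pathname="access.py",
    lineno=0,
    msg='%s - "%s %s HTTP/%s" %d',
    args=("127.0.0.1:51234", "GET", "/healthz", "1.1", 200),
    exc_info=None,
)
kept = AccessLogFilter().filter(record)
```

Records that do not match the expected shape pass through unchanged, so attaching the filter to other loggers by mistake would not lose log lines.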

### Seed application events

Five events added as single-line `logger.info(..., extra={"event":
"..."})` calls at the matching code paths (names aligned with the
existing function names):

| Event | Site |
|---|---|
| `month_created` | `month_service.create_month` |
| `month_closed` | `month_service.close_month` |
| `template_entry_updated` | `service.update_entry` |
| `posting_added` | `month_service.add_posting` |
| `posting_deleted` | `month_service.delete_posting` |

One module-scoped logger at the top of each file that touches these
paths. No broader instrumentation in this change.
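
The shape of one such call site can be illustrated as follows. This is a sketch only: the body of `create_month`, its parameters, and the extra `month_id` key are hypothetical stand-ins, and `ListHandler` exists purely so the example can inspect what was logged.

```python
import logging

logger = logging.getLogger(__name__)  # one module-scoped logger per file


def create_month(year: int, month: int) -> str:
    """Hypothetical stand-in for month_service.create_month."""
    month_id = f"{year:04d}-{month:02d}"
    # Keys in `extra` become attributes on the LogRecord, so a JSON
    # formatter can emit them as top-level fields.
    logger.info("month created", extra={"event": "month_created", "month_id": month_id})
    return month_id


class ListHandler(logging.Handler):
    """Collects records in memory so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)
created = create_month(2026, 4)
```

Because `extra` keys are set directly on the record, they must not collide with built-in `LogRecord` attribute names such as `message` or `name`; the standard library raises a `KeyError` in that case.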

### Tests

`tests/test_logging.py`:

- Apply `LOG_CONFIG` via `logging.config.dictConfig`, emit a record
  with `extra={"event": "smoke"}`, capture stdout via `capsys`,
  `json.loads` the captured line, assert `level` / `event` /
  `logger` / `message` / `timestamp` all present and correct.
- Feed a synthetic uvicorn access record through the filter, assert
  resulting fields include `event="http_request"`, `method`, `path`,
  `status`.

No end-to-end uvicorn-subprocess test. Formatter and filter
correctness at the handler level is enough for the launch contract.

### Dev flow

Run `uv run uvicorn quartermaster.main:app --log-config logconfig.yaml --reload`;
`--reload` keeps working. README gets a short "Logs" section with two
LogQL examples mirroring the Archon contract style.

## File additions / changes

New:
- `src/quartermaster/routes_health.py`
- `src/quartermaster/logging_config.py`
- `logconfig.yaml` (YAML shim for uvicorn CLI)
- `tests/test_health.py`
- `tests/test_logging.py`

Changed:
- `pyproject.toml`: add `python-json-logger`
- `src/quartermaster/main.py`: include the health router
- `src/quartermaster/service.py`: add one `logger.info` seed call
  in `update_entry`
- `src/quartermaster/month_service.py`: add four `logger.info` seed
  calls in `create_month`, `close_month`, `add_posting`,
  `delete_posting`
- `README.md`: add the "Logs" section and mention `--log-config` in
  the Run block

Not touched:
- Dockerfile / Compose: owned by later issues under #25.
- Alembic / DB layer: the healthcheck uses the existing session
  factory; no migration.

## Order of work

Logging before healthz. Once `LOG_CONFIG` exists the healthz handler
can emit `event="healthz_check"` for free; the reverse order doesn't
give logging anything useful. The ordering is a convenience, not a
hard requirement.

## Out of scope

- `/readyz` vs. `/livez` split: one endpoint covers this
  single-container app.
- `/metrics` or any Prometheus exposition (5.2 in #25 is "not needed").
- Adding `structlog` (#27 explicitly excludes it).
- Log-shipping configuration: Promtail on the host handles it.
- Broad app instrumentation beyond the five seed events.