docs(operations): add Logs and Health sections for #26, #27

claude-code 2026-04-19 12:30:10 -06:00
parent 3b24d98b18
commit ff15cb645f

@ -92,10 +92,93 @@ rm -rf /tmp/qm-dev.db /tmp/qm-dev-backups
## Running in "production"
There is no prod. This is a local single-user app. Run the uvicorn
dev server and reach it at http://127.0.0.1:8000. For automatic
restart on crash, wrap the command in a `systemd --user` unit or a
supervisor of your choice; the app itself does nothing special.
Production is the homelab host home-ctr-onyx, containerised. Dev is
the uvicorn reload server at `http://127.0.0.1:8000`. The platform
contract ([PlatformContractQuartermaster](https://forgejo.labbity.unbiasedgeek.com/homelab/homelab-IaC/wiki/PlatformContractQuartermaster))
is the authoritative record of the deploy surface; the sections below
cover the app-side affordances that feed into it.
## Health
`GET /healthz` — unauthenticated, returns:
* `200 {"status":"ok"}` when a trivial `SELECT 1` through the
SQLAlchemy session succeeds.
* `503 {"status":"error","detail":"<ExceptionClassName>"}` on any
exception from the DB probe. The error class name is the only
detail leaked (no message, no traceback) — enough for an operator
to see what tripped the check from a `curl` without log access.
No auth on purpose: the Docker `HEALTHCHECK` runs inside the container
and cannot carry credentials, and Traefik's basic-auth middleware is
not applied to this route. Kept on a dedicated router
(`src/quartermaster/routes_health.py`) so any future router-scoped auth
on the main routers leaves it alone.
A failed probe also emits a structured warning log (`event=healthz_failed`,
`error_class=<cls>`) for Loki.
## Logs
Logs are JSON on stdout. The config lives at
`src/quartermaster/logconfig.json` and is consumed both by Python (via
the `LOG_CONFIG` dict loaded in `src/quartermaster/logging_config.py`)
and by uvicorn CLI:
```sh
uv run uvicorn quartermaster.main:app \
--log-config src/quartermaster/logconfig.json \
--reload
```
Each log line has `level` and `event` as top-level JSON fields
(Promtail on home-ctr-onyx extracts them as queryable Loki labels),
plus arbitrary extras in the JSON body.
### Access logs
Uvicorn access records are enriched by `AccessLogFilter` into:
```json
{
"timestamp": "...", "level": "INFO", "logger": "uvicorn.access",
"event": "http_request", "method": "GET", "path": "/healthz",
"status": 200, "client_ip": "10.0.0.42:54321",
"message": "... - \"GET /healthz HTTP/1.1\" 200"
}
```
### Application events
Five seed events fire at the most operationally interesting mutations:
| Event | Fires in | Extras |
|---|---|---|
| `month_created` | `month_service.create_month` | `year_month` |
| `month_closed` | `month_service.close_month` | `year_month` |
| `template_entry_updated` | `service.update_entry` | `entry_id` |
| `posting_added` | `month_service.add_posting` | `posting_id`, `month_entry_id`, `amount` |
| `posting_deleted` | `month_service.delete_posting` | `posting_id` |
| `healthz_failed` | `routes_health.healthz` (WARNING) | `error_class` |
Additional events can be added the same way — `logger.info(msg,
extra={"event": "...", ...})` on a logger under `quartermaster.*`.
### Example LogQL queries
Grafana Explore, Loki data source, once the deploy is live:
```
{container="quartermaster"} | json
{container="quartermaster", event="http_request", status=~"5.."}
{container="quartermaster", event="month_closed"} | json | line_format "{{.year_month}} {{.message}}"
```
### Dev ergonomics
Omit `--log-config src/quartermaster/logconfig.json` during local dev
if you'd rather read logs in uvicorn's default human-readable format.
Production must use the config so Promtail indexes properly.
## Troubleshooting
@ -109,8 +192,7 @@ time, which is expected.
You pulled code with a new column but did not apply the migration.
Run `uv run alembic upgrade head`. The backup hook backs up the live
DB first, then the migration adds the missing column. Common after
a pull that includes schema work (notes field, month lifecycle).
DB first, then the migration adds the missing column.
### Alembic reports a revision it cannot locate
@ -123,6 +205,18 @@ or downgrade the DB to a known-good revision before continuing.
Intentional. The script is sqlite-specific. Switch the URL back to
sqlite:///... or do the backup manually via your Postgres tooling.
### `/healthz` returns 503
Inspect the logged `event=healthz_failed` record in Loki or stdout.
`error_class` names the exception type; common causes are a
misconfigured `QUARTERMASTER_DB_URL`, a DB file that got wiped or
permissions-corrupted, or Alembic having failed on container start.
### DeprecationWarning about `pythonjsonlogger.jsonlogger`
Fixed on main. The config now references `pythonjsonlogger.json`.
If you see the warning, pull and re-run `uv sync`.
## Current schema
Applied migrations at time of writing:
@ -135,5 +229,6 @@ Applied migrations at time of writing:
| `a4ec4f8f6e9f` | add month lifecycle columns (`state`, `activated_at`, `closed_at`) |
| `cc60e7f73a1c` | add `posting` ledger table, seed opening-balance postings, drop `month_entry.applied` |
After pulling new code, `uv run alembic upgrade head` walks the chain
and the backup hook fires between each hop.
No schema change between `cc60e7f73a1c` and HEAD. After pulling new
code, `uv run alembic upgrade head` walks the chain and the backup
hook fires between each hop.