parent
3b24d98b18
commit
ff15cb645f
1 changed files with 103 additions and 8 deletions
111
Operations.md
@ -92,10 +92,93 @@ rm -rf /tmp/qm-dev.db /tmp/qm-dev-backups

## Running in "production"

-There is no prod. This is a local single-user app. Run the uvicorn
-dev server and reach it at http://127.0.0.1:8000. For automatic
-restart on crash, wrap the command in a `systemd --user` unit or a
-supervisor of your choice; the app itself does nothing special.
+Production is the homelab host home-ctr-onyx, containerised. Dev is
+the uvicorn reload server at `http://127.0.0.1:8000`. The platform
+contract ([PlatformContractQuartermaster](https://forgejo.labbity.unbiasedgeek.com/homelab/homelab-IaC/wiki/PlatformContractQuartermaster))
+is the authoritative record of the deploy surface; the sections below
+cover the app-side affordances that feed into it.

## Health

`GET /healthz` is unauthenticated and returns:

* `200 {"status":"ok"}` when a trivial `SELECT 1` through the
  SQLAlchemy session succeeds.
* `503 {"status":"error","detail":"<ExceptionClassName>"}` on any
  exception from the DB probe. The error class name is the only
  detail leaked (no message, no traceback), which is enough for an
  operator to see what tripped the check from a `curl` without log access.

No auth on purpose: the Docker `HEALTHCHECK` runs inside the container
and cannot carry credentials, and Traefik's basic-auth middleware is
not applied to this route. The route lives on a dedicated router
(`src/quartermaster/routes_health.py`) so any future router-scoped auth
on the main routers leaves it alone.

A failed probe also emits a structured warning log (`event=healthz_failed`,
`error_class=<cls>`) for Loki.
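
The probe contract above can be sketched in a few lines. This is a minimal illustration, not the code in `routes_health.py`: it uses the standard-library `sqlite3` module in place of the SQLAlchemy session, and `healthz`, `connect`, and `broken_connect` are hypothetical names chosen for the example.

```python
import json
import sqlite3


def healthz(connect):
    """Return (status_code, json_body) for a DB liveness probe.

    `connect` is a hypothetical zero-argument callable that opens a DB
    connection; the real route is described as using a SQLAlchemy session.
    """
    try:
        conn = connect()
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except Exception as exc:
        # Expose only the exception class name: no message, no traceback.
        body = {"status": "error", "detail": type(exc).__name__}
        return 503, json.dumps(body, separators=(",", ":"))
    return 200, json.dumps({"status": "ok"}, separators=(",", ":"))


def broken_connect():
    # Stand-in for a misconfigured or unreachable database.
    raise sqlite3.OperationalError("unable to open database file")


print(healthz(lambda: sqlite3.connect(":memory:")))
print(healthz(broken_connect))
```

The second call shows that the exception message ("unable to open database file") never reaches the response body; only `OperationalError` does.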

## Logs

Logs are JSON on stdout. The config lives at
`src/quartermaster/logconfig.json` and is consumed both by Python (via
the `LOG_CONFIG` dict loaded in `src/quartermaster/logging_config.py`)
and by the uvicorn CLI:

```sh
uv run uvicorn quartermaster.main:app \
  --log-config src/quartermaster/logconfig.json \
  --reload
```

Each log line has `level` and `event` as top-level JSON fields
(Promtail on home-ctr-onyx extracts them as queryable Loki labels),
plus arbitrary extras in the JSON body.
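
To illustrate how `level`, `event`, and the extras can end up as top-level JSON fields, here is a rough standard-library sketch. The project itself configures `pythonjsonlogger` through `logconfig.json`; the `JsonLineFormatter` class, the `quartermaster.sketch` logger name, and the `"2024-05"` value below are assumptions made for the example only.

```python
import io
import json
import logging

# Attributes every LogRecord carries; anything else arrived through `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonLineFormatter(logging.Formatter):
    """Render one JSON object per record, promoting extras to top level."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonLineFormatter())

logger = logging.getLogger("quartermaster.sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("month closed", extra={"event": "month_closed", "year_month": "2024-05"})
line = json.loads(stream.getvalue())
print(line["level"], line["event"], line["year_month"])
```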

### Access logs

Uvicorn access records are enriched by `AccessLogFilter` into:

```json
{
  "timestamp": "...", "level": "INFO", "logger": "uvicorn.access",
  "event": "http_request", "method": "GET", "path": "/healthz",
  "status": 200, "client_ip": "10.0.0.42:54321",
  "message": "... - \"GET /healthz HTTP/1.1\" 200"
}
```
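
One plausible way to produce those fields is a `logging.Filter` that unpacks the positional arguments uvicorn attaches to each access record. This sketch is a guess at the approach, not the project's actual `AccessLogFilter`; it assumes the record's `args` tuple is `(client_addr, method, path, http_version, status_code)`, which matches the message shape shown above.

```python
import logging


class AccessLogFilter(logging.Filter):
    """Copy access-log arguments onto the record as named fields."""

    def filter(self, record):
        # Assumed layout: (client_addr, method, path, http_version, status_code).
        if isinstance(record.args, tuple) and len(record.args) == 5:
            client_addr, method, path, _http_version, status = record.args
            record.event = "http_request"
            record.method = method
            record.path = path
            record.status = status
            record.client_ip = client_addr
        return True  # never drop the record, only enrich it


# Hand-built record that mimics an access-log call.
rec = logging.LogRecord(
    "uvicorn.access", logging.INFO, "", 0,
    '%s - "%s %s HTTP/%s" %d',
    ("10.0.0.42:54321", "GET", "/healthz", "1.1", 200),
    None,
)
kept = AccessLogFilter().filter(rec)
print(kept, rec.event, rec.method, rec.path, rec.status, rec.client_ip)
```

A JSON formatter that serialises record attributes then emits `event`, `method`, `path`, `status`, and `client_ip` alongside the original message.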

### Application events

Five seed events fire at the most operationally interesting mutations;
`healthz_failed` from the health probe is listed alongside them:

| Event | Fires in | Extras |
|---|---|---|
| `month_created` | `month_service.create_month` | `year_month` |
| `month_closed` | `month_service.close_month` | `year_month` |
| `template_entry_updated` | `service.update_entry` | `entry_id` |
| `posting_added` | `month_service.add_posting` | `posting_id`, `month_entry_id`, `amount` |
| `posting_deleted` | `month_service.delete_posting` | `posting_id` |
| `healthz_failed` | `routes_health.healthz` (WARNING) | `error_class` |

Additional events can be added the same way:
`logger.info(msg, extra={"event": "...", ...})` on a logger under `quartermaster.*`.
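
As a quick illustration of that call shape, the sketch below emits a `posting_added` event and inspects the resulting record. The `ListHandler` class, the `quartermaster.sketch_service` logger name, and the field values are made up for the example. It also shows one standard-library constraint worth knowing: keys in `extra` must not collide with built-in `LogRecord` attributes such as `message` or `name`, or `logging` raises `KeyError`.

```python
import logging


class ListHandler(logging.Handler):
    """Collect records in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("quartermaster.sketch_service")
logger.setLevel(logging.INFO)
logger.propagate = False
capture = ListHandler()
logger.addHandler(capture)

# Extras become attributes on the record, ready for a JSON formatter.
logger.info(
    "posting added",
    extra={"event": "posting_added", "posting_id": 7, "month_entry_id": 3, "amount": "12.50"},
)
record = capture.records[0]
print(record.event, record.posting_id, record.month_entry_id, record.amount)

# Reserved attribute names are rejected by the logging module itself.
try:
    logger.info("bad", extra={"message": "clash"})
    rejected = False
except KeyError:
    rejected = True
print("reserved key rejected:", rejected)
```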

### Example LogQL queries

Grafana Explore, Loki data source, once the deploy is live. Fields other
than `level` and `event` (such as `status`) live in the JSON body, so
filter on them after `| json`:

```
{container="quartermaster"} | json
{container="quartermaster", event="http_request"} | json | status=~"5.."
{container="quartermaster", event="month_closed"} | json | line_format "{{.year_month}} {{.message}}"
```

### Dev ergonomics

Omit `--log-config src/quartermaster/logconfig.json` during local dev
if you'd rather read logs in uvicorn's default human-readable format.
Production must use the config so Promtail indexes properly.

## Troubleshooting

@ -109,8 +192,7 @@ time, which is expected.

You pulled code with a new column but did not apply the migration.
Run `uv run alembic upgrade head`. The backup hook backs up the live
-DB first, then the migration adds the missing column. Common after
-a pull that includes schema work (notes field, month lifecycle).
+DB first, then the migration adds the missing column.

### Alembic reports a revision it cannot locate

@ -123,6 +205,18 @@ or downgrade the DB to a known-good revision before continuing.
Intentional. The script is sqlite-specific. Switch the URL back to
sqlite:///... or do the backup manually via your Postgres tooling.

### `/healthz` returns 503

Inspect the logged `event=healthz_failed` record in Loki or stdout.
`error_class` names the exception type; common causes are a
misconfigured `QUARTERMASTER_DB_URL`, a DB file that got wiped or
permissions-corrupted, or Alembic having failed on container start.
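
When log access is inconvenient, the response body alone carries the exception class. The helper below is a small sketch for reading it from a script. `summarize_health` and `check` are hypothetical names, and the default URL is the dev address from this page; the production address is not assumed here.

```python
import json
import urllib.error
import urllib.request


def summarize_health(status_code, body):
    """Turn a /healthz response into a one-line summary."""
    payload = json.loads(body)
    if status_code == 200 and payload.get("status") == "ok":
        return "healthy"
    return f"unhealthy ({status_code}): {payload.get('detail', 'unknown')}"


def check(url="http://127.0.0.1:8000/healthz"):
    """Fetch the endpoint and summarise it; call this against a running server."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return summarize_health(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 5xx; the JSON body is still readable from the error.
        return summarize_health(err.code, err.read())


print(summarize_health(503, '{"status":"error","detail":"OperationalError"}'))
```

Calling `check()` against a live instance returns the same kind of summary; the class name it reports is the value to search for as `error_class` in the logs.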

### DeprecationWarning about `pythonjsonlogger.jsonlogger`

Fixed on main. The config now references `pythonjsonlogger.json`.
If you see the warning, pull and re-run `uv sync`.

## Current schema

Applied migrations at time of writing:

@ -135,5 +229,6 @@ Applied migrations at time of writing:
| `a4ec4f8f6e9f` | add month lifecycle columns (`state`, `activated_at`, `closed_at`) |
| `cc60e7f73a1c` | add `posting` ledger table, seed opening-balance postings, drop `month_entry.applied` |

-After pulling new code, `uv run alembic upgrade head` walks the chain
-and the backup hook fires between each hop.
+No schema change between `cc60e7f73a1c` and HEAD. After pulling new
+code, `uv run alembic upgrade head` walks the chain and the backup
+hook fires between each hop.