parent
3b24d98b18
commit
ff15cb645f
1 changed files with 103 additions and 8 deletions
111
Operations.md
|
|
@ -92,10 +92,93 @@ rm -rf /tmp/qm-dev.db /tmp/qm-dev-backups
|
||||||
|
|
||||||
## Running in "production"
|
## Running in "production"
|
||||||
|
|
||||||
There is no prod. This is a local single-user app. Run the uvicorn
|
Production is the homelab host home-ctr-onyx, containerised. Dev is
|
||||||
dev server and reach it at http://127.0.0.1:8000. For automatic
|
the uvicorn reload server at `http://127.0.0.1:8000`. The platform
|
||||||
restart on crash, wrap the command in a `systemd --user` unit or a
|
contract ([PlatformContractQuartermaster](https://forgejo.labbity.unbiasedgeek.com/homelab/homelab-IaC/wiki/PlatformContractQuartermaster))
|
||||||
supervisor of your choice; the app itself does nothing special.
|
is the authoritative record of the deploy surface; the sections below
|
||||||
|
cover the app-side affordances that feed into it.
|
||||||
|
|
||||||
|
## Health
|
||||||
|
|
||||||
|
`GET /healthz` — unauthenticated, returns:
|
||||||
|
|
||||||
|
* `200 {"status":"ok"}` when a trivial `SELECT 1` through the
|
||||||
|
SQLAlchemy session succeeds.
|
||||||
|
* `503 {"status":"error","detail":"<ExceptionClassName>"}` on any
|
||||||
|
exception from the DB probe. The error class name is the only
|
||||||
|
detail leaked (no message, no traceback) — enough for an operator
|
||||||
|
to see what tripped the check from a `curl` without log access.
|
||||||
|
|
||||||
|
No auth on purpose: the Docker `HEALTHCHECK` runs inside the container
|
||||||
|
and cannot carry credentials, and Traefik's basic-auth middleware is
|
||||||
|
not applied to this route. Kept on a dedicated router
|
||||||
|
(`src/quartermaster/routes_health.py`) so any future router-scoped auth
|
||||||
|
on the main routers leaves it alone.
|
||||||
|
|
||||||
|
A failed probe also emits a structured warning log (`event=healthz_failed`,
|
||||||
|
`error_class=<cls>`) for Loki.
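The contract described above can be sketched in a few lines. This is a minimal, framework-free illustration, not the code in `routes_health.py`: the stdlib `sqlite3` module stands in for the SQLAlchemy session, and the names `healthz`, `sqlite_probe`, and the logger name are assumptions made for the example.

```python
import logging
import sqlite3
from typing import Callable

# Hypothetical logger name; the real module may use a different one.
logger = logging.getLogger("quartermaster.health_sketch")


def sqlite_probe(path: str) -> Callable[[], None]:
    """Build a probe that runs a trivial SELECT 1 (stand-in for the SQLAlchemy session)."""
    def run() -> None:
        conn = sqlite3.connect(path)
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    return run


def healthz(probe: Callable[[], None]) -> tuple[int, dict[str, str]]:
    """Return (status_code, body) in the shape documented above."""
    try:
        probe()
    except Exception as exc:
        error_class = type(exc).__name__
        # Only the class name is exposed: no message, no traceback.
        logger.warning(
            "healthz probe failed",
            extra={"event": "healthz_failed", "error_class": error_class},
        )
        return 503, {"status": "error", "detail": error_class}
    return 200, {"status": "ok"}
```

Passing the probe in as a callable keeps the response-shaping logic testable without a live database.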
|
||||||
|
|
||||||
|
## Logs
|
||||||
|
|
||||||
|
Logs are JSON on stdout. The config lives at
|
||||||
|
`src/quartermaster/logconfig.json` and is consumed both by Python (via
|
||||||
|
the `LOG_CONFIG` dict loaded in `src/quartermaster/logging_config.py`)
|
||||||
|
and by the uvicorn CLI:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
uv run uvicorn quartermaster.main:app \
|
||||||
|
--log-config src/quartermaster/logconfig.json \
|
||||||
|
--reload
|
||||||
|
```
|
||||||
|
|
||||||
|
Each log line has `level` and `event` as top-level JSON fields
|
||||||
|
(Promtail on home-ctr-onyx extracts them as queryable Loki labels),
|
||||||
|
plus arbitrary extras in the JSON body.
|
||||||
|
|
||||||
|
### Access logs
|
||||||
|
|
||||||
|
Uvicorn access records are enriched by `AccessLogFilter` into:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"timestamp": "...", "level": "INFO", "logger": "uvicorn.access",
|
||||||
|
"event": "http_request", "method": "GET", "path": "/healthz",
|
||||||
|
"status": 200, "client_ip": "10.0.0.42:54321",
|
||||||
|
"message": "... - \"GET /healthz HTTP/1.1\" 200"
|
||||||
|
}
|
||||||
|
```
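One way such a filter could work is sketched below. It assumes uvicorn passes its access-log arguments as a 5-tuple of client address, method, path, HTTP version, and status code; the body is illustrative and is not the project's actual `AccessLogFilter`.

```python
import logging


class AccessLogFilter(logging.Filter):
    """Sketch: copy uvicorn's positional access-log args onto named record fields."""

    def filter(self, record: logging.LogRecord) -> bool:
        args = record.args
        # Assumed arg layout: (client_addr, method, full_path, http_version, status_code)
        if isinstance(args, tuple) and len(args) == 5:
            client_addr, method, full_path, _http_version, status = args
            record.event = "http_request"
            record.method = method
            record.path = full_path
            record.status = int(status)
            record.client_ip = client_addr
        # Never drop the record; the filter only enriches it.
        return True
```

A JSON formatter attached to the same handler can then serialise the added attributes as top-level fields, while `message` still carries uvicorn's original formatted line.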
|
||||||
|
|
||||||
|
### Application events
|
||||||
|
|
||||||
|
Five seed events fire at the most operationally interesting mutations, and `healthz_failed` is logged at WARNING by the health probe:
|
||||||
|
|
||||||
|
| Event | Fires in | Extras |
|
||||||
|
|---|---|---|
|
||||||
|
| `month_created` | `month_service.create_month` | `year_month` |
|
||||||
|
| `month_closed` | `month_service.close_month` | `year_month` |
|
||||||
|
| `template_entry_updated` | `service.update_entry` | `entry_id` |
|
||||||
|
| `posting_added` | `month_service.add_posting` | `posting_id`, `month_entry_id`, `amount` |
|
||||||
|
| `posting_deleted` | `month_service.delete_posting` | `posting_id` |
|
||||||
|
| `healthz_failed` | `routes_health.healthz` (WARNING) | `error_class` |
|
||||||
|
|
||||||
|
Additional events can be added the same way — `logger.info(msg,
|
||||||
|
extra={"event": "...", ...})` on a logger under `quartermaster.*`.
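As a rough illustration of that pattern, the sketch below emits a `month_closed` event and renders it as JSON. The project uses `pythonjsonlogger` via `logconfig.json`; here a small stdlib-only formatter stands in for it, and `JsonFormatter`, `emit_month_closed`, the logger name, and the `"2024-05"` value are all assumptions made for the example.

```python
import io
import json
import logging

# Attribute names every LogRecord carries; anything else was supplied via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Stand-in for pythonjsonlogger: one JSON object per record, extras at top level."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        payload.update(
            {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        )
        return json.dumps(payload)


def emit_month_closed(logger: logging.Logger, year_month: str) -> None:
    """Hypothetical helper showing the `extra={"event": ...}` call shape."""
    logger.info(
        "month closed",
        extra={"event": "month_closed", "year_month": year_month},
    )


# Wire a logger under `quartermaster.*` to an in-memory stream for demonstration.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
demo_logger = logging.getLogger("quartermaster.example")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False

emit_month_closed(demo_logger, "2024-05")
```

Because `extra` keys become attributes on the record, avoid names that collide with built-in `LogRecord` attributes such as `message` or `name`; the stdlib raises `KeyError` for those.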
|
||||||
|
|
||||||
|
### Example LogQL queries
|
||||||
|
|
||||||
|
Grafana Explore, Loki data source, once the deploy is live:
|
||||||
|
|
||||||
|
```
|
||||||
|
{container="quartermaster"} | json
|
||||||
|
{container="quartermaster", event="http_request"} | json | status=~"5.."
|
||||||
|
{container="quartermaster", event="month_closed"} | json | line_format "{{.year_month}} {{.message}}"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Dev ergonomics
|
||||||
|
|
||||||
|
Omit `--log-config src/quartermaster/logconfig.json` during local dev
|
||||||
|
if you'd rather read logs in uvicorn's default human-readable format.
|
||||||
|
Production must use the config so Promtail indexes properly.
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
|
|
@ -109,8 +192,7 @@ time, which is expected.
|
||||||
|
|
||||||
You pulled code with a new column but did not apply the migration.
|
You pulled code with a new column but did not apply the migration.
|
||||||
Run `uv run alembic upgrade head`. The backup hook backs up the live
|
Run `uv run alembic upgrade head`. The backup hook backs up the live
|
||||||
DB first, then the migration adds the missing column. Common after
|
DB first, then the migration adds the missing column.
|
||||||
a pull that includes schema work (notes field, month lifecycle).
|
|
||||||
|
|
||||||
### Alembic reports a revision it cannot locate
|
### Alembic reports a revision it cannot locate
|
||||||
|
|
||||||
|
|
@ -123,6 +205,18 @@ or downgrade the DB to a known-good revision before continuing.
|
||||||
Intentional. The script is sqlite-specific. Switch the URL back to
|
Intentional. The script is sqlite-specific. Switch the URL back to
|
||||||
sqlite:///... or do the backup manually via your Postgres tooling.
|
sqlite:///... or do the backup manually via your Postgres tooling.
|
||||||
|
|
||||||
|
### `/healthz` returns 503
|
||||||
|
|
||||||
|
Inspect the logged `event=healthz_failed` record in Loki or stdout.
|
||||||
|
`error_class` names the exception type; common causes are a
|
||||||
|
misconfigured `QUARTERMASTER_DB_URL`, a DB file that got wiped or
|
||||||
|
permissions-corrupted, or Alembic having failed on container start.
|
||||||
|
|
||||||
|
### DeprecationWarning about `pythonjsonlogger.jsonlogger`
|
||||||
|
|
||||||
|
Fixed on main. The config now references `pythonjsonlogger.json`.
|
||||||
|
If you see the warning, pull and re-run `uv sync`.
|
||||||
|
|
||||||
## Current schema
|
## Current schema
|
||||||
|
|
||||||
Applied migrations at time of writing:
|
Applied migrations at time of writing:
|
||||||
|
|
@ -135,5 +229,6 @@ Applied migrations at time of writing:
|
||||||
| `a4ec4f8f6e9f` | add month lifecycle columns (`state`, `activated_at`, `closed_at`) |
|
| `a4ec4f8f6e9f` | add month lifecycle columns (`state`, `activated_at`, `closed_at`) |
|
||||||
| `cc60e7f73a1c` | add `posting` ledger table, seed opening-balance postings, drop `month_entry.applied` |
|
| `cc60e7f73a1c` | add `posting` ledger table, seed opening-balance postings, drop `month_entry.applied` |
|
||||||
|
|
||||||
After pulling new code, `uv run alembic upgrade head` walks the chain
|
No schema change between `cc60e7f73a1c` and HEAD. After pulling new
|
||||||
and the backup hook fires between each hop.
|
code, `uv run alembic upgrade head` walks the chain and the backup
|
||||||
|
hook fires between each hop.
|
||||||
|
|
|
||||||