Add User Guide for operators

Detailed end-user documentation distinct from the Development Guide. Covers installation (make/venv/docker), configuration, every CLI subcommand (ask/replay/costs), depth presets, output interpretation, operational logging, file layout, troubleshooting, and FAQ.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:42:11 -06:00 · 2026-04-08 16:42:11 -06:00 · 4163e67c0b
commit 4163e67c0b
parent ded78ff1ce
1 changed files with 421 additions and 0 deletions
--- a/UserGuide.md
+++ b/UserGuide.md
@@ -0,0 +1,421 @@
+# User Guide
+
+This guide is for **operators using Marchwarden** to ask research questions, replay traces, and track costs. If you're contributing code, see the [Development Guide](DevelopmentGuide) instead.
+
+---
+
+## Table of contents
+
+1. [What Marchwarden is](#what-marchwarden-is)
+2. [Installation](#installation)
+3. [Configuration](#configuration)
+4. [Asking a question — `marchwarden ask`](#asking-a-question)
+5. [Reading the output](#reading-the-output)
+6. [Replaying a trace — `marchwarden replay`](#replaying-a-trace)
+7. [Tracking spend — `marchwarden costs`](#tracking-spend)
+8. [Depth presets](#depth-presets)
+9. [Operational logging](#operational-logging)
+10. [File layout under `~/.marchwarden/`](#file-layout)
+11. [Running in Docker](#running-in-docker)
+12. [Troubleshooting](#troubleshooting)
+13. [FAQ](#faq)
+
+---
+
+## What Marchwarden is
+
+Marchwarden is an agentic web research assistant. You give it a question; it plans search queries, fetches the most promising sources, synthesizes a grounded answer with inline citations, and reports the gaps it could not resolve. Each call returns a structured **ResearchResult** containing:
+
+- **answer** — multi-paragraph synthesis with inline source references
+- **citations** — list of sources with raw verbatim excerpts (no rewriting)
+- **gaps** — what the agent could not resolve, categorized
+- **discovery_events** — lateral findings worth investigating with other tools
+- **open_questions** — follow-up questions the agent generated
+- **confidence** + factors — auditable score, not just a number
+- **cost_metadata** — tokens, iterations, wall-clock time, model id
+- **trace_id** — UUID linking to a per-call audit log
+
+Every research call is recorded three ways:
+- a JSONL **trace** (per-step audit log) at `~/.marchwarden/traces/<trace_id>.jsonl`
+- a one-line **cost ledger** entry at `~/.marchwarden/costs.jsonl`
+- structured **operational logs** to stderr (and optionally a rotating file)
+
+---
+
+## Installation
+
+### Option 1 — Make + venv (recommended for local use)
+
+```bash
+git clone https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden.git
+cd marchwarden
+make install
+source .venv/bin/activate
+```
+
+`make install` creates `.venv/`, installs the project editable with dev extras, and wires the `marchwarden` command. After activation, `which marchwarden` should resolve to `.venv/bin/marchwarden`.
+
+If `marchwarden` reports `ModuleNotFoundError: No module named 'cli'`, you have a stale install on your `$PATH`:
+```bash
+which -a marchwarden     # find the stale copy
+rm <path/to/stale>       # remove it
+hash -r                  # clear bash's command cache
+```
+
+### Option 2 — Manual venv
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -e ".[dev]"
+```
+
+### Option 3 — Docker
+
+```bash
+make docker-build
+./scripts/docker-test.sh ask "your question here"
+```
+
+The Docker flow mounts `~/secrets` (read-only) and `~/.marchwarden/` (read-write) into the container, so traces, costs, and logs land in your real home directory, just as they do with a venv install.
+
+---
+
+## Configuration
+
+### API keys (required)
+
+Marchwarden reads two keys from `~/secrets` (a shell-style `KEY=value` file):
+
+```
+ANTHROPIC_API_KEY=sk-ant-...
+TAVILY_API_KEY=tvly-...
+```
+
+Get them from:
+- Anthropic: https://console.anthropic.com
+- Tavily: https://tavily.com (free tier: 1,000 searches/month)
+
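The check below is a minimal sketch for confirming that a `KEY=value` secrets file contains both keys before a first run. It is not Marchwarden's own loader; `parse_secrets` and `missing_keys` are hypothetical helpers, and tolerance for `export ` prefixes and quoted values is an assumption about what a shell-style file may contain.

```python
from pathlib import Path

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "TAVILY_API_KEY")


def parse_secrets(text: str) -> dict[str, str]:
    """Parse shell-style KEY=value lines, skipping blanks and # comments."""
    secrets: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Assumption: an optional leading `export ` is allowed.
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Assumption: values may be wrapped in single or double quotes.
        secrets[key.strip()] = value.strip().strip("\"'")
    return secrets


def missing_keys(secrets: dict[str, str]) -> list[str]:
    """Return the required key names that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not secrets.get(key)]


if __name__ == "__main__":
    path = Path.home() / "secrets"
    text = path.read_text() if path.is_file() else ""
    missing = missing_keys(parse_secrets(text))
    # Only key names are printed, never the secret values.
    print("missing keys:", ", ".join(missing) if missing else "none")
```

Run as a script, it prints which of the two keys are missing without echoing any values.
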
+### Environment variables (optional)
+
+| Variable | Purpose | Default |
+|---|---|---|
+| `MARCHWARDEN_MODEL` | Anthropic model id used by the researcher | `claude-sonnet-4-6` |
+| `MARCHWARDEN_LOG_LEVEL` | `DEBUG` / `INFO` / `WARNING` / `ERROR` | `INFO` |
+| `MARCHWARDEN_LOG_FORMAT` | `json` (OpenSearch-ready) or `console` (colored) | auto: `console` if stderr is a TTY, `json` otherwise |
+| `MARCHWARDEN_LOG_FILE` | Set to `1` to also log to `~/.marchwarden/logs/marchwarden.log` (10MB rotation, 5 backups) | unset |
+| `MARCHWARDEN_COST_LEDGER` | Override cost ledger path | `~/.marchwarden/costs.jsonl` |
+
+### Price table
+
+`~/.marchwarden/prices.toml` is auto-created on first run with current Anthropic + Tavily rates. Edit it manually when upstream prices change — Marchwarden does not auto-fetch. Unknown models record `estimated_cost_usd: null` rather than crash.
+
+---
+
+## Asking a question
+
+```bash
+marchwarden ask "What are ideal crops for a garden in Utah?"
+```
+
+### Flags
+
+| Flag | Purpose |
+|---|---|
+| `--depth shallow\|balanced\|deep` | Pick a research depth preset (default: `balanced`) |
+| `--budget INT` | Override the depth's token budget |
+| `--max-iterations INT` | Override the depth's iteration cap |
+
+`--budget` and `--max-iterations` always win over the depth preset. Any limit you leave unset takes its value from the preset.
+
+### Examples
+
+```bash
+# Quick lookup — shallow depth (2 iterations, 5k tokens, 5 sources)
+marchwarden ask "What is the capital of Utah?" --depth shallow
+
+# Default — balanced depth (5 iterations, 20k tokens, 10 sources)
+marchwarden ask "Compare cool-season and warm-season crops for Utah"
+
+# Thorough — deep depth (8 iterations, 60k tokens, 20 sources)
+marchwarden ask "Compare AWS Lambda vs Azure Functions for HFT" --depth deep
+
+# Override the depth preset for one call
+marchwarden ask "..." --depth balanced --budget 50000
+```
+
+---
+
+## Reading the output
+
+Output is rendered with [rich](https://github.com/Textualize/rich). Each section is a panel or table:
+
+### Answer panel
+The synthesized answer in prose. Source numbers like `[Source 4]` map to entries in the Citations table.
+
+### Citations table
+| Column | Meaning |
+|---|---|
+| `#` | Source index (matches `[Source N]` in the answer) |
+| `Title / Locator` | Page title plus the URL |
+| `Excerpt` | **Verbatim** text from the source (up to 500 chars). This bypasses the synthesizer to prevent quiet rewriting |
+| `Conf` | Researcher's confidence in this source's accuracy (0.00–1.00) |
+
+If the answer contains a claim, you can read the matching `Excerpt` to verify the source actually says what the synthesizer claims it says.
+
+### Gaps table
+Categorized reasons the agent couldn't fully resolve the question:
+- `source_not_found` — no relevant pages indexed
+- `access_denied` — sources existed but couldn't be fetched
+- `budget_exhausted` — ran out of iterations / tokens
+- `contradictory_sources` — sources disagreed and the disagreement wasn't resolvable
+- `scope_exceeded` — the question reaches into a domain web search can't answer (academic papers, internal databases, legal docs)
+
+### Discovery Events table
+Lateral findings: things the agent stumbled across that aren't in the answer but might matter for follow-up. Each suggests a `target researcher` and a query — these are how a future PI orchestrator (V2) will dispatch other specialists.
+
+### Open Questions table
+Forward-looking questions the agent generated mid-research. Each has a priority (`high`/`medium`/`low`) and the source context that prompted it. These often reveal the *next* useful question to ask.
+
+### Confidence panel
+| Field | Meaning |
+|---|---|
+| `Overall` | 0.00–1.00. Read this in the context of the factors below, not in isolation |
+| `Corroborating sources` | How many sources agree on the core claims |
+| `Source authority` | `high` (.gov/.edu/peer-reviewed), `medium` (established orgs), `low` (blogs/forums) |
+| `Contradiction detected` | Did sources disagree? |
+| `Query specificity match` | How well the results addressed the actual question (0.00–1.00) |
+| `Budget status` | `spent` (the loop hit its cap before voluntarily stopping) or `under cap` |
+| `Recency` | `current` (<1y) / `recent` (1–3y) / `dated` (>3y) / `unknown` |
+
+**`Budget status: spent` is normal, not an error.** It means the agent used the cap you gave it before deciding it was done. Read it together with `Overall`: a run that shows `Budget status: spent` and an `Overall` of 0.88 or higher is a confident answer that simply used its whole budget.
+
+### Cost panel
+`Tokens`, `Iterations`, `Wall time`, `Model`. The token total includes the synthesis call, which is uncapped by design (see [Depth presets](#depth-presets) below).
+
+### Trace footer
+The `trace_id` is a UUID. Save it if you'll want to replay this run later.
+
+---
+
+## Replaying a trace
+
+Every research call writes a JSONL audit log at `~/.marchwarden/traces/<trace_id>.jsonl`. Replay it with:
+
+```bash
+marchwarden replay <trace_id>
+```
+
+The replay table shows every step the agent took: planning calls, search queries, URL fetches with content hashes, synthesis attempts, and the final outcome. Use it to:
+
+- **Diagnose unexpected results** — see exactly what queries the agent ran and what it found
+- **Audit citations** — every fetch records a SHA-256 content hash so you can verify the same page hasn't changed since
+- **Debug synthesis failures** — `synthesis_error` steps record the LLM's full raw response and parse error
+
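The citation audit described above comes down to recomputing a SHA-256 digest and comparing it with the recorded one. The sketch below uses only the standard library. Whether Marchwarden hashes the raw response bytes or the extracted text, and how the hash is stored in the trace, are not stated in this guide, so `page_unchanged` and the optional `sha256:` prefix are assumptions.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the hex SHA-256 digest of a fetched body."""
    return hashlib.sha256(body).hexdigest()


def page_unchanged(recorded_hash: str, current_body: bytes) -> bool:
    """Compare a hash recorded in a trace with a freshly fetched body."""
    # Assumption: the recorded value may carry a "sha256:" prefix.
    recorded = recorded_hash.lower().removeprefix("sha256:")
    return recorded == sha256_hex(current_body)
```

A mismatch does not prove the citation is wrong. It only shows that the bytes served today differ from what the agent saw, which is common for pages with dynamic content.
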
+### Flags
+
+| Flag | Purpose |
+|---|---|
+| `--trace-dir PATH` | Override default trace directory (`~/.marchwarden/traces`) |
+
+---
+
+## Tracking spend
+
+Every research call appends one line to `~/.marchwarden/costs.jsonl` with model, tokens (input/output split), Tavily search count, and an estimated cost in USD. Inspect it with:
+
+```bash
+marchwarden costs
+```
+
+### Output sections
+
+- **Cost Summary** — total calls, total spend, total tokens (with input/output split), Tavily searches, and a warning if any calls used a model not in your price table
+- **Per Day** — calls / tokens / spend grouped by day
+- **Per Model** — calls / tokens / spend grouped by `model_id`
+- **Highest-Cost Call** — the most expensive single run, with `trace_id` for follow-up
+
+### Flags
+
+| Flag | Purpose |
+|---|---|
+| `--since DATE` | ISO date (`2026-04-01`) or relative (`7d`, `24h`, `2w`, `1m`) |
+| `--until DATE` | Same |
+| `--model MODEL_ID` | Filter to a single model |
+| `--json` | Emit raw filtered ledger entries (one JSON per line) instead of the table |
+| `--ledger PATH` | Override default ledger location |
+
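To make the `--since` / `--until` value formats concrete, the sketch below shows one way such values could be resolved to a timestamp. It is not the CLI's actual parser: `parse_when` is a hypothetical name, and reading `m` as a 30-day month and ISO dates as midnight UTC are both assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: "m" means a 30-day month, not minutes.
_UNITS = {
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "m": timedelta(days=30),
}


def parse_when(value: str, now: datetime) -> datetime:
    """Resolve a relative span (7d, 24h, 2w, 1m) or an ISO date to a datetime."""
    match = re.fullmatch(r"(\d+)([hdwm])", value)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        return now - count * _UNITS[unit]
    # Assumption: a bare ISO date is interpreted as midnight UTC.
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
```

Passing `now` explicitly keeps the function deterministic, which makes the relative forms easy to check by hand.
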
+### Examples
+
+```bash
+marchwarden costs                        # all-time summary
+marchwarden costs --since 7d             # last 7 days
+marchwarden costs --model claude-opus-4-6
+marchwarden costs --since 2026-04-01 --until 2026-04-08 --json
+```
+
+The `--json` mode is suitable for piping into `jq` or shipping to a billing/analytics tool.
+
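As an alternative to `jq`, the one-JSON-per-line output can be aggregated in a few lines of Python. The sketch below sums spend per model and counts unpriced calls. It relies on the `model_id` and `estimated_cost_usd` fields named in this guide; any other ledger fields are not assumed, and `spend_by_model` is a hypothetical helper.

```python
import json
from collections import defaultdict


def spend_by_model(lines):
    """Sum estimated_cost_usd per model_id; count entries whose cost is null."""
    totals = defaultdict(float)
    unpriced = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        cost = entry.get("estimated_cost_usd")
        if cost is None:
            # Unknown models are recorded with a null cost.
            unpriced += 1
            continue
        totals[entry.get("model_id", "unknown")] += cost
    return dict(totals), unpriced
```

Any iterable of lines works as input, so `spend_by_model(sys.stdin)` fits a pipeline fed by `marchwarden costs --json`, and an open file handle on the ledger works the same way.
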
+---
+
+## Depth presets
+
+The `--depth` flag picks sensible defaults for the agent loop. Explicit `--budget` and `--max-iterations` always override.
+
+| Depth | max_iterations | token_budget | max_sources | Use for |
+|---|---:|---:|---:|---|
+| `shallow`  | 2 |  5,000 |  5 | quick lookups, factual Q&A |
+| `balanced` | 5 | 20,000 | 10 | default, most questions |
+| `deep`     | 8 | 60,000 | 20 | comparison studies, complex investigations |
+
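The override rule can be expressed as a small lookup. The sketch below copies the numbers from the table above; `Limits`, `PRESETS`, and `resolve_limits` are illustrative names, not Marchwarden's internal API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    max_iterations: int
    token_budget: int
    max_sources: int


# Values taken from the depth preset table.
PRESETS = {
    "shallow": Limits(max_iterations=2, token_budget=5_000, max_sources=5),
    "balanced": Limits(max_iterations=5, token_budget=20_000, max_sources=10),
    "deep": Limits(max_iterations=8, token_budget=60_000, max_sources=20),
}


def resolve_limits(
    depth: str = "balanced",
    budget: Optional[int] = None,
    max_iterations: Optional[int] = None,
) -> Limits:
    """Start from the preset, then let explicit flags win field by field."""
    preset = PRESETS[depth]
    return Limits(
        max_iterations=preset.max_iterations if max_iterations is None else max_iterations,
        token_budget=preset.token_budget if budget is None else budget,
        max_sources=preset.max_sources,
    )
```

For example, `resolve_limits("balanced", budget=50_000)` keeps the balanced iteration cap of 5 and source cap of 10 while raising the token budget to 50,000.
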
+### How the budget is enforced
+
+The token budget is a **soft cap on the tool-use loop only**:
+- Before each new iteration, the agent checks `tokens_used >= token_budget`. If yes, the loop stops and synthesis runs on whatever evidence is gathered.
+- The synthesis call itself is **uncapped** — it always completes, so you get a real ResearchResult instead of a parse-failure stub.
+- This means total tokens reported in the Cost panel and ledger will normally exceed `token_budget` by the synthesis cost (~10–25k tokens depending on evidence size).
+
+Practical implications:
+- A `balanced` run with `token_budget=20000` typically reports `tokens_used: 30000–50000` total. That's normal.
+- If you need *tighter* control over total spend, use `shallow` and hand-tune `--budget` low. The uncapped synthesis call still adds to the total, so the budget is never a hard ceiling.
+- If you need *thorough* answers, use `deep` and accept that the call may consume 100k+ tokens.
+
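A toy simulation makes the soft cap easier to see. The sketch below follows the rule described above (check the budget before each iteration, then always run synthesis); the per-iteration token costs and the synthesis cost are made-up inputs, and `run_loop` is a hypothetical function, not the agent's real loop.

```python
def run_loop(iteration_costs, token_budget, max_iterations, synthesis_cost):
    """Simulate the tool-use loop with a soft token cap plus uncapped synthesis."""
    tokens_used = 0
    iterations = 0
    stopped_by_cap = False
    for cost in iteration_costs:
        # The check runs BEFORE each iteration, so the iteration that
        # crosses the budget is allowed to finish (hence "soft" cap).
        if iterations >= max_iterations or tokens_used >= token_budget:
            stopped_by_cap = True
            break
        tokens_used += cost
        iterations += 1
    # Synthesis always runs and is not counted against the budget check.
    tokens_used += synthesis_cost
    return {
        "iterations": iterations,
        "tokens_used": tokens_used,
        "stopped_by_cap": stopped_by_cap,
    }


result = run_loop(
    iteration_costs=[6_000, 7_000, 8_000, 9_000],
    token_budget=20_000,
    max_iterations=5,
    synthesis_cost=15_000,
)
print(result)
```

With these inputs the loop stops after three iterations at 21,000 tokens, and synthesis brings the reported total to 36,000, well above the 20,000 budget yet inside the typical range quoted for `balanced`.
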
+---
+
+## Operational logging
+
+Marchwarden logs every research step via `structlog`. Logs go to **stderr** so they don't interfere with the research output on stdout.
+
+### Log levels
+
+- **`INFO`** (default) — milestones only (~9 lines per call): research start, each iteration boundary, synthesis start/complete, completion, cost recording
+- **`DEBUG`** — every step (~13+ lines per call): adds individual `web_search`, `fetch_url`, and tool-result events
+
+### Formats
+
+- **`console`** — colored, human-readable; auto-selected when stderr is a TTY
+- **`json`** — newline-delimited JSON, OpenSearch-ready; auto-selected when stderr is not a TTY (e.g., in CI, containers, or piped output)
+
+Set explicitly with `MARCHWARDEN_LOG_FORMAT=json` or `=console`.
+
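The auto-selection rule is a TTY check with an environment override. The sketch below is an assumption about how that decision could be written, not the project's logging setup; `choose_log_format` is a hypothetical name, and treating unrecognized values as unset is a guess.

```python
import os
import sys


def choose_log_format(env=None, stream=None) -> str:
    """Return "json" or "console" following the documented precedence."""
    env = os.environ if env is None else env
    stream = sys.stderr if stream is None else stream
    explicit = env.get("MARCHWARDEN_LOG_FORMAT", "").strip().lower()
    if explicit in ("json", "console"):
        return explicit
    # Assumption: any other value falls through to auto-detection.
    return "console" if stream.isatty() else "json"
```

This is why piping stderr to a file or running under CI switches the output to JSON without any configuration.
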
+### Persistent file logging
+
+```bash
+MARCHWARDEN_LOG_FILE=1 marchwarden ask "..."
+```
+
+Logs are appended to `~/.marchwarden/logs/marchwarden.log` (10MB per file, 5 rotated backups). The format respects `MARCHWARDEN_LOG_FORMAT`.
+
+### Context binding
+
+Every log line emitted during a research call automatically carries:
+- `trace_id` — the same UUID you see in the Trace footer
+- `researcher` — currently always `web` (the researcher type)
+
+This means in OpenSearch (or any structured log viewer) you can filter to a single research call with one query: `trace_id:"abc-123-..."`.
+
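Without a log viewer, the same filter can be applied to a JSON-format log file locally. The sketch below assumes newline-delimited JSON with a top-level `trace_id` field, as described above; `lines_for_trace` is a hypothetical helper, and lines that are not valid JSON (for example console-format output) are skipped.

```python
import json


def lines_for_trace(log_lines, trace_id):
    """Yield parsed log events whose trace_id matches the given UUID."""
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line; ignore it
        if isinstance(event, dict) and event.get("trace_id") == trace_id:
            yield event
```

It accepts any iterable of lines, so an open handle on `~/.marchwarden/logs/marchwarden.log` can be passed directly when file logging is enabled with `MARCHWARDEN_LOG_FORMAT=json`.
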
+---
+
+## File layout
+
+By default, Marchwarden writes only to `~/.marchwarden/`. Nothing else on disk is touched unless you redirect a path yourself, for example with `MARCHWARDEN_COST_LEDGER`.
+
+```
+~/.marchwarden/
+├── prices.toml                      # auto-seeded price table; edit when rates change
+├── costs.jsonl                      # cost ledger, one line per research call
+├── traces/
+│   └── <trace_id>.jsonl             # per-call audit log, one file per call
+└── logs/
+    ├── marchwarden.log              # only if MARCHWARDEN_LOG_FILE=1
+    └── marchwarden.log.{1..5}       # rotated backups
+```
+
+All files are append-only or rewritten safely; you can `tail -f`, `jq`, or back them up freely.
+
+---
+
+## Running in Docker
+
+The same workflows work inside the docker test image — useful for sandboxed runs or to avoid touching the host's Python:
+
+```bash
+make docker-build                                         # one-time
+./scripts/docker-test.sh ask "your question" --depth deep
+./scripts/docker-test.sh replay <trace_id>
+```
+
+The `ask` and `replay` subcommands of `docker-test.sh` mount:
+- `~/secrets:/root/secrets:ro` — your API keys
+- `~/.marchwarden:/root/.marchwarden` — traces, costs, logs persist back to the host
+
+The script also forwards `MARCHWARDEN_MODEL` from the host environment if set.
+
+---
+
+## Troubleshooting
+
+### `marchwarden: command not found` after `make install`
+
+Either:
+1. The venv isn't activated. Run `source .venv/bin/activate`, or use `make ask` which calls `.venv/bin/marchwarden` directly.
+2. A stale install exists at `~/.local/bin/marchwarden`. Run `which -a marchwarden`, delete the stale copy, then `hash -r`.
+
+### `ModuleNotFoundError: No module named 'cli'`
+
+The `marchwarden` script being run is from a stale install (e.g., a previous `pip install --user` or pipx install) that doesn't know about the current source layout. Same fix as above.
+
+### `Error: HTTP 404 Not Found` on the Anthropic API
+
+Your `MARCHWARDEN_MODEL` is set to a model id that doesn't exist. Check the spelling against a known id such as `claude-sonnet-4-6` or `claude-opus-4-6`, or unset the variable to fall back to the default, `claude-sonnet-4-6`.
+
+### `Calls with unknown model price: N` warning in `marchwarden costs`
+
+You ran a research call with a `model_id` not present in `~/.marchwarden/prices.toml`. Add a section for it:
+```toml
+[models."your-model-id"]
+input_per_mtok_usd = 3.00
+output_per_mtok_usd = 15.00
+```
+Then re-run `marchwarden costs`. Existing ledger entries with `null` cost won't be retroactively fixed; future calls will pick up the new prices.
+
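For reference, a per-million-token price entry turns into an estimate with simple arithmetic. The sketch below mirrors the TOML shape shown above as a Python dict and returns `None` for unknown models, matching the documented `null` behavior. It ignores Tavily search charges, whose layout in `prices.toml` is not shown in this guide, and `estimate_cost_usd` is a hypothetical name.

```python
from typing import Optional

# Same shape as the [models."your-model-id"] section above.
PRICES = {
    "models": {
        "your-model-id": {
            "input_per_mtok_usd": 3.00,
            "output_per_mtok_usd": 15.00,
        }
    }
}


def estimate_cost_usd(
    prices: dict, model_id: str, input_tokens: int, output_tokens: int
) -> Optional[float]:
    """Estimate LLM spend in USD, or None when the model has no price entry."""
    model = prices.get("models", {}).get(model_id)
    if model is None:
        return None
    return (
        input_tokens / 1_000_000 * model["input_per_mtok_usd"]
        + output_tokens / 1_000_000 * model["output_per_mtok_usd"]
    )
```

On Python 3.11 or newer, `tomllib.load` applied to the file opened in binary mode should produce a dict of this shape, assuming the file follows the section layout shown above.
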
+### `Budget status: spent` on every run
+
+This is *expected*, not an error. See [Reading the output → Confidence panel](#reading-the-output) and [Depth presets → How the budget is enforced](#depth-presets) for details.
+
+### Synthesis fallback ("Research completed but synthesis failed")
+
+This used to happen when the synthesis JSON exceeded its `max_tokens` cap, but was fixed in PR #20. If you still see it, file an issue with the `trace_id` — the JSONL trace will contain the exact `synthesis_error` step including the model's raw response and parse error.
+
+### The `marchwarden ask` output is paginated / cut off
+
+`rich` sizes its output to your terminal width. If lines wrap badly, widen your terminal or pipe the output to `less -R`, which preserves colors:
+```bash
+marchwarden ask "..." 2>&1 | less -R
+```
+
+---
+
+## FAQ
+
+**How long does a research call take?**
+Typical wall-clock times: shallow ~15s, balanced ~30–60s, deep ~60–120s. Most of that is LLM latency rather than network time.
+
+**How much does a call cost?**
+At current Sonnet 4.6 rates: shallow ~$0.02, balanced ~$0.05–$0.15, deep ~$0.20–$0.60. Run `marchwarden costs` after a few calls to see your actual numbers.
+
+**Can I use a different model?**
+Yes. `MARCHWARDEN_MODEL=claude-opus-4-6 marchwarden ask "..."` will use Opus instead of Sonnet. Make sure the model id is in your `prices.toml` so the cost ledger can estimate spend.
+
+**Can the agent access local files / databases?**
+Not yet. V1 is web-search only. V2+ (per the [Roadmap](Roadmap)) will add file/document and database researchers — same contract, different tools.
+
+**Does the agent learn between calls?**
+No. Each `research()` call is stateless. The trace logs and cost ledger accumulate over time, but the agent itself starts fresh every time. Cross-call learning is on the V2+ roadmap.
+
+**Where do I report bugs?**
+Open an issue at the [Forgejo repo](https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden/issues). Include the `trace_id` from the Trace footer — it lets us reconstruct exactly what happened.
+
+---
+
+See also: [Architecture](Architecture), [Research Contract](ResearchContract), [Development Guide](DevelopmentGuide), [Roadmap](Roadmap)