Add User Guide for operators
Detailed end-user documentation distinct from the Development Guide. Covers installation (make/venv/docker), configuration, every CLI subcommand (ask/replay/costs), depth presets, output interpretation, operational logging, file layout, troubleshooting, and FAQ.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent ded78ff1ce
commit 4163e67c0b
1 changed file with 421 additions and 0 deletions

UserGuide.md (Normal file, 421 lines)
@@ -0,0 +1,421 @@
# User Guide

This guide is for **operators using Marchwarden** to ask research questions, replay traces, and track costs. If you're contributing code, see the [Development Guide](DevelopmentGuide) instead.

---

## Table of contents

1. [What Marchwarden is](#what-marchwarden-is)
2. [Installation](#installation)
3. [Configuration](#configuration)
4. [Asking a question — `marchwarden ask`](#asking-a-question)
5. [Reading the output](#reading-the-output)
6. [Replaying a trace — `marchwarden replay`](#replaying-a-trace)
7. [Tracking spend — `marchwarden costs`](#tracking-spend)
8. [Depth presets](#depth-presets)
9. [Operational logging](#operational-logging)
10. [File layout under `~/.marchwarden/`](#file-layout)
11. [Running in Docker](#running-in-docker)
12. [Troubleshooting](#troubleshooting)
13. [FAQ](#faq)

---

## What Marchwarden is

Marchwarden is an agentic web research assistant. You give it a question; it plans search queries, fetches the most promising sources, synthesizes a grounded answer with inline citations, and reports the gaps it could not resolve. Each call returns a structured **ResearchResult** containing:

- **answer** — multi-paragraph synthesis with inline source references
- **citations** — list of sources with raw verbatim excerpts (no rewriting)
- **gaps** — what the agent could not resolve, categorized
- **discovery_events** — lateral findings worth investigating with other tools
- **open_questions** — follow-up questions the agent generated
- **confidence** + factors — an auditable score together with the factors behind it, not just a number
- **cost_metadata** — tokens, iterations, wall-clock time, model id
- **trace_id** — UUID linking to a per-call audit log

Every research call is recorded three ways:
- a JSONL **trace** (per-step audit log) at `~/.marchwarden/traces/<trace_id>.jsonl`
- a one-line **cost ledger** entry at `~/.marchwarden/costs.jsonl`
- structured **operational logs** to stderr (and optionally a rotating file)

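For scripting against results, the fields listed above can be modelled as a small Python dataclass. This is only a sketch: the field types are assumptions made for illustration, and `trace_path` is a hypothetical helper that simply rebuilds the trace location quoted above.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ResearchResult:
    """Illustrative container; field types are assumed, not taken from the real schema."""
    answer: str
    trace_id: str
    citations: list = field(default_factory=list)
    gaps: list = field(default_factory=list)
    discovery_events: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)
    confidence: float = 0.0
    cost_metadata: dict = field(default_factory=dict)


def trace_path(result: ResearchResult, root: Path = Path.home() / ".marchwarden") -> Path:
    """Hypothetical helper: the per-call trace file for this result."""
    return root / "traces" / f"{result.trace_id}.jsonl"
```
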
---

## Installation

### Option 1 — Make + venv (recommended for local use)

```bash
git clone https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden.git
cd marchwarden
make install
source .venv/bin/activate
```

`make install` creates `.venv/`, installs the project in editable mode with dev extras, and wires up the `marchwarden` command. After activation, `which marchwarden` should resolve to `.venv/bin/marchwarden`.

If `marchwarden` reports `ModuleNotFoundError: No module named 'cli'`, you have a stale install on your `$PATH`:
```bash
which -a marchwarden    # find the stale copy
rm <path/to/stale>      # remove it
hash -r                 # clear bash's command cache
```

### Option 2 — Manual venv

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

### Option 3 — Docker

```bash
make docker-build
./scripts/docker-test.sh ask "your question here"
```

The Docker flow mounts `~/secrets` (read-only) and `~/.marchwarden/` (read-write) into the container, so traces, costs, and logs land in your real home directory, just as with a venv install.

---

## Configuration

### API keys (required)

Marchwarden reads two keys from `~/secrets` (a shell-style `KEY=value` file):

```
ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...
```

Get them from:
- Anthropic: https://console.anthropic.com
- Tavily: https://tavily.com (free tier: 1,000 searches/month)

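Before a first run it can help to confirm that both keys are actually present. The sketch below is a pre-flight check, not Marchwarden's own loader: `parse_secrets` and `missing_keys` are hypothetical helpers, and tolerating quotes or a leading `export` is an assumption about what a shell-style file may contain.

```python
from pathlib import Path


def parse_secrets(text: str) -> dict:
    """Parse shell-style KEY=value lines, skipping blanks and # comments."""
    secrets = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate an optional leading "export " (assumption, see lead-in).
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        secrets[key.strip()] = value.strip().strip("'\"")
    return secrets


def missing_keys(path: Path = Path.home() / "secrets") -> list:
    """Return the required key names that are absent or empty in the file."""
    required = ("ANTHROPIC_API_KEY", "TAVILY_API_KEY")
    found = parse_secrets(path.read_text()) if path.exists() else {}
    return [name for name in required if not found.get(name)]
```

An empty list from `missing_keys()` means both keys were found.
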
### Environment variables (optional)

| Variable | Purpose | Default |
|---|---|---|
| `MARCHWARDEN_MODEL` | Anthropic model id used by the researcher | `claude-sonnet-4-6` |
| `MARCHWARDEN_LOG_LEVEL` | `DEBUG` / `INFO` / `WARNING` / `ERROR` | `INFO` |
| `MARCHWARDEN_LOG_FORMAT` | `json` (OpenSearch-ready) or `console` (colored) | auto: `console` if stderr is a TTY, `json` otherwise |
| `MARCHWARDEN_LOG_FILE` | Set to `1` to also log to `~/.marchwarden/logs/marchwarden.log` (10 MB rotation, 5 backups) | unset |
| `MARCHWARDEN_COST_LEDGER` | Override cost ledger path | `~/.marchwarden/costs.jsonl` |

### Price table

`~/.marchwarden/prices.toml` is auto-created on first run with current Anthropic + Tavily rates. Edit it manually when upstream prices change — Marchwarden does not fetch prices automatically. Calls that use a model missing from the table record `estimated_cost_usd: null` rather than crashing.

---

## Asking a question

```bash
marchwarden ask "What are ideal crops for a garden in Utah?"
```

### Flags

| Flag | Purpose |
|---|---|
| `--depth shallow\|balanced\|deep` | Pick a research depth preset (default: `balanced`) |
| `--budget INT` | Override the depth's token budget |
| `--max-iterations INT` | Override the depth's iteration cap |

`--budget` and `--max-iterations` always win over the depth preset. If neither is set, the depth preset's values apply.

### Examples

```bash
# Quick lookup — shallow depth (2 iterations, 5k tokens, 5 sources)
marchwarden ask "What is the capital of Utah?" --depth shallow

# Default — balanced depth (5 iterations, 20k tokens, 10 sources)
marchwarden ask "Compare cool-season and warm-season crops for Utah"

# Thorough — deep depth (8 iterations, 60k tokens, 20 sources)
marchwarden ask "Compare AWS Lambda vs Azure Functions for HFT" --depth deep

# Override the depth preset for one call
marchwarden ask "..." --depth balanced --budget 50000
```

---

## Reading the output

Output is rendered with [rich](https://github.com/Textualize/rich). Each section is a panel or table:

### Answer panel
The synthesized answer in prose. Source numbers like `[Source 4]` map to entries in the Citations table.

### Citations table
| Column | Meaning |
|---|---|
| `#` | Source index (matches `[Source N]` in the answer) |
| `Title / Locator` | Page title plus the URL |
| `Excerpt` | **Verbatim** text from the source (up to 500 chars). The excerpt bypasses the synthesizer, so it cannot be silently rewritten |
| `Conf` | Researcher's confidence in this source's accuracy (0.00–1.00) |

To check a claim in the answer, read the matching `Excerpt` and confirm the source actually says what the synthesizer claims it says.

### Gaps table
Categorized reasons the agent couldn't fully resolve the question:
- `source_not_found` — no relevant pages indexed
- `access_denied` — sources existed but couldn't be fetched
- `budget_exhausted` — ran out of iterations or tokens
- `contradictory_sources` — sources disagreed and the disagreement wasn't resolvable
- `scope_exceeded` — the question reaches into a domain web search can't answer (academic papers, internal databases, legal docs)

### Discovery Events table
Lateral findings: things the agent stumbled across that aren't in the answer but might matter for follow-up. Each suggests a `target researcher` and a query — these are how a future PI orchestrator (V2) will dispatch other specialists.

### Open Questions table
Forward-looking questions the agent generated mid-research. Each has a priority (`high`/`medium`/`low`) and the source context that prompted it. These often reveal the *next* useful question to ask.

### Confidence panel
| Field | Meaning |
|---|---|
| `Overall` | 0.00–1.00. Read this in the context of the factors below, not in isolation |
| `Corroborating sources` | How many sources agree on the core claims |
| `Source authority` | `high` (.gov/.edu/peer-reviewed), `medium` (established orgs), `low` (blogs/forums) |
| `Contradiction detected` | Did sources disagree? |
| `Query specificity match` | How well the results addressed the actual question (0.00–1.00) |
| `Budget status` | `spent` (the loop hit its cap before voluntarily stopping) or `under cap` |
| `Recency` | `current` (<1y) / `recent` (1–3y) / `dated` (>3y) / `unknown` |

**`Budget status: spent` is normal, not an error.** It means the agent used the cap you gave it before deciding it was done. Combined with a high `Overall` score (for example 0.88 or above), it indicates a confident answer that simply used its whole budget.

### Cost panel
`Tokens`, `Iterations`, `Wall time`, `Model`. The token total includes the synthesis call, which is uncapped by design (see [Depth presets](#depth-presets) below).

### Trace footer
The `trace_id` is a UUID. Save it if you want to replay this run later.

---

## Replaying a trace

Every research call writes a JSONL audit log at `~/.marchwarden/traces/<trace_id>.jsonl`. Replay it with:

```bash
marchwarden replay <trace_id>
```

The replay table shows every step the agent took: planning calls, search queries, URL fetches with content hashes, synthesis attempts, and the final outcome. Use it to:

- **Diagnose unexpected results** — see exactly what queries the agent ran and what it found
- **Audit citations** — every fetch records a SHA-256 content hash, so you can verify that the page has not changed since it was fetched
- **Debug synthesis failures** — `synthesis_error` steps record the LLM's full raw response and parse error

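The citation audit mentioned above comes down to recomputing a SHA-256 digest and comparing it with the recorded one. A minimal sketch follows. The guide does not say exactly which bytes Marchwarden hashes (raw response body or extracted text), so the recorded hash is passed in as a plain argument and both function names are hypothetical.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def page_unchanged(recorded_hash: str, current_content: bytes) -> bool:
    """True if freshly fetched content hashes to the value recorded in the trace."""
    return content_hash(current_content) == recorded_hash.lower()
```

A mismatch only shows that the hashed bytes differ; it does not say which part of the page changed.
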
### Flags

| Flag | Purpose |
|---|---|
| `--trace-dir PATH` | Override default trace directory (`~/.marchwarden/traces`) |

---

## Tracking spend

Every research call appends one line to `~/.marchwarden/costs.jsonl` with model, tokens (input/output split), Tavily search count, and an estimated cost in USD. Inspect it with:

```bash
marchwarden costs
```

### Output sections

- **Cost Summary** — total calls, total spend, total tokens (with input/output split), Tavily searches, and a warning if any calls used a model not in your price table
- **Per Day** — calls / tokens / spend grouped by day
- **Per Model** — calls / tokens / spend grouped by `model_id`
- **Highest-Cost Call** — the most expensive single run, with `trace_id` for follow-up

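If you want the per-model grouping in your own tooling, it can be rebuilt from the ledger lines. The sketch below relies only on the two field names this guide mentions, `model_id` and `estimated_cost_usd`; the function name and the shape of the returned summary are illustrative assumptions.

```python
import json
from collections import defaultdict


def spend_per_model(lines):
    """Group ledger entries by model_id: count calls, sum known costs, count unpriced calls."""
    summary = defaultdict(lambda: {"calls": 0, "spend_usd": 0.0, "unpriced": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        row = summary[entry.get("model_id", "unknown")]
        row["calls"] += 1
        cost = entry.get("estimated_cost_usd")
        if cost is None:  # an unknown model price is recorded as null
            row["unpriced"] += 1
        else:
            row["spend_usd"] += cost
    return dict(summary)
```

The function accepts any iterable of JSON lines, for example an open ledger file.
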
### Flags

| Flag | Purpose |
|---|---|
| `--since DATE` | ISO date (`2026-04-01`) or relative (`7d`, `24h`, `2w`, `1m`) |
| `--until DATE` | Same formats as `--since` |
| `--model MODEL_ID` | Filter to a single model |
| `--json` | Emit raw filtered ledger entries (one JSON object per line) instead of the table |
| `--ledger PATH` | Override default ledger location |

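To make the two accepted date formats concrete, here is a rough sketch of how a value such as `7d` or `2026-04-01` could be turned into a timestamp. It is not the CLI's actual parser. Treating `m` as a 30-day month and ISO dates as midnight UTC are assumptions, and `parse_since` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "m" means a 30-day month; the guide does not define it.
_UNIT_HOURS = {"h": 1, "d": 24, "w": 24 * 7, "m": 24 * 30}


def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn an ISO date or a relative offset like '7d' into a UTC datetime."""
    now = now or datetime.now(timezone.utc)
    match = re.fullmatch(r"(\d+)([hdwm])", value.strip())
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(hours=amount * _UNIT_HOURS[unit])
    # Otherwise expect an ISO date such as 2026-04-01 (midnight UTC assumed).
    return datetime.fromisoformat(value.strip()).replace(tzinfo=timezone.utc)
```
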
### Examples

```bash
marchwarden costs                          # all-time summary
marchwarden costs --since 7d               # last 7 days
marchwarden costs --model claude-opus-4-6
marchwarden costs --since 2026-04-01 --until 2026-04-08 --json
```

The `--json` mode is suitable for piping into `jq` or shipping to a billing/analytics tool.

---

## Depth presets

The `--depth` flag picks sensible defaults for the agent loop. Explicit `--budget` and `--max-iterations` always override.

| Depth | max_iterations | token_budget | max_sources | Use for |
|---|---:|---:|---:|---|
| `shallow` | 2 | 5,000 | 5 | quick lookups, factual Q&A |
| `balanced` | 5 | 20,000 | 10 | default, most questions |
| `deep` | 8 | 60,000 | 20 | comparison studies, complex investigations |

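The override rule can be pictured as "start from the preset, then apply any explicit flags". The sketch below copies its numbers from the table above; the `PRESETS` dictionary and `resolve_limits` function are illustrative names, not Marchwarden's internals.

```python
from typing import Optional

# Values copied from the depth preset table.
PRESETS = {
    "shallow": {"max_iterations": 2, "token_budget": 5_000, "max_sources": 5},
    "balanced": {"max_iterations": 5, "token_budget": 20_000, "max_sources": 10},
    "deep": {"max_iterations": 8, "token_budget": 60_000, "max_sources": 20},
}


def resolve_limits(depth: str = "balanced",
                   budget: Optional[int] = None,
                   max_iterations: Optional[int] = None) -> dict:
    """Start from the preset, then let explicit overrides win."""
    limits = dict(PRESETS[depth])
    if budget is not None:
        limits["token_budget"] = budget
    if max_iterations is not None:
        limits["max_iterations"] = max_iterations
    return limits
```

For example, `resolve_limits("balanced", budget=50000)` keeps the balanced iteration cap and source count but raises the token budget, which mirrors `--depth balanced --budget 50000`.
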
### How the budget is enforced

The token budget is a **soft cap on the tool-use loop only**:
- Before each new iteration, the agent checks `tokens_used >= token_budget`. If yes, the loop stops and synthesis runs on whatever evidence is gathered.
- The synthesis call itself is **uncapped** — it always completes, so you get a real ResearchResult instead of a parse-failure stub.
- This means total tokens reported in the Cost panel and ledger will normally exceed `token_budget` by the synthesis cost (~10–25k tokens depending on evidence size).

Practical implications:
- A `balanced` run with `token_budget=20000` typically reports `tokens_used: 30000–50000` total. That's normal.
- If you need *strict* total spend control, use `shallow` and hand-tune `--budget` low.
- If you need *thorough* answers, use `deep` and accept that the call may consume 100k+ tokens.

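The soft cap described in this section can be simulated in a few lines. This is a toy model under stated assumptions, not the agent's code: the per-iteration and synthesis token costs are made-up inputs, and `run_loop` is a hypothetical name. It shows why the reported total exceeds `token_budget`: the check happens only before an iteration starts, and synthesis is added on top.

```python
def run_loop(step_costs, token_budget, max_iterations, synthesis_cost):
    """Simulate the soft cap: check the budget before each iteration, never during synthesis.

    step_costs holds illustrative per-iteration token costs.
    Returns (iterations_run, total_tokens).
    """
    tokens_used = 0
    iterations = 0
    for cost in step_costs[:max_iterations]:
        if tokens_used >= token_budget:  # checked before starting a new iteration
            break
        tokens_used += cost              # an iteration may overshoot the cap
        iterations += 1
    total = tokens_used + synthesis_cost  # synthesis always runs, uncapped
    return iterations, total


# Balanced-style limits with invented costs: stops after 3 iterations.
print(run_loop([8000] * 5, token_budget=20_000, max_iterations=5, synthesis_cost=15_000))
# (3, 39000)
```
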
---

## Operational logging

Marchwarden logs every research step via `structlog`. Logs go to **stderr** so they don't interfere with the research output on stdout.

### Log levels

- **`INFO`** (default) — milestones only (~9 lines per call): research start, each iteration boundary, synthesis start/complete, completion, cost recording
- **`DEBUG`** — every step (~13+ lines per call): adds individual `web_search`, `fetch_url`, and tool-result events

### Formats

- **`console`** — colored, human-readable; auto-selected when stderr is a TTY
- **`json`** — newline-delimited JSON, OpenSearch-ready; auto-selected when stderr is not a TTY (e.g., in CI, containers, or piped output)

Set explicitly with `MARCHWARDEN_LOG_FORMAT=json` or `=console`.

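The selection rule above (an explicit setting wins, otherwise decide by TTY) can be sketched as follows. `resolve_log_format` is a hypothetical function written for illustration; how Marchwarden treats unrecognized values is not documented here, so this sketch simply falls back to auto-detection.

```python
import os
import sys


def resolve_log_format(environ=None, stream=None) -> str:
    """Explicit MARCHWARDEN_LOG_FORMAT wins; otherwise pick by whether the stream is a TTY."""
    environ = os.environ if environ is None else environ
    stream = sys.stderr if stream is None else stream
    explicit = environ.get("MARCHWARDEN_LOG_FORMAT", "").strip().lower()
    if explicit in ("json", "console"):
        return explicit
    # Assumption: anything else falls back to auto-detection.
    return "console" if stream.isatty() else "json"
```

This is why redirecting stderr to a file (`2> run.log`) yields JSON lines unless the variable is set explicitly.
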
### Persistent file logging

```bash
MARCHWARDEN_LOG_FILE=1 marchwarden ask "..."
```

Logs are appended to `~/.marchwarden/logs/marchwarden.log` (10 MB per file, 5 rotated backups). The format respects `MARCHWARDEN_LOG_FORMAT`.

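For readers unfamiliar with size-based rotation, the Python standard library's `RotatingFileHandler` shows what "10 MB per file, 5 backups" means in practice. Whether Marchwarden uses this exact handler, and whether it counts 10 MB as 10 × 1024 × 1024 bytes, are assumptions; `build_file_handler` is a hypothetical helper and the demo writes to a temporary directory.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_file_handler(log_dir: Path) -> RotatingFileHandler:
    """Rotate at roughly 10 MB and keep 5 backups (marchwarden.log.1 to .5)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    return RotatingFileHandler(
        log_dir / "marchwarden.log",
        maxBytes=10 * 1024 * 1024,
        backupCount=5,
    )


# Demo against a throwaway directory rather than ~/.marchwarden/logs.
handler = build_file_handler(Path(tempfile.mkdtemp()))
logger = logging.getLogger("rotation-demo")
logger.addHandler(handler)
logger.warning("hello")
handler.close()
```

When the active file would exceed the size limit it is renamed to `.1`, older backups shift up by one, and the oldest is dropped.
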
### Context binding

Every log line emitted during a research call automatically carries:
- `trace_id` — the same UUID you see in the Trace footer
- `researcher` — currently always `web` (the researcher type)

This means in OpenSearch (or any structured log viewer) you can filter to a single research call with one query: `trace_id:"abc-123-..."`.

---

## File layout

Marchwarden writes to `~/.marchwarden/` exclusively. Nothing else on disk is touched.

```
~/.marchwarden/
├── prices.toml                  # auto-seeded price table; edit when rates change
├── costs.jsonl                  # cost ledger, one line per research call
├── traces/
│   └── <trace_id>.jsonl         # per-call audit log, one file per call
└── logs/
    ├── marchwarden.log          # only if MARCHWARDEN_LOG_FILE=1
    └── marchwarden.log.{1..5}   # rotated backups
```

All files are append-only or rewritten safely; you can `tail -f`, `jq`, or back them up freely.

---

## Running in Docker

The same workflows work inside the Docker test image — useful for sandboxed runs or to avoid touching the host's Python:

```bash
make docker-build                                     # one-time
./scripts/docker-test.sh ask "your question" --depth deep
./scripts/docker-test.sh replay <trace_id>
```

The `ask` and `replay` subcommands of `docker-test.sh` mount:
- `~/secrets:/root/secrets:ro` — your API keys
- `~/.marchwarden:/root/.marchwarden` — traces, costs, logs persist back to the host

The script also forwards `MARCHWARDEN_MODEL` from the host environment if set.

---

## Troubleshooting

### `marchwarden: command not found` after `make install`

Either:
1. The venv isn't activated. Run `source .venv/bin/activate`, or use `make ask`, which calls `.venv/bin/marchwarden` directly.
2. A stale install exists at `~/.local/bin/marchwarden`. Run `which -a marchwarden`, delete the stale copy, then `hash -r`.

### `ModuleNotFoundError: No module named 'cli'`

The `marchwarden` script being run is from a stale install (e.g., a previous `pip install --user` or pipx install) that doesn't know about the current source layout. Same fix as above.

### `Error: HTTP 404 Not Found` on the Anthropic API

Your `MARCHWARDEN_MODEL` is set to a model id that doesn't exist. Check that it names a valid model, such as `claude-sonnet-4-6` or `claude-opus-4-6`. The default is `claude-sonnet-4-6`.

### `Calls with unknown model price: N` warning in `marchwarden costs`

You ran a research call with a `model_id` not present in `~/.marchwarden/prices.toml`. Add a section for it:
```toml
[models."your-model-id"]
input_per_mtok_usd = 3.00
output_per_mtok_usd = 15.00
```
Then re-run `marchwarden costs`. Existing ledger entries with `null` cost won't be retroactively fixed; future calls will pick up the new prices.

### `Budget status: spent` on every run

This is *expected*, not an error. See [Reading the output → Confidence panel](#reading-the-output) and [Depth presets → How the budget is enforced](#depth-presets) for details.

### Synthesis fallback ("Research completed but synthesis failed")

This used to happen when the synthesis JSON exceeded its `max_tokens` cap, but was fixed in PR #20. If you still see it, file an issue with the `trace_id` — the JSONL trace will contain the exact `synthesis_error` step including the model's raw response and parse error.

### The `marchwarden ask` output is paginated / cut off

`rich` defaults to your terminal width. If lines wrap badly, widen your terminal or pipe the output through `less -R` to keep colors:
```bash
marchwarden ask "..." 2>&1 | less -R
```

---

## FAQ

**How long does a research call take?**
Typical wall-clock times: shallow ~15s, balanced ~30–60s, deep ~60–120s. Mostly LLM latency, not network.

**How much does a call cost?**
At current Sonnet 4.6 rates: shallow ~$0.02, balanced ~$0.05–$0.15, deep ~$0.20–$0.60. Run `marchwarden costs` after a few calls to see your actual numbers.

**Can I use a different model?**
Yes. `MARCHWARDEN_MODEL=claude-opus-4-6 marchwarden ask "..."` will use Opus instead of Sonnet. Make sure the model id is in your `prices.toml` so the cost ledger can estimate spend.

**Can the agent access local files / databases?**
Not yet. V1 is web-search only. V2+ (per the [Roadmap](Roadmap)) will add file/document and database researchers — same contract, different tools.

**Does the agent learn between calls?**
No. Each `research()` call is stateless. The trace logs and cost ledger accumulate over time, but the agent itself starts fresh every time. Cross-call learning is on the V2+ roadmap.

**Where do I report bugs?**
Open an issue at the [Forgejo repo](https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden/issues). Include the `trace_id` from the Trace footer — it lets us reconstruct exactly what happened.

---

See also: [Architecture](Architecture), [Research Contract](ResearchContract), [Development Guide](DevelopmentGuide), [Roadmap](Roadmap)