User Guide
This guide is for operators using Marchwarden to ask research questions, replay traces, and track costs. If you're contributing code, see the Development Guide instead.
Table of contents
- What Marchwarden is
- Installation
- Configuration
- Asking a question — `marchwarden ask`
- Reading the output
- Replaying a trace — `marchwarden replay`
- Tracking spend — `marchwarden costs`
- Depth presets
- Operational logging
- File layout under `~/.marchwarden/`
- Running in Docker
- Troubleshooting
- FAQ
What Marchwarden is
Marchwarden is an agentic web research assistant. You give it a question; it plans search queries, fetches the most promising sources, synthesizes a grounded answer with inline citations, and reports the gaps it could not resolve. Each call returns a structured ResearchResult containing:
- answer — multi-paragraph synthesis with inline source references
- citations — list of sources with raw verbatim excerpts (no rewriting)
- gaps — what the agent could not resolve, categorized
- discovery_events — lateral findings worth investigating with other tools
- open_questions — follow-up questions the agent generated
- confidence + factors — auditable score, not just a number
- cost_metadata — tokens, iterations, wall-clock time, model id
- trace_id — UUID linking to a per-call audit log
Every research call is recorded three ways:
- a JSONL trace (per-step audit log) at `~/.marchwarden/traces/<trace_id>.jsonl`
- a one-line cost ledger entry at `~/.marchwarden/costs.jsonl`
- structured operational logs to stderr (and optionally a rotating file)
Installation
Option 1 — Make + venv (recommended for local use)
git clone https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden.git
cd marchwarden
make install
source .venv/bin/activate
make install creates .venv/, installs the project editable with dev extras, and wires the marchwarden command. After activation, which marchwarden should resolve to .venv/bin/marchwarden.
If marchwarden reports ModuleNotFoundError: No module named 'cli', you have a stale install on your $PATH:
which -a marchwarden # find the stale copy
rm <path/to/stale> # remove it
hash -r # clear bash's command cache
Option 2 — Manual venv
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Option 3 — Docker
make docker-build
./scripts/docker-test.sh ask "your question here"
The docker flow mounts ~/secrets (read-only) and ~/.marchwarden/ (read-write) into the container, so traces, costs, and logs land in your real home directory the same as a venv install.
Configuration
API keys (required)
Marchwarden reads two keys from ~/secrets (a shell-style KEY=value file):
ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...
Get them from:
- Anthropic: https://console.anthropic.com
- Tavily: https://tavily.com (free tier: 1,000 searches/month)
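Marchwarden doesn't enforce any particular permissions on `~/secrets`, but since the file holds live API keys it's worth restricting it to your user. This is ordinary shell hygiene, not a Marchwarden requirement:

```shell
# Ensure ~/secrets exists and is readable by you alone
touch ~/secrets
chmod 600 ~/secrets
```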
Environment variables (optional)
| Variable | Purpose | Default |
|---|---|---|
| `MARCHWARDEN_MODEL` | Anthropic model id used by the researcher | `claude-sonnet-4-6` |
| `MARCHWARDEN_LOG_LEVEL` | `DEBUG` / `INFO` / `WARNING` / `ERROR` | `INFO` |
| `MARCHWARDEN_LOG_FORMAT` | `json` (OpenSearch-ready) or `console` (colored) | auto: `console` if stderr is a TTY, `json` otherwise |
| `MARCHWARDEN_LOG_FILE` | Set to `1` to also log to `~/.marchwarden/logs/marchwarden.log` (10MB rotation, 5 backups) | unset |
| `MARCHWARDEN_COST_LEDGER` | Override cost ledger path | `~/.marchwarden/costs.jsonl` |
Price table
~/.marchwarden/prices.toml is auto-created on first run with current Anthropic + Tavily rates. Edit it manually when upstream prices change — Marchwarden does not auto-fetch. Unknown models record estimated_cost_usd: null rather than crash.
Asking a question
marchwarden ask "What are ideal crops for a garden in Utah?"
Flags
| Flag | Purpose |
|---|---|
| `--depth shallow\|balanced\|deep` | Pick a research depth preset (default: `balanced`) |
| `--budget INT` | Override the depth's token budget |
| `--max-iterations INT` | Override the depth's iteration cap |

`--budget` and `--max-iterations` always win over the depth preset. If both are unset, the depth preset chooses.
Examples
# Quick lookup — shallow depth (2 iterations, 5k tokens, 5 sources)
marchwarden ask "What is the capital of Utah?" --depth shallow
# Default — balanced depth (5 iterations, 20k tokens, 10 sources)
marchwarden ask "Compare cool-season and warm-season crops for Utah"
# Thorough — deep depth (8 iterations, 60k tokens, 20 sources)
marchwarden ask "Compare AWS Lambda vs Azure Functions for HFT" --depth deep
# Override the depth preset for one call
marchwarden ask "..." --depth balanced --budget 50000
Reading the output
Output is rendered with rich. Each section is a panel or table:
Answer panel
The synthesized answer in prose. Source numbers like [Source 4] map to entries in the Citations table.
Citations table
| Column | Meaning |
|---|---|
| `#` | Source index (matches `[Source N]` in the answer) |
| Title / Locator | Page title plus the URL |
| Excerpt | Verbatim text from the source (up to 500 chars). This bypasses the synthesizer to prevent quiet rewriting |
| Conf | Researcher's confidence in this source's accuracy (0.00–1.00) |
If the answer contains a claim, you can read the matching Excerpt to verify the source actually says what the synthesizer claims it says.
Gaps table
Categorized reasons the agent couldn't fully resolve the question:
- `source_not_found` — no relevant pages indexed
- `access_denied` — sources existed but couldn't be fetched
- `budget_exhausted` — ran out of iterations / tokens
- `contradictory_sources` — sources disagreed and the disagreement wasn't resolvable
- `scope_exceeded` — the question reaches into a domain web search can't answer (academic papers, internal databases, legal docs)
Discovery Events table
Lateral findings: things the agent stumbled across that aren't in the answer but might matter for follow-up. Each suggests a target researcher and a query — these are how a future PI orchestrator (V2) will dispatch other specialists.
Open Questions table
Forward-looking questions the agent generated mid-research. Each has a priority (high/medium/low) and the source context that prompted it. These often reveal the next useful question to ask.
Confidence panel
| Field | Meaning |
|---|---|
| Overall | 0.00–1.00. Read this in the context of the factors below, not in isolation |
| Corroborating sources | How many sources agree on the core claims |
| Source authority | `high` (.gov/.edu/peer-reviewed), `medium` (established orgs), `low` (blogs/forums) |
| Contradiction detected | Did sources disagree? |
| Query specificity match | How well the results addressed the actual question (0.00–1.00) |
| Budget status | `spent` (the loop hit its cap before voluntarily stopping) or `under cap` |
| Recency | `current` (<1y) / `recent` (1–3y) / `dated` (>3y) / `unknown` |
Budget status: spent is normal, not an error. It means the agent used the cap you gave it before deciding it was done. Pair this with Overall: 0.88+ for a confident answer that fully spent its budget.
Cost panel
Tokens, Iterations, Wall time, Model. The token total includes the synthesis call, which is uncapped by design (see Depth presets below).
Trace footer
The trace_id is a UUID. Save it if you'll want to replay this run later.
Replaying a trace
Every research call writes a JSONL audit log at ~/.marchwarden/traces/<trace_id>.jsonl. Replay it with:
marchwarden replay <trace_id>
The replay table shows every step the agent took: planning calls, search queries, URL fetches with content hashes, synthesis attempts, and the final outcome. Use it to:
- Diagnose unexpected results — see exactly what queries the agent ran and what it found
- Audit citations — every fetch records a SHA-256 content hash so you can verify the same page hasn't changed since
- Debug synthesis failures — `synthesis_error` steps record the LLM's full raw response and parse error
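For example, to spot-check a citation you can re-fetch the page, hash the body yourself, and compare against the hash recorded in the trace's fetch step. A sketch: the URL is a placeholder, and it assumes the trace hashes the raw response body:

```shell
# Recompute a SHA-256 over the re-fetched body; compare with the trace's hash
curl -s "https://example.com/some-cited-page" | sha256sum | cut -d' ' -f1
```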
Flags
| Flag | Purpose |
|---|---|
| `--trace-dir PATH` | Override default trace directory (`~/.marchwarden/traces`) |
Tracking spend
Every research call appends one line to ~/.marchwarden/costs.jsonl with model, tokens (input/output split), Tavily search count, and an estimated cost in USD. Inspect it with:
marchwarden costs
Output sections
- Cost Summary — total calls, total spend, total tokens (with input/output split), Tavily searches, and a warning if any calls used a model not in your price table
- Per Day — calls / tokens / spend grouped by day
- Per Model — calls / tokens / spend grouped by `model_id`
- Highest-Cost Call — the most expensive single run, with `trace_id` for follow-up
Flags
| Flag | Purpose |
|---|---|
| `--since DATE` | ISO date (`2026-04-01`) or relative (`7d`, `24h`, `2w`, `1m`) |
| `--until DATE` | Same |
| `--model MODEL_ID` | Filter to a single model |
| `--json` | Emit raw filtered ledger entries (one JSON object per line) instead of the table |
| `--ledger PATH` | Override default ledger location |
Examples
marchwarden costs # all-time summary
marchwarden costs --since 7d # last 7 days
marchwarden costs --model claude-opus-4-6
marchwarden costs --since 2026-04-01 --until 2026-04-08 --json
The --json mode is suitable for piping into jq or shipping to a billing/analytics tool.
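If you just want a total rather than the table, a few lines of stdlib Python can sum the ledger directly. A sketch, assuming only what this guide documents: one JSON object per line, with `estimated_cost_usd` recorded as `null` for unknown models (counted as zero here):

```shell
# Total estimated spend across the whole cost ledger
python3 - "$HOME/.marchwarden/costs.jsonl" <<'PY'
import json, sys

total = 0.0
try:
    with open(sys.argv[1]) as ledger:
        for line in ledger:
            entry = json.loads(line)
            # null costs (unknown model prices) count as zero
            total += entry.get("estimated_cost_usd") or 0
except FileNotFoundError:
    pass  # no research calls recorded yet
print(f"${total:.4f}")
PY
```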
Depth presets
The --depth flag picks sensible defaults for the agent loop. Explicit --budget and --max-iterations always override.
| Depth | max_iterations | token_budget | max_sources | Use for |
|---|---|---|---|---|
| `shallow` | 2 | 5,000 | 5 | quick lookups, factual Q&A |
| `balanced` | 5 | 20,000 | 10 | default, most questions |
| `deep` | 8 | 60,000 | 20 | comparison studies, complex investigations |
How the budget is enforced
The token budget is a soft cap on the tool-use loop only:
- Before each new iteration, the agent checks `tokens_used >= token_budget`. If yes, the loop stops and synthesis runs on whatever evidence is gathered.
- The synthesis call itself is uncapped — it always completes, so you get a real ResearchResult instead of a parse-failure stub.
- This means total tokens reported in the Cost panel and ledger will normally exceed `token_budget` by the synthesis cost (~10–25k tokens depending on evidence size).
Practical implications:
- A `balanced` run with `token_budget=20000` typically reports `tokens_used: 30000–50000` total. That's normal.
- If you need strict total spend control, use `shallow` and hand-tune `--budget` low.
- If you need thorough answers, use `deep` and accept that the call may consume 100k+ tokens.
Operational logging
Marchwarden logs every research step via structlog. Logs go to stderr so they don't interfere with the research output on stdout.
Log levels
- `INFO` (default) — milestones only (~9 lines per call): research start, each iteration boundary, synthesis start/complete, completion, cost recording
- `DEBUG` — every step (~13+ lines per call): adds individual `web_search`, `fetch_url`, and tool-result events
Formats
- `console` — colored, human-readable; auto-selected when stderr is a TTY
- `json` — newline-delimited JSON, OpenSearch-ready; auto-selected when stderr is not a TTY (e.g., in CI, containers, or piped output)
Set explicitly with MARCHWARDEN_LOG_FORMAT=json or =console.
Persistent file logging
MARCHWARDEN_LOG_FILE=1 marchwarden ask "..."
Logs are appended to ~/.marchwarden/logs/marchwarden.log (10MB per file, 5 rotated backups). The format respects MARCHWARDEN_LOG_FORMAT.
Context binding
Every log line emitted during a research call automatically carries:
- `trace_id` — the same UUID you see in the Trace footer
- `researcher` — currently always `web` (the researcher type)
This means in OpenSearch (or any structured log viewer) you can filter to a single research call with one query: trace_id:"abc-123-...".
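Outside OpenSearch, the same filter is a one-liner over any captured JSON log stream, such as the rotating log file or redirected stderr. A sketch with placeholder log lines (only `trace_id` and `researcher` are documented field names; `event` and the UUIDs are illustrative):

```shell
# Keep only the log lines belonging to one research call
printf '%s\n' \
  '{"trace_id": "abc-123", "researcher": "web", "event": "research_started"}' \
  '{"trace_id": "def-456", "researcher": "web", "event": "research_started"}' |
python3 -c '
import json, sys
for line in sys.stdin:
    if json.loads(line).get("trace_id") == "abc-123":
        print(line, end="")
'
```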
File layout
Marchwarden writes to ~/.marchwarden/ exclusively. Nothing else on disk is touched.
~/.marchwarden/
├── prices.toml                 # auto-seeded price table; edit when rates change
├── costs.jsonl                 # cost ledger, one line per research call
├── traces/
│   └── <trace_id>.jsonl        # per-call audit log, one file per call
└── logs/
    ├── marchwarden.log         # only if MARCHWARDEN_LOG_FILE=1
    └── marchwarden.log.{1..5}  # rotated backups
All files are append-only or rewritten safely; you can tail -f, jq, or back them up freely.
Running in Docker
The same workflows work inside the docker test image — useful for sandboxed runs or to avoid touching the host's Python:
make docker-build # one-time
./scripts/docker-test.sh ask "your question" --depth deep
./scripts/docker-test.sh replay <trace_id>
The ask and replay subcommands of docker-test.sh mount:
- `~/secrets:/root/secrets:ro` — your API keys
- `~/.marchwarden:/root/.marchwarden` — traces, costs, and logs persist back to the host
The script also forwards MARCHWARDEN_MODEL from the host environment if set.
Troubleshooting
marchwarden: command not found after make install
Either:
- The venv isn't activated. Run `source .venv/bin/activate`, or use `make ask`, which calls `.venv/bin/marchwarden` directly.
- A stale install exists at `~/.local/bin/marchwarden`. Run `which -a marchwarden`, delete the stale copy, then `hash -r`.
ModuleNotFoundError: No module named 'cli'
The marchwarden script being run is from a stale install (e.g., a previous pip install --user or pipx install) that doesn't know about the current source layout. Same fix as above.
Error: HTTP 404 Not Found on the Anthropic API
Your MARCHWARDEN_MODEL is set to a model id the API doesn't recognize. Check it against a known id such as claude-sonnet-4-6 or claude-opus-4-6. The default is claude-sonnet-4-6.
Calls with unknown model price: N warning in marchwarden costs
You ran a research call with a model_id not present in ~/.marchwarden/prices.toml. Add a section for it:
[models."your-model-id"]
input_per_mtok_usd = 3.00
output_per_mtok_usd = 15.00
Then re-run marchwarden costs. Existing ledger entries with null cost won't be retroactively fixed; future calls will pick up the new prices.
Budget status: spent on every run
This is expected, not an error. See Reading the output → Confidence panel and Depth presets → How the budget is enforced for details.
Synthesis fallback ("Research completed but synthesis failed")
This used to happen when the synthesis JSON exceeded its max_tokens cap, but was fixed in PR #20. If you still see it, file an issue with the trace_id — the JSONL trace will contain the exact synthesis_error step including the model's raw response and parse error.
The marchwarden ask output is paginated / cut off
rich renders to your terminal width. If lines wrap awkwardly, widen the terminal, or pipe to `less -R` to page the output while keeping colors:
marchwarden ask "..." 2>&1 | less -R
FAQ
How long does a research call take?
Typical wall-clock times: shallow ~15s, balanced ~30–60s, deep ~60–120s. Mostly LLM latency, not network.
How much does a call cost?
At current Sonnet 4.6 rates: shallow ~$0.02, balanced ~$0.05–$0.15, deep ~$0.20–$0.60. Run marchwarden costs after a few calls to see your actual numbers.
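The ledger's estimate comes from the per-million-token rates in prices.toml. As a worked example, the arithmetic below uses hypothetical token counts (30k input, 8k output) and the placeholder $3/$15-per-Mtok rates shown in the troubleshooting section; the exact formula is an assumption inferred from the `input_per_mtok_usd` / `output_per_mtok_usd` key names:

```shell
# cost ≈ input_tokens/1e6 * input_per_mtok_usd + output_tokens/1e6 * output_per_mtok_usd
python3 -c 'print(round(30_000/1e6 * 3.00 + 8_000/1e6 * 15.00, 4))'
```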
Can I use a different model?
Yes. MARCHWARDEN_MODEL=claude-opus-4-6 marchwarden ask "..." will use Opus instead of Sonnet. Make sure the model id is in your prices.toml so the cost ledger can estimate spend.
Can the agent access local files / databases?
Not yet. V1 is web-search only. V2+ (per the Roadmap) will add file/document and database researchers — same contract, different tools.
Does the agent learn between calls?
No. Each research() call is stateless. The trace logs and cost ledger accumulate over time, but the agent itself starts fresh every time. Cross-call learning is on the V2+ roadmap.
Where do I report bugs?
Open an issue at the Forgejo repo. Include the trace_id from the Trace footer — it lets us reconstruct exactly what happened.
See also: Architecture, Research Contract, Development Guide, Roadmap