Jeff Smith edited this page 2026-04-08 16:42:11 -06:00

User Guide

This guide is for operators using Marchwarden to ask research questions, replay traces, and track costs. If you're contributing code, see the Development Guide instead.


Table of contents

  1. What Marchwarden is
  2. Installation
  3. Configuration
  4. Asking a question — marchwarden ask
  5. Reading the output
  6. Replaying a trace — marchwarden replay
  7. Tracking spend — marchwarden costs
  8. Depth presets
  9. Operational logging
  10. File layout under ~/.marchwarden/
  11. Running in Docker
  12. Troubleshooting
  13. FAQ

What Marchwarden is

Marchwarden is an agentic web research assistant. You give it a question; it plans search queries, fetches the most promising sources, synthesizes a grounded answer with inline citations, and reports the gaps it could not resolve. Each call returns a structured ResearchResult containing:

  • answer — multi-paragraph synthesis with inline source references
  • citations — list of sources with raw verbatim excerpts (no rewriting)
  • gaps — what the agent could not resolve, categorized
  • discovery_events — lateral findings worth investigating with other tools
  • open_questions — follow-up questions the agent generated
  • confidence + factors — auditable score, not just a number
  • cost_metadata — tokens, iterations, wall-clock time, model id
  • trace_id — UUID linking to a per-call audit log
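
The fields above can be sketched as a dataclass. This is illustrative only — the field names mirror the list, but the types are assumptions, not the project's real definitions:

```python
from dataclasses import dataclass

@dataclass
class Citation:
    locator: str       # URL or other source locator
    excerpt: str       # raw verbatim text, never rewritten
    confidence: float  # 0.00-1.00

@dataclass
class ResearchResult:
    answer: str                   # multi-paragraph synthesis
    citations: list[Citation]
    gaps: list[dict]              # categorized unresolved items
    discovery_events: list[dict]  # lateral findings
    open_questions: list[dict]    # generated follow-ups
    confidence: float             # overall auditable score
    factors: dict                 # breakdown behind the score
    cost_metadata: dict           # tokens, iterations, wall time, model id
    trace_id: str                 # UUID linking to the trace file
```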

Every research call is recorded three ways:

  • a JSONL trace (per-step audit log) at ~/.marchwarden/traces/<trace_id>.jsonl
  • a one-line cost ledger entry at ~/.marchwarden/costs.jsonl
  • structured operational logs to stderr (and optionally a rotating file)

Installation

Option 1 — make install

git clone https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden.git
cd marchwarden
make install
source .venv/bin/activate

make install creates .venv/, installs the project editable with dev extras, and wires the marchwarden command. After activation, which marchwarden should resolve to .venv/bin/marchwarden.

If marchwarden reports ModuleNotFoundError: No module named 'cli', you have a stale install on your $PATH:

which -a marchwarden     # find the stale copy
rm <path/to/stale>       # remove it
hash -r                  # clear bash's command cache

Option 2 — Manual venv

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Option 3 — Docker

make docker-build
./scripts/docker-test.sh ask "your question here"

The docker flow mounts ~/secrets (read-only) and ~/.marchwarden/ (read-write) into the container, so traces, costs, and logs land in your real home directory the same as a venv install.


Configuration

API keys (required)

Marchwarden reads two keys from ~/secrets (a shell-style KEY=value file):

ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...
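
A shell-style KEY=value file is simple to read programmatically. A minimal parser sketch (illustrative only, not Marchwarden's actual loader):

```python
def load_secrets(text: str) -> dict[str, str]:
    """Parse a shell-style KEY=value file: skip blanks and comments,
    split on the first '=', strip optional surrounding quotes."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        secrets[key.strip()] = value.strip().strip("'\"")
    return secrets
```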

Get the Anthropic key from the Anthropic Console and the Tavily key from the Tavily account dashboard.

Environment variables (optional)

| Variable | Purpose | Default |
|---|---|---|
| MARCHWARDEN_MODEL | Anthropic model id used by the researcher | claude-sonnet-4-6 |
| MARCHWARDEN_LOG_LEVEL | DEBUG / INFO / WARNING / ERROR | INFO |
| MARCHWARDEN_LOG_FORMAT | json (OpenSearch-ready) or console (colored) | auto: console if stderr is a TTY, json otherwise |
| MARCHWARDEN_LOG_FILE | Set to 1 to also log to ~/.marchwarden/logs/marchwarden.log (10MB rotation, 5 backups) | unset |
| MARCHWARDEN_COST_LEDGER | Override cost ledger path | ~/.marchwarden/costs.jsonl |

Price table

~/.marchwarden/prices.toml is auto-created on first run with current Anthropic + Tavily rates. Edit it manually when upstream prices change — Marchwarden does not auto-fetch. Unknown models record estimated_cost_usd: null rather than crash.


Asking a question

marchwarden ask "What are ideal crops for a garden in Utah?"

Flags

| Flag | Purpose |
|---|---|
| --depth shallow\|balanced\|deep | Pick a research depth preset (default: balanced) |
| --budget INT | Override the depth's token budget |
| --max-iterations INT | Override the depth's iteration cap |

--budget and --max-iterations always win over the depth preset. If both are unset, the depth preset chooses.

Examples

# Quick lookup — shallow depth (2 iterations, 5k tokens, 5 sources)
marchwarden ask "What is the capital of Utah?" --depth shallow

# Default — balanced depth (5 iterations, 20k tokens, 10 sources)
marchwarden ask "Compare cool-season and warm-season crops for Utah"

# Thorough — deep depth (8 iterations, 60k tokens, 20 sources)
marchwarden ask "Compare AWS Lambda vs Azure Functions for HFT" --depth deep

# Override the depth preset for one call
marchwarden ask "..." --depth balanced --budget 50000

Reading the output

Output is rendered with rich. Each section is a panel or table:

Answer panel

The synthesized answer in prose. Source numbers like [Source 4] map to entries in the Citations table.

Citations table

| Column | Meaning |
|---|---|
| # | Source index (matches [Source N] in the answer) |
| Title / Locator | Page title plus the URL |
| Excerpt | Verbatim text from the source (up to 500 chars). This bypasses the synthesizer to prevent quiet rewriting |
| Conf | Researcher's confidence in this source's accuracy (0.00–1.00) |

If the answer contains a claim, you can read the matching Excerpt to verify the source actually says what the synthesizer claims it says.

Gaps table

Categorized reasons the agent couldn't fully resolve the question:

  • source_not_found — no relevant pages indexed
  • access_denied — sources existed but couldn't be fetched
  • budget_exhausted — ran out of iterations / tokens
  • contradictory_sources — sources disagreed and the disagreement wasn't resolvable
  • scope_exceeded — the question reaches into a domain web search can't answer (academic papers, internal databases, legal docs)

Discovery Events table

Lateral findings: things the agent stumbled across that aren't in the answer but might matter for follow-up. Each suggests a target researcher and a query — these are how a future PI orchestrator (V2) will dispatch other specialists.

Open Questions table

Forward-looking questions the agent generated mid-research. Each has a priority (high/medium/low) and the source context that prompted it. These often reveal the next useful question to ask.

Confidence panel

| Field | Meaning |
|---|---|
| Overall | 0.00–1.00. Read this in the context of the factors below, not in isolation |
| Corroborating sources | How many sources agree on the core claims |
| Source authority | high (.gov/.edu/peer-reviewed), medium (established orgs), low (blogs/forums) |
| Contradiction detected | Did sources disagree? |
| Query specificity match | How well the results addressed the actual question (0.00–1.00) |
| Budget status | spent (the loop hit its cap before voluntarily stopping) or under cap |
| Recency | current (<1y) / recent (1–3y) / dated (>3y) / unknown |

Budget status: spent is normal, not an error. It means the agent used the cap you gave it before deciding it was done. Pair this with Overall: 0.88+ for a confident answer that fully spent its budget.

Cost panel

Tokens, Iterations, Wall time, Model. The token total includes the synthesis call, which is uncapped by design (see Depth presets below).

The trace_id is a UUID. Save it if you'll want to replay this run later.


Replaying a trace

Every research call writes a JSONL audit log at ~/.marchwarden/traces/<trace_id>.jsonl. Replay it with:

marchwarden replay <trace_id>

The replay table shows every step the agent took: planning calls, search queries, URL fetches with content hashes, synthesis attempts, and the final outcome. Use it to:

  • Diagnose unexpected results — see exactly what queries the agent ran and what it found
  • Audit citations — every fetch records a SHA-256 content hash so you can verify the same page hasn't changed since
  • Debug synthesis failures — synthesis_error steps record the LLM's full raw response and parse error
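
The content-hash check can be reproduced by hand: refetch the page and compare digests. A sketch (the exact name of the hash field in the trace is an assumption here):

```python
import hashlib

def page_unchanged(fetched_body: bytes, recorded_hash: str) -> bool:
    """Compare a freshly fetched page body against the SHA-256
    hex digest recorded in the trace's fetch step."""
    return hashlib.sha256(fetched_body).hexdigest() == recorded_hash
```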

Flags

| Flag | Purpose |
|---|---|
| --trace-dir PATH | Override default trace directory (~/.marchwarden/traces) |

Tracking spend

Every research call appends one line to ~/.marchwarden/costs.jsonl with model, tokens (input/output split), Tavily search count, and an estimated cost in USD. Inspect it with:

marchwarden costs

Output sections

  • Cost Summary — total calls, total spend, total tokens (with input/output split), Tavily searches, and a warning if any calls used a model not in your price table
  • Per Day — calls / tokens / spend grouped by day
  • Per Model — calls / tokens / spend grouped by model_id
  • Highest-Cost Call — the most expensive single run, with trace_id for follow-up

Flags

| Flag | Purpose |
|---|---|
| --since DATE | ISO date (2026-04-01) or relative (7d, 24h, 2w, 1m) |
| --until DATE | Same |
| --model MODEL_ID | Filter to a single model |
| --json | Emit raw filtered ledger entries (one JSON per line) instead of the table |
| --ledger PATH | Override default ledger location |
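
The relative forms are a count plus a unit suffix. A sketch of how they might be interpreted (assuming 1m means 30 days; Marchwarden's actual parser may differ):

```python
import re
from datetime import timedelta

UNIT_DAYS = {"h": 1 / 24, "d": 1, "w": 7, "m": 30}  # assumption: 1m = 30 days

def parse_relative(spec: str) -> timedelta:
    """Turn '7d', '24h', '2w', '1m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hdwm])", spec)
    if not match:
        raise ValueError(f"not a relative date: {spec!r}")
    count, unit = int(match.group(1)), match.group(2)
    return timedelta(days=count * UNIT_DAYS[unit])
```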

Examples

marchwarden costs                        # all-time summary
marchwarden costs --since 7d             # last 7 days
marchwarden costs --model claude-opus-4-6
marchwarden costs --since 2026-04-01 --until 2026-04-08 --json

The --json mode is suitable for piping into jq or shipping to a billing/analytics tool.
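
Besides jq, the --json stream folds easily in a few lines of Python. A sketch that groups spend per model — the model_id and estimated_cost_usd field names follow this guide; anything else about the entry shape is an assumption:

```python
import json
from collections import defaultdict

def spend_per_model(jsonl: str) -> dict[str, float]:
    """Sum estimated_cost_usd per model_id over `marchwarden costs --json`
    output; entries with a null estimate are skipped."""
    totals: dict[str, float] = defaultdict(float)
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        cost = entry.get("estimated_cost_usd")
        if cost is not None:
            totals[entry["model_id"]] += cost
    return dict(totals)
```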


Depth presets

The --depth flag picks sensible defaults for the agent loop. Explicit --budget and --max-iterations always override.

| Depth | max_iterations | token_budget | max_sources | Use for |
|---|---|---|---|---|
| shallow | 2 | 5,000 | 5 | quick lookups, factual Q&A |
| balanced | 5 | 20,000 | 10 | default, most questions |
| deep | 8 | 60,000 | 20 | comparison studies, complex investigations |

How the budget is enforced

The token budget is a soft cap on the tool-use loop only:

  • Before each new iteration, the agent checks tokens_used >= token_budget. If yes, the loop stops and synthesis runs on whatever evidence is gathered.
  • The synthesis call itself is uncapped — it always completes, so you get a real ResearchResult instead of a parse-failure stub.
  • This means total tokens reported in the Cost panel and ledger will normally exceed token_budget by the synthesis cost (~10–25k tokens depending on evidence size).
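
The enforcement described above looks roughly like this in Python. The helper names are illustrative stand-ins for the real tool loop, not Marchwarden's internals:

```python
def run_research(question, token_budget, max_iterations,
                 run_iteration, synthesize):
    """Soft budget cap: checked only between iterations, and
    synthesis always runs, uncapped."""
    tokens_used, evidence = 0, []
    for _ in range(max_iterations):
        if tokens_used >= token_budget:
            break  # budget spent: stop gathering, still synthesize
        step_tokens, found = run_iteration(question, evidence)
        tokens_used += step_tokens
        evidence.extend(found)
    # Synthesis is uncapped, so the reported total normally
    # exceeds token_budget by the synthesis cost.
    answer, synth_tokens = synthesize(question, evidence)
    return answer, tokens_used + synth_tokens
```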

Practical implications:

  • A balanced run with token_budget=20000 typically reports tokens_used: 30000–50000 total. That's normal.
  • If you need strict total spend control, use shallow and hand-tune --budget low.
  • If you need thorough answers, use deep and accept that the call may consume 100k+ tokens.

Operational logging

Marchwarden logs every research step via structlog. Logs go to stderr so they don't interfere with the research output on stdout.

Log levels

  • INFO (default) — milestones only (~9 lines per call): research start, each iteration boundary, synthesis start/complete, completion, cost recording
  • DEBUG — every step (~13+ lines per call): adds individual web_search, fetch_url, and tool-result events

Formats

  • console — colored, human-readable; auto-selected when stderr is a TTY
  • json — newline-delimited JSON, OpenSearch-ready; auto-selected when stderr is not a TTY (e.g., in CI, containers, or piped output)

Set explicitly with MARCHWARDEN_LOG_FORMAT=json or =console.
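
The auto-selection amounts to a TTY check, roughly as follows (a sketch; the real code wires this choice into its structlog configuration):

```python
import os
import sys

def pick_log_format() -> str:
    """MARCHWARDEN_LOG_FORMAT wins if set; otherwise console for an
    interactive stderr, json for CI, containers, and pipes."""
    explicit = os.environ.get("MARCHWARDEN_LOG_FORMAT")
    if explicit in ("json", "console"):
        return explicit
    return "console" if sys.stderr.isatty() else "json"
```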

Persistent file logging

MARCHWARDEN_LOG_FILE=1 marchwarden ask "..."

Logs are appended to ~/.marchwarden/logs/marchwarden.log (10MB per file, 5 rotated backups). The format respects MARCHWARDEN_LOG_FORMAT.

Context binding

Every log line emitted during a research call automatically carries:

  • trace_id — the same UUID you see in the Trace footer
  • researcher — currently always web (the researcher type)

This means in OpenSearch (or any structured log viewer) you can filter to a single research call with one query: trace_id:"abc-123-...".
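
The same filter works on a local json-format log file. A sketch, assuming the json format puts trace_id at the top level of each line (which the context binding above implies):

```python
import json

def lines_for_trace(log_text: str, trace_id: str) -> list[dict]:
    """Pick out every JSON log line belonging to one research call."""
    hits = []
    for line in log_text.splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines
        if event.get("trace_id") == trace_id:
            hits.append(event)
    return hits
```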


File layout

Marchwarden writes to ~/.marchwarden/ exclusively. Nothing else on disk is touched.

~/.marchwarden/
├── prices.toml                      # auto-seeded price table; edit when rates change
├── costs.jsonl                      # cost ledger, one line per research call
├── traces/
│   └── <trace_id>.jsonl             # per-call audit log, one file per call
└── logs/
    ├── marchwarden.log              # only if MARCHWARDEN_LOG_FILE=1
    └── marchwarden.log.{1..5}       # rotated backups

All files are append-only or rewritten safely; you can tail -f, jq, or back them up freely.


Running in Docker

The same workflows work inside the docker test image — useful for sandboxed runs or to avoid touching the host's Python:

make docker-build                                         # one-time
./scripts/docker-test.sh ask "your question" --depth deep
./scripts/docker-test.sh replay <trace_id>

The ask and replay subcommands of docker-test.sh mount:

  • ~/secrets:/root/secrets:ro — your API keys
  • ~/.marchwarden:/root/.marchwarden — traces, costs, logs persist back to the host

The script also forwards MARCHWARDEN_MODEL from the host environment if set.


Troubleshooting

marchwarden: command not found after make install

Either:

  1. The venv isn't activated. Run source .venv/bin/activate, or use make ask which calls .venv/bin/marchwarden directly.
  2. A stale install exists at ~/.local/bin/marchwarden. Run which -a marchwarden, delete the stale copy, then hash -r.

ModuleNotFoundError: No module named 'cli'

The marchwarden script being run is from a stale install (e.g., a previous pip install --user or pipx install) that doesn't know about the current source layout. Same fix as above.

Error: HTTP 404 Not Found on the Anthropic API

Your MARCHWARDEN_MODEL is set to a model id that doesn't exist. Set it to a valid id such as claude-sonnet-4-6 or claude-opus-4-6; the default is claude-sonnet-4-6.

Calls with unknown model price: N warning in marchwarden costs

You ran a research call with a model_id not present in ~/.marchwarden/prices.toml. Add a section for it:

[models."your-model-id"]
input_per_mtok_usd = 3.00
output_per_mtok_usd = 15.00

Then re-run marchwarden costs. Existing ledger entries with null cost won't be retroactively fixed; future calls will pick up the new prices.

Budget status: spent on every run

This is expected, not an error. See Reading the output → Confidence panel and Depth presets → How the budget is enforced for details.

Synthesis fallback ("Research completed but synthesis failed")

This used to happen when the synthesis JSON exceeded its max_tokens cap, but was fixed in PR #20. If you still see it, file an issue with the trace_id — the JSONL trace will contain the exact synthesis_error step including the model's raw response and parse error.

The marchwarden ask output is paginated / cut off

rich defaults to your terminal width. If lines wrap awkwardly, widen your terminal or pipe to less -R to keep colors:

marchwarden ask "..." 2>&1 | less -R

FAQ

How long does a research call take? Typical wall-clock times: shallow ~15s, balanced ~30–60s, deep ~60–120s. Mostly LLM latency, not network.

How much does a call cost? At current Sonnet 4.6 rates: shallow ~$0.02, balanced ~$0.05–$0.15, deep ~$0.20–$0.60. Run marchwarden costs after a few calls to see your actual numbers.

Can I use a different model? Yes. MARCHWARDEN_MODEL=claude-opus-4-6 marchwarden ask "..." will use Opus instead of Sonnet. Make sure the model id is in your prices.toml so the cost ledger can estimate spend.

Can the agent access local files / databases? Not yet. V1 is web-search only. V2+ (per the Roadmap) will add file/document and database researchers — same contract, different tools.

Does the agent learn between calls? No. Each research() call is stateless. The trace logs and cost ledger accumulate over time, but the agent itself starts fresh every time. Cross-call learning is on the V2+ roadmap.

Where do I report bugs? Open an issue at the Forgejo repo. Include the trace_id from the Trace footer — it lets us reconstruct exactly what happened.


See also: Architecture, Research Contract, Development Guide, Roadmap