Jeff Smith edited this page 2026-04-08 16:42:11 -06:00

User Guide

This guide is for operators using Marchwarden to ask research questions, replay traces, and track costs. If you're contributing code, see the Development Guide instead.


Table of contents

  1. What Marchwarden is
  2. Installation
  3. Configuration
  4. Asking a question — marchwarden ask
  5. Reading the output
  6. Replaying a trace — marchwarden replay
  7. Tracking spend — marchwarden costs
  8. Depth presets
  9. Operational logging
  10. File layout under ~/.marchwarden/
  11. Running in Docker
  12. Troubleshooting
  13. FAQ

What Marchwarden is

Marchwarden is an agentic web research assistant. You give it a question; it plans search queries, fetches the most promising sources, synthesizes a grounded answer with inline citations, and reports the gaps it could not resolve. Each call returns a structured ResearchResult containing:

  • answer — multi-paragraph synthesis with inline source references
  • citations — list of sources with raw verbatim excerpts (no rewriting)
  • gaps — what the agent could not resolve, categorized
  • discovery_events — lateral findings worth investigating with other tools
  • open_questions — follow-up questions the agent generated
  • confidence + factors — auditable score, not just a number
  • cost_metadata — tokens, iterations, wall-clock time, model id
  • trace_id — UUID linking to a per-call audit log
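
The fields above can be sketched as a dataclass. This is illustrative only — the field names mirror the list, but the types are assumptions, not the project's real definitions:

```python
from dataclasses import dataclass

@dataclass
class Citation:
    locator: str       # URL or other source locator
    excerpt: str       # raw verbatim text, never rewritten
    confidence: float  # 0.00-1.00

@dataclass
class ResearchResult:
    answer: str                   # multi-paragraph synthesis
    citations: list[Citation]
    gaps: list[dict]              # categorized unresolved items
    discovery_events: list[dict]  # lateral findings
    open_questions: list[dict]    # generated follow-ups
    confidence: float             # overall auditable score
    factors: dict                 # breakdown behind the score
    cost_metadata: dict           # tokens, iterations, wall time, model id
    trace_id: str                 # UUID linking to the trace file
```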

Every research call is recorded three ways:

  • a JSONL trace (per-step audit log) at ~/.marchwarden/traces/<trace_id>.jsonl
  • a one-line cost ledger entry at ~/.marchwarden/costs.jsonl
  • structured operational logs to stderr (and optionally a rotating file)

Installation

Option 1 — make install

git clone https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden.git
cd marchwarden
make install
source .venv/bin/activate

make install creates .venv/, installs the project editable with dev extras, and wires the marchwarden command. After activation, which marchwarden should resolve to .venv/bin/marchwarden.

If marchwarden reports ModuleNotFoundError: No module named 'cli', you have a stale install on your $PATH:

which -a marchwarden     # find the stale copy
rm <path/to/stale>       # remove it
hash -r                  # clear bash's command cache

Option 2 — Manual venv

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Option 3 — Docker

make docker-build
./scripts/docker-test.sh ask "your question here"

The docker flow mounts ~/secrets (read-only) and ~/.marchwarden/ (read-write) into the container, so traces, costs, and logs land in your real home directory the same as a venv install.


Configuration

API keys (required)

Marchwarden reads two keys from ~/secrets (a shell-style KEY=value file):

ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...
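
A shell-style KEY=value file is simple to read programmatically. A minimal parser sketch (illustrative only, not Marchwarden's actual loader):

```python
def load_secrets(text: str) -> dict[str, str]:
    """Parse a shell-style KEY=value file: skip blanks and comments,
    split on the first '=', strip optional surrounding quotes."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        secrets[key.strip()] = value.strip().strip("'\"")
    return secrets
```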

Get the Anthropic key from the Anthropic Console and the Tavily key from the Tavily account dashboard.

Environment variables (optional)

| Variable | Purpose | Default |
|---|---|---|
| MARCHWARDEN_MODEL | Anthropic model id used by the researcher | claude-sonnet-4-6 |
| MARCHWARDEN_LOG_LEVEL | DEBUG / INFO / WARNING / ERROR | INFO |
| MARCHWARDEN_LOG_FORMAT | json (OpenSearch-ready) or console (colored) | auto: console if stderr is a TTY, json otherwise |
| MARCHWARDEN_LOG_FILE | Set to 1 to also log to ~/.marchwarden/logs/marchwarden.log (10MB rotation, 5 backups) | unset |
| MARCHWARDEN_COST_LEDGER | Override cost ledger path | ~/.marchwarden/costs.jsonl |

Price table

~/.marchwarden/prices.toml is auto-created on first run with current Anthropic + Tavily rates. Edit it manually when upstream prices change — Marchwarden does not auto-fetch. Unknown models record estimated_cost_usd: null rather than crash.


Asking a question

marchwarden ask "What are ideal crops for a garden in Utah?"

Flags

| Flag | Purpose |
|---|---|
| --depth shallow\|balanced\|deep | Pick a research depth preset (default: balanced) |
| --budget INT | Override the depth's token budget |
| --max-iterations INT | Override the depth's iteration cap |

--budget and --max-iterations always win over the depth preset. If both are unset, the depth preset chooses.

Examples

# Quick lookup — shallow depth (2 iterations, 5k tokens, 5 sources)
marchwarden ask "What is the capital of Utah?" --depth shallow

# Default — balanced depth (5 iterations, 20k tokens, 10 sources)
marchwarden ask "Compare cool-season and warm-season crops for Utah"

# Thorough — deep depth (8 iterations, 60k tokens, 20 sources)
marchwarden ask "Compare AWS Lambda vs Azure Functions for HFT" --depth deep

# Override the depth preset for one call
marchwarden ask "..." --depth balanced --budget 50000

Reading the output

Output is rendered with rich. Each section is a panel or table:

Answer panel

The synthesized answer in prose. Source numbers like [Source 4] map to entries in the Citations table.

Citations table

| Column | Meaning |
|---|---|
| # | Source index (matches [Source N] in the answer) |
| Title / Locator | Page title plus the URL |
| Excerpt | Verbatim text from the source (up to 500 chars). This bypasses the synthesizer to prevent quiet rewriting |
| Conf | Researcher's confidence in this source's accuracy (0.00–1.00) |

If the answer contains a claim, you can read the matching Excerpt to verify the source actually says what the synthesizer claims it says.

Gaps table

Categorized reasons the agent couldn't fully resolve the question:

  • source_not_found — no relevant pages indexed
  • access_denied — sources existed but couldn't be fetched
  • budget_exhausted — ran out of iterations / tokens
  • contradictory_sources — sources disagreed and the disagreement wasn't resolvable
  • scope_exceeded — the question reaches into a domain web search can't answer (academic papers, internal databases, legal docs)

Discovery Events table

Lateral findings: things the agent stumbled across that aren't in the answer but might matter for follow-up. Each suggests a target researcher and a query — these are how a future PI orchestrator (V2) will dispatch other specialists.

Open Questions table

Forward-looking questions the agent generated mid-research. Each has a priority (high/medium/low) and the source context that prompted it. These often reveal the next useful question to ask.

Confidence panel

| Field | Meaning |
|---|---|
| Overall | 0.00–1.00. Read this in the context of the factors below, not in isolation |
| Corroborating sources | How many sources agree on the core claims |
| Source authority | high (.gov/.edu/peer-reviewed), medium (established orgs), low (blogs/forums) |
| Contradiction detected | Did sources disagree? |
| Query specificity match | How well the results addressed the actual question (0.00–1.00) |
| Budget status | spent (the loop hit its cap before voluntarily stopping) or under cap |
| Recency | current (<1y) / recent (1–3y) / dated (>3y) / unknown |

Budget status: spent is normal, not an error. It means the agent used the cap you gave it before deciding it was done. Pair this with Overall: 0.88+ for a confident answer that fully spent its budget.

Cost panel

Tokens, Iterations, Wall time, Model. The token total includes the synthesis call, which is uncapped by design (see Depth presets below).

The trace_id is a UUID. Save it if you'll want to replay this run later.


Replaying a trace

Every research call writes a JSONL audit log at ~/.marchwarden/traces/<trace_id>.jsonl. Replay it with:

marchwarden replay <trace_id>

The replay table shows every step the agent took: planning calls, search queries, URL fetches with content hashes, synthesis attempts, and the final outcome. Use it to:

  • Diagnose unexpected results — see exactly what queries the agent ran and what it found
  • Audit citations — every fetch records a SHA-256 content hash so you can verify the same page hasn't changed since
  • Debug synthesis failures — synthesis_error steps record the LLM's full raw response and parse error
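
The content-hash check can be reproduced by hand: refetch the page and compare digests. A sketch (the exact name of the hash field in the trace is an assumption here):

```python
import hashlib

def page_unchanged(fetched_body: bytes, recorded_hash: str) -> bool:
    """Compare a freshly fetched page body against the SHA-256
    hex digest recorded in the trace's fetch step."""
    return hashlib.sha256(fetched_body).hexdigest() == recorded_hash
```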

Flags

| Flag | Purpose |
|---|---|
| --trace-dir PATH | Override default trace directory (~/.marchwarden/traces) |

Tracking spend

Every research call appends one line to ~/.marchwarden/costs.jsonl with model, tokens (input/output split), Tavily search count, and an estimated cost in USD. Inspect it with:

marchwarden costs

Output sections

  • Cost Summary — total calls, total spend, total tokens (with input/output split), Tavily searches, and a warning if any calls used a model not in your price table
  • Per Day — calls / tokens / spend grouped by day
  • Per Model — calls / tokens / spend grouped by model_id
  • Highest-Cost Call — the most expensive single run, with trace_id for follow-up

Flags

| Flag | Purpose |
|---|---|
| --since DATE | ISO date (2026-04-01) or relative (7d, 24h, 2w, 1m) |
| --until DATE | Same |
| --model MODEL_ID | Filter to a single model |
| --json | Emit raw filtered ledger entries (one JSON per line) instead of the table |
| --ledger PATH | Override default ledger location |
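
The relative forms are a count plus a unit suffix. A sketch of how they might be interpreted (assuming 1m means 30 days; Marchwarden's actual parser may differ):

```python
import re
from datetime import timedelta

UNIT_DAYS = {"h": 1 / 24, "d": 1, "w": 7, "m": 30}  # assumption: 1m = 30 days

def parse_relative(spec: str) -> timedelta:
    """Turn '7d', '24h', '2w', '1m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hdwm])", spec)
    if not match:
        raise ValueError(f"not a relative date: {spec!r}")
    count, unit = int(match.group(1)), match.group(2)
    return timedelta(days=count * UNIT_DAYS[unit])
```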

Examples

marchwarden costs                        # all-time summary
marchwarden costs --since 7d             # last 7 days
marchwarden costs --model claude-opus-4-6
marchwarden costs --since 2026-04-01 --until 2026-04-08 --json

The --json mode is suitable for piping into jq or shipping to a billing/analytics tool.
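
Besides jq, the --json stream folds easily in a few lines of Python. A sketch that groups spend per model — the model_id and estimated_cost_usd field names follow this guide; anything else about the entry shape is an assumption:

```python
import json
from collections import defaultdict

def spend_per_model(jsonl: str) -> dict[str, float]:
    """Sum estimated_cost_usd per model_id over `marchwarden costs --json`
    output; entries with a null estimate are skipped."""
    totals: dict[str, float] = defaultdict(float)
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        cost = entry.get("estimated_cost_usd")
        if cost is not None:
            totals[entry["model_id"]] += cost
    return dict(totals)
```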


Depth presets

The --depth flag picks sensible defaults for the agent loop. Explicit --budget and --max-iterations always override.

| Depth | max_iterations | token_budget | max_sources | Use for |
|---|---|---|---|---|
| shallow | 2 | 5,000 | 5 | quick lookups, factual Q&A |
| balanced | 5 | 20,000 | 10 | default, most questions |
| deep | 8 | 60,000 | 20 | comparison studies, complex investigations |

How the budget is enforced

The token budget is a soft cap on the tool-use loop only:

  • Before each new iteration, the agent checks tokens_used >= token_budget. If yes, the loop stops and synthesis runs on whatever evidence is gathered.
  • The synthesis call itself is uncapped — it always completes, so you get a real ResearchResult instead of a parse-failure stub.
  • This means total tokens reported in the Cost panel and ledger will normally exceed token_budget by the synthesis cost (~10–25k tokens depending on evidence size).
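
The enforcement described above looks roughly like this in Python. The helper names are illustrative stand-ins for the real tool loop, not Marchwarden's internals:

```python
def run_research(question, token_budget, max_iterations,
                 run_iteration, synthesize):
    """Soft budget cap: checked only between iterations, and
    synthesis always runs, uncapped."""
    tokens_used, evidence = 0, []
    for _ in range(max_iterations):
        if tokens_used >= token_budget:
            break  # budget spent: stop gathering, still synthesize
        step_tokens, found = run_iteration(question, evidence)
        tokens_used += step_tokens
        evidence.extend(found)
    # Synthesis is uncapped, so the reported total normally
    # exceeds token_budget by the synthesis cost.
    answer, synth_tokens = synthesize(question, evidence)
    return answer, tokens_used + synth_tokens
```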

Practical implications:

  • A balanced run with token_budget=20000 typically reports tokens_used: 30000–50000 total. That's normal.
  • If you need strict total spend control, use shallow and hand-tune --budget low.
  • If you need thorough answers, use deep and accept that the call may consume 100k+ tokens.

Operational logging

Marchwarden logs every research step via structlog. Logs go to stderr so they don't interfere with the research output on stdout.

Log levels

  • INFO (default) — milestones only (~9 lines per call): research start, each iteration boundary, synthesis start/complete, completion, cost recording
  • DEBUG — every step (~13+ lines per call): adds individual web_search, fetch_url, and tool-result events

Formats

  • console — colored, human-readable; auto-selected when stderr is a TTY
  • json — newline-delimited JSON, OpenSearch-ready; auto-selected when stderr is not a TTY (e.g., in CI, containers, or piped output)

Set explicitly with MARCHWARDEN_LOG_FORMAT=json or =console.
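
The auto-selection amounts to a TTY check, roughly as follows (a sketch; the real code wires this choice into its structlog configuration):

```python
import os
import sys

def pick_log_format() -> str:
    """MARCHWARDEN_LOG_FORMAT wins if set; otherwise console for an
    interactive stderr, json for CI, containers, and pipes."""
    explicit = os.environ.get("MARCHWARDEN_LOG_FORMAT")
    if explicit in ("json", "console"):
        return explicit
    return "console" if sys.stderr.isatty() else "json"
```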

Persistent file logging

MARCHWARDEN_LOG_FILE=1 marchwarden ask "..."

Logs are appended to ~/.marchwarden/logs/marchwarden.log (10MB per file, 5 rotated backups). The format respects MARCHWARDEN_LOG_FORMAT.

Context binding

Every log line emitted during a research call automatically carries:

  • trace_id — the same UUID you see in the Trace footer
  • researcher — currently always web (the researcher type)

This means in OpenSearch (or any structured log viewer) you can filter to a single research call with one query: trace_id:"abc-123-...".
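
The same filter works on a local json-format log file. A sketch, assuming the json format puts trace_id at the top level of each line (which the context binding above implies):

```python
import json

def lines_for_trace(log_text: str, trace_id: str) -> list[dict]:
    """Pick out every JSON log line belonging to one research call."""
    hits = []
    for line in log_text.splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines
        if event.get("trace_id") == trace_id:
            hits.append(event)
    return hits
```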


File layout

Marchwarden writes to ~/.marchwarden/ exclusively. Nothing else on disk is touched.

~/.marchwarden/
├── prices.toml                      # auto-seeded price table; edit when rates change
├── costs.jsonl                      # cost ledger, one line per research call
├── traces/
│   └── <trace_id>.jsonl             # per-call audit log, one file per call
└── logs/
    ├── marchwarden.log              # only if MARCHWARDEN_LOG_FILE=1
    └── marchwarden.log.{1..5}       # rotated backups

All files are append-only or rewritten safely; you can tail -f, jq, or back them up freely.


Running in Docker

The same workflows work inside the docker test image — useful for sandboxed runs or to avoid touching the host's Python:

make docker-build                                         # one-time
./scripts/docker-test.sh ask "your question" --depth deep
./scripts/docker-test.sh replay <trace_id>

The ask and replay subcommands of docker-test.sh mount:

  • ~/secrets:/root/secrets:ro — your API keys
  • ~/.marchwarden:/root/.marchwarden — traces, costs, logs persist back to the host

The script also forwards MARCHWARDEN_MODEL from the host environment if set.


Troubleshooting

marchwarden: command not found after make install

Either:

  1. The venv isn't activated. Run source .venv/bin/activate, or use make ask which calls .venv/bin/marchwarden directly.
  2. A stale install exists at ~/.local/bin/marchwarden. Run which -a marchwarden, delete the stale copy, then hash -r.

ModuleNotFoundError: No module named 'cli'

The marchwarden script being run is from a stale install (e.g., a previous pip install --user or pipx install) that doesn't know about the current source layout. Same fix as above.

Error: HTTP 404 Not Found on the Anthropic API

Your MARCHWARDEN_MODEL is set to a model id that doesn't exist. Set it to a valid id such as claude-sonnet-4-6 or claude-opus-4-6; the default is claude-sonnet-4-6.

Calls with unknown model price: N warning in marchwarden costs

You ran a research call with a model_id not present in ~/.marchwarden/prices.toml. Add a section for it:

[models."your-model-id"]
input_per_mtok_usd = 3.00
output_per_mtok_usd = 15.00

Then re-run marchwarden costs. Existing ledger entries with null cost won't be retroactively fixed; future calls will pick up the new prices.

Budget status: spent on every run

This is expected, not an error. See Reading the output → Confidence panel and Depth presets → How the budget is enforced for details.

Synthesis fallback ("Research completed but synthesis failed")

This used to happen when the synthesis JSON exceeded its max_tokens cap, but was fixed in PR #20. If you still see it, file an issue with the trace_id — the JSONL trace will contain the exact synthesis_error step including the model's raw response and parse error.

The marchwarden ask output is paginated / cut off

rich defaults to your terminal width. If lines wrap awkwardly, widen your terminal or pipe to less -R to keep colors:

marchwarden ask "..." 2>&1 | less -R

FAQ

How long does a research call take? Typical wall-clock times: shallow ~15s, balanced ~30–60s, deep ~60–120s. Mostly LLM latency, not network.

How much does a call cost? At current Sonnet 4.6 rates: shallow ~$0.02, balanced ~$0.05–$0.15, deep ~$0.20–$0.60. Run marchwarden costs after a few calls to see your actual numbers.

Can I use a different model? Yes. MARCHWARDEN_MODEL=claude-opus-4-6 marchwarden ask "..." will use Opus instead of Sonnet. Make sure the model id is in your prices.toml so the cost ledger can estimate spend.

Can the agent access local files / databases? Not yet. V1 is web-search only. V2+ (per the Roadmap) will add file/document and database researchers — same contract, different tools.

Does the agent learn between calls? No. Each research() call is stateless. The trace logs and cost ledger accumulate over time, but the agent itself starts fresh every time. Cross-call learning is on the V2+ roadmap.

Where do I report bugs? Open an issue at the Forgejo repo. Include the trace_id from the Trace footer — it lets us reconstruct exactly what happened.


See also: Architecture, Research Contract, Development Guide, Roadmap