Replace 'Budget exhausted: True/False' with 'Budget status: spent /
under cap' in the Confidence panel. The previous wording read as a
failure indicator when in practice 'exhausted' just means the agent
spent its tool-use cap before voluntarily stopping — the normal,
expected outcome on real questions with the default 20k budget.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds operator-facing `marchwarden costs` subcommand that reads the
JSONL ledger from M2.5.2 and pretty-prints a rich summary:
- Cost Summary panel: total calls, total spend, total tokens (input/
output split), Tavily search count, warning for any calls with
unknown model prices
- Per-Day table sorted by date
- Per-Model table sorted by model id
- Highest-Cost Call panel with trace_id and question
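
The aggregation behind these panels can be sketched roughly as follows. This is a minimal illustration, not the shipped code: the ledger field names (`timestamp`, `model`, `cost_usd`, `trace_id`) and the sample entries are assumptions, since the real schema comes from M2.5.2.

```python
import json
from collections import defaultdict

# Hypothetical ledger entries; field names are assumed for illustration
# and may not match the real M2.5.2 ledger schema.
SAMPLE_LEDGER = """\
{"timestamp": "2026-01-02T10:00:00Z", "model": "model-a", "cost_usd": 0.25, "trace_id": "t1"}
{"timestamp": "2026-01-02T11:30:00Z", "model": "model-b", "cost_usd": 0.5, "trace_id": "t2"}
{"timestamp": "2026-01-03T09:15:00Z", "model": "model-a", "cost_usd": 0.125, "trace_id": "t3"}
"""


def summarize(lines):
    """Aggregate JSONL ledger lines into totals, per-day, per-model, and max-cost call."""
    entries = [json.loads(line) for line in lines if line.strip()]
    per_day = defaultdict(float)
    per_model = defaultdict(float)
    for entry in entries:
        # The first ten characters of an ISO timestamp are the YYYY-MM-DD date.
        per_day[entry["timestamp"][:10]] += entry["cost_usd"]
        per_model[entry["model"]] += entry["cost_usd"]
    return {
        "total_calls": len(entries),
        "total_spend": sum(e["cost_usd"] for e in entries),
        "per_day": dict(sorted(per_day.items())),      # sorted by date
        "per_model": dict(sorted(per_model.items())),  # sorted by model id
        "highest": max(entries, key=lambda e: e["cost_usd"], default=None),
    }


summary = summarize(SAMPLE_LEDGER.splitlines())
```
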
Flags:
  --since   ISO date or relative shorthand (7d, 24h, 2w, 1m)
  --until   same formats as --since
  --model   filter to a specific model_id
  --json    emit raw filtered ledger entries instead of the table
  --ledger  override default ledger path (mostly for tests)
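
One way the --since/--until values could be resolved is sketched below. The helper name `parse_when` is hypothetical, and reading `m` as a 30-day month is an assumption; the commit does not say how the CLI interprets it.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit table. "m" is treated as a 30-day month here, which is a
# guess rather than documented behavior.
_UNITS = {
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "m": timedelta(days=30),
}


def parse_when(value, now=None):
    """Resolve an ISO date or a relative shorthand such as 7d to an aware datetime."""
    now = now or datetime.now(timezone.utc)
    text = value.strip()
    match = re.fullmatch(r"(\d+)([hdwm])", text)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        return now - count * _UNITS[unit]
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assume UTC when the input carries no offset.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Passing `now` explicitly keeps the function deterministic in tests; anything that is neither shorthand nor ISO raises `ValueError` from `fromisoformat`.
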
Also fixes a Dockerfile gap: the obs/ package added in M2.5.1 was
not being COPYed into the image, so the installed `marchwarden`
entry point couldn't import it. Tests had been passing because
they mounted /app over the install. Adding `COPY obs ./obs`
restores parity.
Tests cover summary rendering, model filter, since-date filter,
JSON output, and the empty-ledger friendly path. 110/110 passing.
End-to-end verified against the real cost ledger.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds an operational logging layer separate from the JSONL trace
audit logs. Operational logs cover system events (startup, errors,
MCP transport, research lifecycle); JSONL traces remain the
researcher provenance audit trail.
Backend: structlog with two renderers selectable via
MARCHWARDEN_LOG_FORMAT (json|console). Defaults to console when
stderr is a TTY, json otherwise — so dev runs are human-readable
and shipped runs (containers, automation) emit OpenSearch-ready
JSON without configuration.
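
The renderer selection described above could look roughly like this. The function name `resolve_log_format` is hypothetical, and falling back to auto-detection on an unrecognized value is an assumption rather than confirmed behavior.

```python
def resolve_log_format(env, stderr_is_tty):
    """Return 'json' or 'console': explicit env setting first, then TTY detection."""
    explicit = env.get("MARCHWARDEN_LOG_FORMAT", "").strip().lower()
    if explicit in ("json", "console"):
        return explicit
    # Assumed fallback: unset or unrecognized values use auto-detection.
    return "console" if stderr_is_tty else "json"
```

In practice this would be called as `resolve_log_format(os.environ, sys.stderr.isatty())`.
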
Key features:
- Named loggers per component: marchwarden.cli,
marchwarden.mcp, marchwarden.researcher.web
- MARCHWARDEN_LOG_LEVEL controls global level (default INFO)
- MARCHWARDEN_LOG_FILE=1 enables a 10MB-rotating file at
~/.marchwarden/logs/marchwarden.log
- structlog contextvars bind trace_id + researcher at the start
of each research() call so every downstream log line carries
them automatically; cleared on completion
- stdlib logging is funneled through the same pipeline so noisy
third-party loggers (httpx, anthropic) get the same formatting
and are quieted to WARN unless DEBUG is requested
- Logs to stderr to keep MCP stdio stdout clean
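
The bind-then-clear pattern for trace context can be illustrated with the standard library alone. This is a stand-in for the structlog contextvars helpers, not the actual implementation; the `research` body, the `ContextFilter` class, and the sample trace_id are all hypothetical.

```python
import contextvars
import io
import logging

# Stand-in for structlog's context storage; holds the currently bound fields.
_trace_ctx = contextvars.ContextVar("trace_ctx", default={})


class ContextFilter(logging.Filter):
    """Copy the bound context onto every record the handler processes."""

    def filter(self, record):
        ctx = _trace_ctx.get()
        record.trace_id = ctx.get("trace_id", "-")
        record.researcher = ctx.get("researcher", "-")
        return True


def research(question, logger):
    """Hypothetical research() wrapper: bind context on entry, clear on exit."""
    token = _trace_ctx.set({"trace_id": "abc123", "researcher": "web"})
    try:
        logger.info("research_started")
        logger.info("research_completed")
    finally:
        _trace_ctx.reset(token)  # cleared even if the call fails


stream = io.StringIO()  # stands in for stderr so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(message)s trace_id=%(trace_id)s researcher=%(researcher)s")
)
handler.addFilter(ContextFilter())
log = logging.getLogger("marchwarden.researcher.web.sketch")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

research("example question", log)
log.info("after_research")
output = stream.getvalue().splitlines()
```

Lines logged inside `research()` carry the bound trace_id and researcher without the caller passing them; the line logged afterwards falls back to the `-` placeholders.
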
Wired into:
- cli.main.cli — configures logging on startup, logs ask_started/
ask_completed/ask_failed
- researchers.web.server.main — configures logging on startup,
logs mcp_server_starting
- researchers.web.agent.research — binds trace context, logs
research_started/research_completed
Tests verify JSON and console formats, contextvar propagation,
level filtering, idempotency, and auto-configure-on-first-use.
94/94 tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mcp SDK's StdioServerParameters does not pass the parent
process's environment to the spawned server by default, so env
vars set on the CLI process (notably MARCHWARDEN_MODEL) were
silently dropped on the way to the researcher.
Pass env=os.environ.copy() to StdioServerParameters so the server
sees the same environment as the CLI. Also update scripts/docker-test.sh
to forward MARCHWARDEN_MODEL into the container and to detect a
non-TTY parent so non-interactive `ask` invocations don't fail with
"the input device is not a TTY".
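
The failure mode and the fix can be reproduced with a plain subprocess. This sketch uses only the standard library as an analogue of the MCP spawn; the allow-list below is an assumed example of a spawner that forwards a restricted environment, and the model value is made up.

```python
import os
import subprocess
import sys

# Child program: report whether MARCHWARDEN_MODEL reached the spawned process.
CHILD = "import os; print(os.environ.get('MARCHWARDEN_MODEL', 'unset'))"


def child_sees(env):
    """Spawn a child interpreter with the given environment and return its report."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


os.environ["MARCHWARDEN_MODEL"] = "example-model"  # hypothetical value

# Assumed allow-list, standing in for a spawner that does not forward
# the full parent environment.
allowed = ("PATH", "HOME", "SYSTEMROOT", "LD_LIBRARY_PATH")
filtered = {k: v for k, v in os.environ.items() if k in allowed}

dropped = child_sees(filtered)               # variable silently missing
forwarded = child_sees(os.environ.copy())    # the fix: pass a full copy
```

Passing a copy rather than `os.environ` itself keeps later mutations in the parent from leaking into the parameters object.
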
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `marchwarden replay <trace_id>` to pretty-print a prior research
run from its JSONL trace file. Resolves the trace under
~/.marchwarden/traces/ by default; --trace-dir overrides for tests and
custom locations. Renders each step as a row with action, decision,
extra fields, and content_hash. Friendly errors for unknown trace_id
and malformed JSON lines.
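
A minimal sketch of the trace loading with friendly errors, assuming one `<trace_id>.jsonl` file per trace (the naming scheme, the `load_trace` helper, the `TraceNotFound` exception, and the sample step fields are all hypothetical):

```python
import json
import tempfile
from pathlib import Path


class TraceNotFound(Exception):
    """Raised when no trace file exists for the requested trace_id."""


def load_trace(trace_id, trace_dir):
    """Return (steps, problems); problems describes malformed lines by number."""
    path = Path(trace_dir) / f"{trace_id}.jsonl"  # assumed naming scheme
    if not path.is_file():
        raise TraceNotFound(f"no trace named {trace_id!r} under {trace_dir}")
    steps, problems = [], []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            continue
        try:
            steps.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Keep going so one bad line does not hide the rest of the run.
            problems.append(f"line {lineno}: {exc.msg}")
    return steps, problems


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "demo.jsonl").write_text(
        '{"step": 1, "action": "search", "decision": "continue"}\n'
        "{not json}\n"
        '{"step": 2, "action": "fetch", "decision": "stop"}\n'
    )
    steps, problems = load_trace("demo", tmp)
    try:
        load_trace("missing", tmp)
        missing_error = None
    except TraceNotFound as exc:
        missing_error = str(exc)
```

Collecting malformed lines instead of raising lets the renderer show the valid steps and report the bad ones separately.
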
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Click app with `ask` subcommand that spawns the web researcher MCP
server over stdio, calls the research tool, and pretty-prints the
ResearchResult contract using rich (panels for answer/confidence/cost,
tables for citations, gaps, discovery events, and open questions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Directory layout: researchers/web/, orchestrator/, cli/, docs/wiki/
- README with quick start and vision
- CONTRIBUTING with workflow and testing guidelines
- pyproject.toml with dependencies and build config
- .gitignore for Python projects
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>