marchwarden

Author	SHA1	Message	Date
Jeff Smith	ae48acd421	depth flag now drives constraint defaults (#30). Previously the depth parameter (shallow/balanced/deep) was passed only as a text hint inside the agent's user message, with no mechanical effect on iterations, token budget, or source count. The flag was effectively cosmetic: the LLM was expected to "interpret" it. Add DEPTH_PRESETS table and constraints_for_depth() helper in researchers.web.models: shallow: 2 iters, 5,000 tokens, 5 sources; balanced: 5 iters, 20,000 tokens, 10 sources (= historical defaults); deep: 8 iters, 60,000 tokens, 20 sources. Wired through the stack: - WebResearcher.research(): when constraints is None, builds from the depth preset instead of bare ResearchConstraints() - MCP server `research` tool: max_iterations and token_budget now default to None; constraints are built via constraints_for_depth with explicit values overriding the preset - CLI `ask` command: --max-iterations and --budget default to None; the CLI only forwards them to the MCP tool when set, so unset flags fall through to the depth preset. balanced is unchanged from the historical defaults so existing callers see no behavior difference. Explicit --max-iterations / --budget always win over the preset. Tests cover each preset's values, balanced backward-compat, unknown depth fallback, full override, and partial override. 116/116 tests passing. Live-verified: --depth shallow on a simple question now caps at 2 iterations and stays under budget. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 16:27:38 -06:00
Jeff Smith	c0d4f391b6	Display budget as spend status, not exhaustion alarm. Replace 'Budget exhausted: True/False' with 'Budget status: spent / under cap' in the Confidence panel. The previous wording read as a failure indicator when in practice 'exhausted' just means the agent spent its tool-use cap before voluntarily stopping, which is the normal, expected outcome on real questions with the default 20k budget. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 16:12:39 -06:00
Jeff Smith	6fdf0e338a	M2.5.3: marchwarden costs CLI command (#26). Adds operator-facing `marchwarden costs` subcommand that reads the JSONL ledger from M2.5.2 and pretty-prints a rich summary: - Cost Summary panel: total calls, total spend, total tokens (input/output split), Tavily search count, warning for any calls with unknown model prices - Per-Day table sorted by date - Per-Model table sorted by model id - Highest-Cost Call panel with trace_id and question. Flags: --since: ISO date or relative shorthand (7d, 24h, 2w, 1m); --until: same; --model: filter to a specific model_id; --json: emit raw filtered ledger entries instead of the table; --ledger: override default path (mostly for tests). Also fixes a Dockerfile gap: the obs/ package added in M2.5.1 was not being COPYed into the image, so the installed `marchwarden` entry point couldn't import it. Tests had been passing because they mounted /app over the install. Adding `COPY obs ./obs` restores parity. Tests cover summary rendering, model filter, since-date filter, JSON output, and the empty-ledger friendly path. 110/110 passing. End-to-end verified against the real cost ledger. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:57:39 -06:00
Jeff Smith	8a62f6b014	M2.5.1: Structured application logger via structlog (#24). Adds an operational logging layer separate from the JSONL trace audit logs. Operational logs cover system events (startup, errors, MCP transport, research lifecycle); JSONL traces remain the researcher provenance audit trail. Backend: structlog with two renderers selectable via MARCHWARDEN_LOG_FORMAT (json|console). Defaults to console when stderr is a TTY, json otherwise, so dev runs are human-readable and shipped runs (containers, automation) emit OpenSearch-ready JSON without configuration. Key features: - Named loggers per component: marchwarden.cli, marchwarden.mcp, marchwarden.researcher.web - MARCHWARDEN_LOG_LEVEL controls global level (default INFO) - MARCHWARDEN_LOG_FILE=1 enables a 10MB-rotating file at ~/.marchwarden/logs/marchwarden.log - structlog contextvars bind trace_id + researcher at the start of each research() call so every downstream log line carries them automatically; cleared on completion - stdlib logging is funneled through the same pipeline so noisy third-party loggers (httpx, anthropic) get the same formatting and are quieted to WARN unless DEBUG is requested - Logs to stderr to keep MCP stdio stdout clean. Wired into: - cli.main.cli: configures logging on startup, logs ask_started/ask_completed/ask_failed - researchers.web.server.main: configures logging on startup, logs mcp_server_starting - researchers.web.agent.research: binds trace context, logs research_started/research_completed. Tests verify JSON and console formats, contextvar propagation, level filtering, idempotency, and auto-configure-on-first-use. 94/94 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:46:51 -06:00
Jeff Smith	d0a732735e	Propagate parent env to MCP server subprocess (#18). The mcp SDK's StdioServerParameters does not pass the parent process's environment to the spawned server by default, so env vars set on the CLI process (notably MARCHWARDEN_MODEL) were silently dropped on the way to the researcher. Pass env=os.environ.copy() to StdioServerParameters so the server sees the same environment as the CLI. Also update scripts/docker-test.sh to forward MARCHWARDEN_MODEL into the container and to detect a non-TTY parent so non-interactive `ask` invocations don't fail with "the input device is not a TTY". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:31:14 -06:00
Jeff Smith	273d144381	M2.2: marchwarden replay CLI command (#9). Adds `marchwarden replay <trace_id>` to pretty-print a prior research run from its JSONL trace file. Resolves the trace under ~/.marchwarden/traces/ by default; --trace-dir overrides for tests and custom locations. Renders each step as a row with action, decision, extra fields, and content_hash. Friendly errors for unknown trace_id and malformed JSON lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:57:37 -06:00
Jeff Smith	87a34c60d1	M2.1: marchwarden ask CLI command (#8). Click app with `ask` subcommand that spawns the web researcher MCP server over stdio, calls the research tool, and pretty-prints the ResearchResult contract using rich (panels for answer/confidence/cost, tables for citations, gaps, discovery events, and open questions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:51:40 -06:00
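
The preset-plus-override behavior described in commit ae48acd421 can be sketched roughly as follows. This is an illustration only, not the repository's code: `DEPTH_PRESETS`, `constraints_for_depth`, `ResearchConstraints`, `max_iterations`, and `token_budget` are named in the commit message, while the `max_sources` field name, the dataclass layout, and the choice to fall back to `balanced` for an unknown depth are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ResearchConstraints:
    # Field defaults mirror the "balanced" (historical) values from the commit.
    max_iterations: int = 5
    token_budget: int = 20_000
    max_sources: int = 10  # field name is an assumption


# Values taken from the commit message.
DEPTH_PRESETS = {
    "shallow": ResearchConstraints(max_iterations=2, token_budget=5_000, max_sources=5),
    "balanced": ResearchConstraints(max_iterations=5, token_budget=20_000, max_sources=10),
    "deep": ResearchConstraints(max_iterations=8, token_budget=60_000, max_sources=20),
}


def constraints_for_depth(
    depth: str,
    *,
    max_iterations: Optional[int] = None,
    token_budget: Optional[int] = None,
    max_sources: Optional[int] = None,
) -> ResearchConstraints:
    """Start from the depth preset, then apply any explicit overrides."""
    # Assumption: an unknown depth falls back to the balanced preset.
    preset = DEPTH_PRESETS.get(depth, DEPTH_PRESETS["balanced"])
    explicit = {
        "max_iterations": max_iterations,
        "token_budget": token_budget,
        "max_sources": max_sources,
    }
    # Only values that were actually supplied override the preset.
    overrides = {name: value for name, value in explicit.items() if value is not None}
    return replace(preset, **overrides)
```

Leaving the optional arguments as `None` is what lets an unset CLI flag fall through to the preset, while a call such as `constraints_for_depth("shallow", token_budget=8_000)` keeps the shallow iteration and source limits and changes only the budget.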

7 commits
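
The `--since` / `--until` values mentioned in commit 6fdf0e338a accept either an ISO date or a relative shorthand (7d, 24h, 2w, 1m). A minimal sketch of one way such a value could be parsed is shown below. The function name `parse_when`, the reading of `m` as a 30-day month, and the choice of UTC for naive dates are all assumptions, not details confirmed by the commit.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "m" means a 30-day month; the commit does not define it.
_UNITS = {
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "m": timedelta(days=30),
}
_SHORTHAND = re.compile(r"^(\d+)([hdwm])$")


def parse_when(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn '7d', '24h', '2w', '1m', or an ISO date into an aware datetime."""
    now = now or datetime.now(timezone.utc)
    match = _SHORTHAND.match(value.strip().lower())
    if match:
        count, unit = match.groups()
        # Relative shorthand counts backwards from "now".
        return now - int(count) * _UNITS[unit]
    # Otherwise treat the value as an ISO date or datetime.
    parsed = datetime.fromisoformat(value.strip())
    if parsed.tzinfo is None:
        # Assumption: naive inputs are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Passing `now` explicitly keeps the function deterministic in tests; a value that is neither shorthand nor ISO raises `ValueError` from `datetime.fromisoformat`, which a CLI could turn into a friendly usage error.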