marchwarden

Author	SHA1	Message	Date
Jeff Smith	1203b07248	fix(observability): persist full ResearchResult and per-item trace events Closes #54. The JSONL trace previously stored only counts on the `complete` event (gap_count, citation_count, discovery_count). Replay could re-render the step log but could not recover which gaps fired or which sources were cited, blocking M3.2/M3.3 stress-testing and calibration work. Two complementary fixes: (a) TraceLogger.write_result() dumps the pydantic ResearchResult to `<trace_id>.result.json` next to the JSONL trace. The agent calls it right before emitting the `complete` step. `cli replay` now loads the sibling result file when present and renders the structured tables under the trace step log. (b) The agent emits one `gap_recorded`, `citation_recorded`, or `discovery_recorded` trace event per item from the final result. This gives the JSONL stream a queryable timeline of what was kept, with categories and topics in-band, without needing to load the result sibling. Tests: 4 added (127 total passing). Smoke-tested live with a real ask; both files written and replay rendering verified. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 19:27:33 -06:00
Jeff Smith	6fdf0e338a	M2.5.3: marchwarden costs CLI command (#26) Adds operator-facing `marchwarden costs` subcommand that reads the JSONL ledger from M2.5.2 and pretty-prints a rich summary: Cost Summary panel (total calls, total spend, total tokens with input/output split, Tavily search count, warning for any calls with unknown model prices); Per-Day table sorted by date; Per-Model table sorted by model id; Highest-Cost Call panel with trace_id and question. Flags: `--since` (ISO date or relative shorthand: 7d, 24h, 2w, 1m); `--until` (same); `--model` (filter to a specific model_id); `--json` (emit raw filtered ledger entries instead of the table); `--ledger` (override default path, mostly for tests). Also fixes a Dockerfile gap: the obs/ package added in M2.5.1 was not being COPYed into the image, so the installed `marchwarden` entry point couldn't import it. Tests had been passing because they mounted /app over the install. Adding `COPY obs ./obs` restores parity. Tests cover summary rendering, model filter, since-date filter, JSON output, and the empty-ledger friendly path. 110/110 passing. End-to-end verified against the real cost ledger. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:57:39 -06:00
Jeff Smith	273d144381	M2.2: marchwarden replay CLI command (#9) Adds `marchwarden replay <trace_id>` to pretty-print a prior research run from its JSONL trace file. Resolves the trace under ~/.marchwarden/traces/ by default; --trace-dir overrides for tests and custom locations. Renders each step as a row with action, decision, extra fields, and content_hash. Friendly errors for unknown trace_id and malformed JSON lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:57:37 -06:00
Jeff Smith	87a34c60d1	M2.1: marchwarden ask CLI command (#8) Click app with `ask` subcommand that spawns the web researcher MCP server over stdio, calls the research tool, and pretty-prints the ResearchResult contract using rich (panels for answer/confidence/cost, tables for citations, gaps, discovery events, and open questions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:51:40 -06:00
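
The trace changes described in commit 1203b07248 (a sibling `<trace_id>.result.json` plus one `gap_recorded`, `citation_recorded`, or `discovery_recorded` event per item) might look roughly like the sketch below. Only `TraceLogger.write_result()`, the event names, the count fields, and the file naming come from the commit message. The class body, `log_step`, `finish_run`, the result keys, and the use of a plain dict in place of the pydantic ResearchResult are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


class TraceLogger:
    """Hypothetical stand-in; the project's real TraceLogger is not shown in the log."""

    def __init__(self, trace_dir: Path, trace_id: str) -> None:
        self.trace_path = trace_dir / f"{trace_id}.jsonl"
        self.result_path = trace_dir / f"{trace_id}.result.json"

    def log_step(self, action: str, **fields) -> None:
        # Append one JSON object per line to the JSONL trace (assumed helper name).
        with self.trace_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"action": action, **fields}) + "\n")

    def write_result(self, result: dict) -> None:
        # The commit dumps a pydantic ResearchResult; a plain dict stands in here.
        self.result_path.write_text(json.dumps(result, indent=2), encoding="utf-8")


def finish_run(logger: TraceLogger, result: dict) -> None:
    # One trace event per kept item, so the JSONL stream is queryable on its own.
    for gap in result["gaps"]:
        logger.log_step("gap_recorded", **gap)
    for citation in result["citations"]:
        logger.log_step("citation_recorded", **citation)
    for event in result["discovery_events"]:
        logger.log_step("discovery_recorded", **event)
    # Write the sibling result file right before the final `complete` step.
    logger.write_result(result)
    logger.log_step(
        "complete",
        gap_count=len(result["gaps"]),
        citation_count=len(result["citations"]),
        discovery_count=len(result["discovery_events"]),
    )


def demo() -> tuple:
    # Illustrative data only; the field names are assumptions.
    result = {
        "gaps": [{"topic": "pricing", "category": "source_not_found"}],
        "citations": [{"source": "https://example.com", "title": "Example"}],
        "discovery_events": [],
    }
    with tempfile.TemporaryDirectory() as tmp:
        logger = TraceLogger(Path(tmp), "demo-trace")
        finish_run(logger, result)
        lines = logger.trace_path.read_text(encoding="utf-8").splitlines()
        actions = [json.loads(line)["action"] for line in lines]
        return actions, logger.result_path.exists()
```

Keeping the per-item events in the JSONL stream means a consumer can filter the timeline by `action` without opening the result sibling, while the sibling file preserves the full structure for replay.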

4 commits
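
The `--since` and `--until` flags from commit 6fdf0e338a accept either an ISO date or a relative shorthand such as `7d`, `24h`, `2w`, or `1m`. A minimal sketch of one way to parse such values follows. The function name `parse_when`, the explicit `now` argument, and the reading of `m` as a 30-day month are assumptions; the commit message does not say how the real command interprets them.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit table: "m" is treated as a 30-day month, which the commit does not confirm.
_UNITS = {
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "m": timedelta(days=30),
}
_RELATIVE = re.compile(r"(\d+)([hdwm])")


def parse_when(value: str, now: datetime) -> datetime:
    """Resolve a relative shorthand or an ISO date to an absolute datetime."""
    text = value.strip()
    match = _RELATIVE.fullmatch(text)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        # Relative values count backwards from the supplied reference time.
        return now - count * _UNITS[unit]
    # Otherwise expect an ISO date or datetime; raises ValueError if malformed.
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Naive inputs inherit the reference time's timezone.
        parsed = parsed.replace(tzinfo=now.tzinfo)
    return parsed


# Usage with a fixed reference time so the results are reproducible.
reference = datetime(2026, 4, 8, 12, 0, tzinfo=timezone.utc)
week_ago = parse_when("7d", reference)
explicit = parse_when("2026-04-01", reference)
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the parsing deterministic and easy to test.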