marchwarden/tests
Jeff Smith 1203b07248 fix(observability): persist full ResearchResult and per-item trace events
Closes #54.

The JSONL trace previously stored only counts on the `complete` event
(gap_count, citation_count, discovery_count). Replay could re-render the
step log but could not recover which gaps fired or which sources were
cited, blocking M3.2/M3.3 stress-testing and calibration work.

Two complementary fixes:

1. `TraceLogger.write_result()` dumps the pydantic `ResearchResult` to
   `<trace_id>.result.json` next to the JSONL trace. The agent calls it
   right before emitting the `complete` step. `cli replay` now loads the
   sibling result file when present and renders the structured tables
   under the trace step log.

2. The agent emits one `gap_recorded`, `citation_recorded`, or
   `discovery_recorded` trace event per item from the final result. This
   gives the JSONL stream a queryable timeline of what was kept, with
   categories and topics in-band, without needing to load the result
   sibling.
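
A minimal sketch of the two fixes, not the actual marchwarden code. The
dataclasses stand in for the project's pydantic models so the example runs
on the standard library alone. The field names (`topic`, `category`,
`source`, `title`), the `action` key, the `log_step()` and `finish()`
helpers, and the ordering of per-item events relative to `write_result()`
are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


# Hypothetical stand-ins for the project's pydantic models.
@dataclass
class Gap:
    topic: str
    category: str


@dataclass
class Citation:
    source: str
    title: str


@dataclass
class ResearchResult:
    answer: str
    gaps: list[Gap] = field(default_factory=list)
    citations: list[Citation] = field(default_factory=list)


class TraceLogger:
    """Illustrative logger: a JSONL trace plus a sibling result file."""

    def __init__(self, trace_dir: Path, trace_id: str) -> None:
        self.trace_path = trace_dir / f"{trace_id}.jsonl"
        self.result_path = trace_dir / f"{trace_id}.result.json"

    def log_step(self, action: str, **fields) -> None:
        # Append one JSON object per line to the trace.
        with self.trace_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"action": action, **fields}) + "\n")

    def write_result(self, result: ResearchResult) -> None:
        # With a pydantic v2 model this would be result.model_dump_json().
        payload = json.dumps(asdict(result), indent=2)
        self.result_path.write_text(payload, encoding="utf-8")


def finish(logger: TraceLogger, result: ResearchResult) -> None:
    """Persist the full result, then emit per-item events and `complete`."""
    logger.write_result(result)
    for gap in result.gaps:
        logger.log_step("gap_recorded", topic=gap.topic, category=gap.category)
    for cite in result.citations:
        logger.log_step("citation_recorded", source=cite.source, title=cite.title)
    logger.log_step(
        "complete",
        gap_count=len(result.gaps),
        citation_count=len(result.citations),
    )


def load_events(trace_path: Path, action: str) -> list[dict]:
    """Return every trace event with the given action, in file order."""
    with trace_path.open(encoding="utf-8") as fh:
        events = [json.loads(line) for line in fh if line.strip()]
    return [event for event in events if event["action"] == action]
```

With this shape, a question like "which gaps fired?" can be answered by
filtering the JSONL stream alone (`load_events(path, "gap_recorded")`),
while a replay that needs the whole structured result reads the
`.result.json` sibling instead.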

Tests: 4 added (127 total passing). Smoke-tested live with a real ask;
both files written and replay rendering verified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:27:33 -06:00
__init__.py M0.3: Implement contract v1 Pydantic models with tests 2026-04-08 14:00:45 -06:00
test_agent.py fix(observability): persist full ResearchResult and per-item trace events 2026-04-08 19:27:33 -06:00
test_cli.py fix(observability): persist full ResearchResult and per-item trace events 2026-04-08 19:27:33 -06:00
test_costs.py M2.5.2: Cost ledger with price table (#25) 2026-04-08 15:52:25 -06:00
test_models.py depth flag now drives constraint defaults (#30) 2026-04-08 16:27:38 -06:00
test_obs.py M2.5.1: Structured application logger via structlog (#24) 2026-04-08 15:46:51 -06:00
test_server.py M1.4: MCP server wrapping web researcher 2026-04-08 14:41:13 -06:00
test_tools.py M1.1: Search and fetch tools with tests 2026-04-08 14:17:18 -06:00
test_trace.py fix(observability): persist full ResearchResult and per-item trace events 2026-04-08 19:27:33 -06:00