marchwarden/researchers/web
Jeff Smith 1203b07248 fix(observability): persist full ResearchResult and per-item trace events
Closes #54.

The JSONL trace previously stored only counts on the `complete` event
(gap_count, citation_count, discovery_count). Replay could re-render the
step log but could not recover which gaps fired or which sources were
cited, blocking M3.2/M3.3 stress-testing and calibration work.

Two complementary fixes:

1. (a) TraceLogger.write_result() dumps the pydantic ResearchResult to
   `<trace_id>.result.json` next to the JSONL trace. The agent calls it
   right before emitting the `complete` step. `cli replay` now loads the
   sibling result file when present and renders the structured tables
   under the trace step log.

2. (b) The agent emits one `gap_recorded`, `citation_recorded`, or
   `discovery_recorded` trace event per item from the final result. This
   gives the JSONL stream a queryable timeline of what was kept, with
   categories and topics in-band, without needing to load the result
   sibling.
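
A minimal sketch of how the two fixes could fit together, not the code from this commit. Only `TraceLogger.write_result()`, the `<trace_id>.result.json` sibling file, and the event names `gap_recorded`, `citation_recorded`, `discovery_recorded`, and `complete` come from the message above. Everything else is an assumption made for illustration: the `log_step()` helper, the `finish()` and `load_replay()` functions, the field names on `ResearchResult`, and the order of the per-item events relative to `write_result()`. A stdlib dataclass stands in for the pydantic model so the sketch runs without dependencies.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ResearchResult:
    # Stand-in for the real pydantic model; these field names are hypothetical.
    answer: str
    gaps: list = field(default_factory=list)
    citations: list = field(default_factory=list)
    discoveries: list = field(default_factory=list)


class TraceLogger:
    def __init__(self, trace_dir, trace_id):
        self.trace_dir = Path(trace_dir)
        self.trace_dir.mkdir(parents=True, exist_ok=True)
        self.jsonl_path = self.trace_dir / f"{trace_id}.jsonl"
        self.result_path = self.trace_dir / f"{trace_id}.result.json"

    def log_step(self, action, **fields):
        # Hypothetical helper: append one event per line to the JSONL trace.
        with self.jsonl_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"action": action, **fields}) + "\n")

    def write_result(self, result):
        # Fix (a): dump the full result next to the JSONL trace.
        payload = json.dumps(asdict(result), indent=2)
        self.result_path.write_text(payload, encoding="utf-8")
        return self.result_path


def finish(logger, result):
    # Fix (b): one trace event per kept item, so the JSONL stream is
    # queryable without loading the result sibling.
    for gap in result.gaps:
        logger.log_step("gap_recorded", item=gap)
    for citation in result.citations:
        logger.log_step("citation_recorded", item=citation)
    for discovery in result.discoveries:
        logger.log_step("discovery_recorded", item=discovery)
    # The result file is written right before the `complete` step.
    logger.write_result(result)
    logger.log_step(
        "complete",
        gap_count=len(result.gaps),
        citation_count=len(result.citations),
        discovery_count=len(result.discoveries),
    )


def load_replay(trace_dir, trace_id):
    # Replay side: read the step log, then the sibling result when present.
    trace_dir = Path(trace_dir)
    lines = (trace_dir / f"{trace_id}.jsonl").read_text(encoding="utf-8").splitlines()
    events = [json.loads(line) for line in lines if line.strip()]
    result_path = trace_dir / f"{trace_id}.result.json"
    result = None
    if result_path.exists():
        result = json.loads(result_path.read_text(encoding="utf-8"))
    return events, result
```

In this sketch, traces written before the change still replay: `load_replay()` returns `None` for the result and the caller can fall back to rendering the step log alone.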

Tests: 4 added (127 total passing). Smoke-tested live with a real ask;
both files written and replay rendering verified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:27:33 -06:00
File         Last commit                                                                Date
__init__.py  Initial project structure and scaffolding                                  2026-04-08 11:57:15 -06:00
__main__.py  M1.4: MCP server wrapping web researcher                                   2026-04-08 14:41:13 -06:00
agent.py     fix(observability): persist full ResearchResult and per-item trace events  2026-04-08 19:27:33 -06:00
models.py    depth flag now drives constraint defaults (#30)                            2026-04-08 16:27:38 -06:00
server.py    depth flag now drives constraint defaults (#30)                            2026-04-08 16:27:38 -06:00
tools.py     M1.1: Search and fetch tools with tests                                    2026-04-08 14:17:18 -06:00
trace.py     fix(observability): persist full ResearchResult and per-item trace events  2026-04-08 19:27:33 -06:00