marchwarden/tests
Jeff Smith ddaf7e85c3 Record per-step durations in trace and operational logs (#35)
TraceLogger now tracks monotonic start times for starter actions
(web_search, fetch_url, synthesis_start, start) and attaches a
duration_ms field to the matching completer (web_search_complete,
fetch_url_complete, synthesis_complete, synthesis_error). The
terminal 'complete' step gets total_duration_sec instead.

Pairings are tightly sequential in the agent code (each
_execute_tool call runs start→end before returning), so a simple
dict keyed by starter name suffices — no queueing needed. An
unpaired completer leaves duration unset and does not crash.

Durations flow into both the JSONL trace and the structlog
operational log, so OpenSearch queries can filter / aggregate
by step latency without cross-row joins.

Verified end-to-end on a real shallow query:
  web_search       5,233 ms
  web_search       3,006 ms
  synthesis_complete 27,658 ms
  complete         47.547 s total

Synthesis is by far the slowest step — visible at a glance
for the first time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:49:52 -06:00
..
__init__.py M0.3: Implement contract v1 Pydantic models with tests 2026-04-08 14:00:45 -06:00
test_agent.py Add OpenQuestion to research contract 2026-04-08 14:37:30 -06:00
test_cli.py M2.5.3: marchwarden costs CLI command (#26) 2026-04-08 15:57:39 -06:00
test_costs.py M2.5.2: Cost ledger with price table (#25) 2026-04-08 15:52:25 -06:00
test_models.py depth flag now drives constraint defaults (#30) 2026-04-08 16:27:38 -06:00
test_obs.py M2.5.1: Structured application logger via structlog (#24) 2026-04-08 15:46:51 -06:00
test_server.py M1.4: MCP server wrapping web researcher 2026-04-08 14:41:13 -06:00
test_tools.py M1.1: Search and fetch tools with tests 2026-04-08 14:17:18 -06:00
test_trace.py Record per-step durations in trace and operational logs (#35) 2026-04-08 16:49:52 -06:00