TraceLogger now tracks monotonic start times for starter actions
(web_search, fetch_url, synthesis_start, start) and attaches a
duration_ms field to the matching completer (web_search_complete,
fetch_url_complete, synthesis_complete, synthesis_error). The
terminal 'complete' step gets total_duration_sec instead.
Pairings are tightly sequential in the agent code (each
_execute_tool call runs start→end before returning), so a simple
dict keyed by starter name suffices — no queueing needed. An
unpaired completer leaves duration unset and does not crash.
Durations flow into both the JSONL trace and the structlog
operational log, so OpenSearch queries can filter / aggregate
by step latency without cross-row joins.
Verified end-to-end on a real shallow query:
web_search 5,233 ms
web_search 3,006 ms
synthesis_complete 27,658 ms
complete 47.547 s total
Synthesis is by far the slowest step — visible at a glance
for the first time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>