Commit graph

3 commits

Author SHA1 Message Date
Jeff Smith
ddaf7e85c3 Record per-step durations in trace and operational logs (#35)
TraceLogger now tracks monotonic start times for starter actions
(web_search, fetch_url, synthesis_start, start) and attaches a
duration_ms field to the matching completer (web_search_complete,
fetch_url_complete, synthesis_complete, synthesis_error). The
terminal 'complete' step gets total_duration_sec instead.

Pairings are tightly sequential in the agent code (each
_execute_tool call runs start→end before returning), so a simple
dict keyed by starter name suffices — no queueing needed. An
unpaired completer leaves duration unset and does not crash.
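
The pairing described above can be sketched roughly as follows. This is an
illustration only, not the code from the commit: DurationTracker,
STARTER_FOR, annotate, and the injectable clock are hypothetical names, and
pairing the terminal 'complete' step with 'start' is an assumption drawn
from the message.

```python
import time

# Hypothetical completer -> starter mapping, taken from the action names
# listed in the commit message. Pairing "complete" with "start" is assumed.
STARTER_FOR = {
    "web_search_complete": "web_search",
    "fetch_url_complete": "fetch_url",
    "synthesis_complete": "synthesis_start",
    "synthesis_error": "synthesis_start",
    "complete": "start",
}
STARTERS = frozenset(STARTER_FOR.values())


class DurationTracker:
    """Keeps one monotonic start time per starter action."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the sketch can be tested
        self._started = {}    # starter name -> monotonic start time

    def annotate(self, action, fields):
        """Return a copy of fields, with a duration added for paired completers."""
        now = self._clock()
        out = dict(fields)
        if action in STARTERS:
            # Sequential pairing: a plain dict slot per starter is enough.
            self._started[action] = now
            return out
        starter = STARTER_FOR.get(action)
        start = self._started.pop(starter, None) if starter else None
        if start is None:
            # Unpaired completer or unrelated action: leave duration unset.
            return out
        elapsed = now - start
        if action == "complete":
            out["total_duration_sec"] = round(elapsed, 3)
        else:
            out["duration_ms"] = int(round(elapsed * 1000))
        return out
```

time.monotonic is used rather than wall-clock time so that a system clock
adjustment during a call cannot produce a negative or inflated duration.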

Durations flow into both the JSONL trace and the structlog
operational log, so OpenSearch queries can filter / aggregate
by step latency without cross-row joins.

Verified end-to-end on a real shallow query:
  web_search          5,233 ms
  web_search          3,006 ms
  synthesis_complete 27,658 ms
  complete           47.547 s total

Synthesis is by far the slowest step — visible at a glance
for the first time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:49:52 -06:00
Jeff Smith
b510902af3 Mirror trace steps to operational logger
The trace JSONL captures every step of a research call (search,
fetch, iteration boundaries, synthesis), but the structured
operational log only fired at research_started / research_completed,
giving administrators no real-time visibility into agent progress.

Have TraceLogger.log_step also emit a structlog event using the
same action name, fields, and step counter. trace_id and researcher
are already bound in contextvars by WebResearcher.research, so
every line carries them automatically — no plumbing needed.

Volume control: a curated set of milestone actions logs at INFO
(start, iteration_start, synthesis_start/complete/error,
budget_exhausted, complete). Chatty per-tool actions (web_search,
fetch_url and their *_complete pairs) log at DEBUG. Default
MARCHWARDEN_LOG_LEVEL=INFO shows ~9 lines per call;
MARCHWARDEN_LOG_LEVEL=DEBUG shows everything.

This keeps dev stderr readable while making full step visibility
one env var away — and OpenSearch can ingest at DEBUG always.
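
A minimal sketch of the level split described above, using the standard
library logging module as a stand-in for structlog so the example has no
third-party dependency. MILESTONE_ACTIONS, level_for, and mirror_step are
hypothetical names; only the action names and the INFO/DEBUG split come from
the message.

```python
import logging

# Milestone actions named in the commit message; these log at INFO.
MILESTONE_ACTIONS = frozenset({
    "start",
    "iteration_start",
    "synthesis_start",
    "synthesis_complete",
    "synthesis_error",
    "budget_exhausted",
    "complete",
})


def level_for(action: str) -> int:
    """INFO for milestones, DEBUG for everything else (per-tool actions)."""
    return logging.INFO if action in MILESTONE_ACTIONS else logging.DEBUG


def mirror_step(logger: logging.Logger, action: str, step: int, **fields) -> None:
    """Emit one operational log line for a trace step at the chosen level."""
    logger.log(level_for(action), "%s step=%d fields=%r", action, step, fields)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("marchwarden.sketch")
    mirror_step(log, "start", 1, question="example")   # shown at INFO
    mirror_step(log, "web_search", 2, query="example")  # hidden unless DEBUG
```

Keeping the mapping in one set means promoting or demoting an action is a
one-line change and does not touch the call sites.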

Verified end-to-end: Utah peak query at INFO produces 9 milestone
log lines, at DEBUG produces 13.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:22:13 -06:00
Jeff Smith
cef08c8984 M1.2: Trace logger with tests
TraceLogger produces JSONL audit logs per research() call:
- One file per trace_id at ~/.marchwarden/traces/{trace_id}.jsonl
- Each line is a self-contained JSON object (step, action, timestamp, decision)
- Supports arbitrary kwargs (url, content_hash, query, etc.)
- Lazy file handle, flush after each write, context manager support
- read_entries() for replay and testing
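
The behaviour listed above might look roughly like the sketch below. It is
not the TraceLogger from this commit: the class name TraceLoggerSketch, the
trace_dir parameter, and the exact log_step signature are assumptions made
for illustration. Only the file location, the per-line fields, the lazy
handle, the flush-per-write, and read_entries() come from the message.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class TraceLoggerSketch:
    """Writes one JSON object per line to <trace_dir>/<trace_id>.jsonl."""

    def __init__(self, trace_id, trace_dir=None):
        # trace_dir is a hypothetical override so tests can avoid the home dir.
        base = Path(trace_dir) if trace_dir else Path.home() / ".marchwarden" / "traces"
        self.path = base / f"{trace_id}.jsonl"
        self._fh = None   # opened lazily on the first write
        self._step = 0

    def log_step(self, action, decision="", **kwargs):
        if self._fh is None:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self._fh = self.path.open("a", encoding="utf-8")
        self._step += 1
        entry = {
            "step": self._step,
            "action": action,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "decision": decision,
            **kwargs,  # arbitrary extras such as url, content_hash, query
        }
        self._fh.write(json.dumps(entry) + "\n")
        self._fh.flush()  # each line reaches the file before the next step
        return entry

    def read_entries(self):
        """Parse the trace file back into a list of dicts."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Because the handle is opened lazily, constructing a logger that never logs a
step leaves no empty file behind.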

15 tests: file creation, step counting, JSONL validity, kwargs,
timestamps, flush behavior, multiple independent traces.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:21:10 -06:00