Commit graph

11 commits

Author SHA1 Message Date
Jeff Smith
6fdf0e338a M2.5.3: marchwarden costs CLI command (#26)
Adds operator-facing `marchwarden costs` subcommand that reads the
JSONL ledger from M2.5.2 and pretty-prints a rich summary:

- Cost Summary panel: total calls, total spend, total tokens (input/
  output split), Tavily search count, warning for any calls with
  unknown model prices
- Per-Day table sorted by date
- Per-Model table sorted by model id
- Highest-Cost Call panel with trace_id and question

Flags:
  --since   ISO date or relative shorthand (7d, 24h, 2w, 1m)
  --until   same formats as --since
  --model   filter to a specific model_id
  --json    emit raw filtered ledger entries instead of the table
  --ledger  override default path (mostly for tests)
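
A minimal sketch of how the `--since` / `--until` values could be resolved,
not the code from this commit. The helper name `parse_when` is hypothetical,
and treating `m` as a 30-day month is an assumption: the message lists `1m`
but does not define the unit.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit lengths. "m" as a 30-day month is a guess; the commit
# message does not say what the unit means.
_UNITS = {
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "m": timedelta(days=30),
}


def parse_when(value: str, now: datetime) -> datetime:
    """Resolve an ISO date or a relative shorthand such as '7d' to a datetime."""
    text = value.strip()
    match = re.fullmatch(r"(\d+)([hdwm])", text)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        # Relative shorthand counts backwards from "now".
        return now - count * _UNITS[unit]
    # Otherwise treat the value as ISO 8601; naive values are read as UTC.
    parsed = datetime.fromisoformat(text)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
```

Passing `now` explicitly keeps the helper deterministic, which makes the
since-date filter straightforward to test.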

Also fixes a Dockerfile gap: the obs/ package added in M2.5.1 was
not being COPYed into the image, so the installed `marchwarden`
entry point couldn't import it. Tests had been passing because
they mounted /app over the install. Adding `COPY obs ./obs`
restores parity.

Tests cover summary rendering, model filter, since-date filter,
JSON output, and the empty-ledger friendly path. 110/110 passing.
End-to-end verified against the real cost ledger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:57:39 -06:00
Jeff Smith
0d957336f5 M2.5.2: Cost ledger with price table (#25)
Adds an append-only JSONL ledger of every research() call at
~/.marchwarden/costs.jsonl, supplementing (not replacing) the
per-call cost_metadata field returned to callers. The ledger is
the operator-facing source of truth for spend tracking, queryable
via the upcoming `marchwarden costs` command (M2.5.3).

Fields per entry: timestamp, trace_id, question (truncated to 200 characters),
model_id, tokens_used, tokens_input, tokens_output, iterations_run,
wall_time_sec, tavily_searches, estimated_cost_usd, budget_exhausted,
confidence.

Cost estimation reads ~/.marchwarden/prices.toml, which is
auto-created with seed values for current Anthropic + Tavily rates
on first run. Operators are expected to update prices.toml
manually when upstream rates change — there is no automatic
fetching. Existing files are never overwritten. Unknown models
log a WARN and record estimated_cost_usd: null instead of
crashing.
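
The unknown-model behaviour described above could look roughly like the
sketch below. The layout of `PRICES`, the key names, the function name, and
the rates are all hypothetical placeholders: the real prices.toml schema is
not shown in this message.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("marchwarden.costs.sketch")  # hypothetical logger name

# Hypothetical parsed price table. The numbers are placeholders, not real rates.
PRICES = {
    "models": {
        "example-model": {"input_per_mtok": 3.0, "output_per_mtok": 15.0},
    },
    "tavily": {"per_search": 0.01},
}


def estimate_cost_usd(
    prices: dict,
    model_id: str,
    tokens_input: int,
    tokens_output: int,
    tavily_searches: int,
) -> Optional[float]:
    """Return an estimated cost, or None when the model has no known price."""
    model = prices["models"].get(model_id)
    if model is None:
        # Warn and return None instead of raising; None serializes as JSON null.
        logger.warning("no price for model %s; recording null cost", model_id)
        return None
    token_cost = (
        tokens_input / 1_000_000 * model["input_per_mtok"]
        + tokens_output / 1_000_000 * model["output_per_mtok"]
    )
    return token_cost + tavily_searches * prices["tavily"]["per_search"]


# A ledger line for an unknown model carries an explicit null.
line = json.dumps({"model_id": "mystery", "estimated_cost_usd": None})
```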

Each ledger write also emits a structured `cost_recorded` log line
via the M2.5.1 logger, so cost data ships to OpenSearch alongside
the ledger file with no extra plumbing.

Tracking changes in agent.py:
- Track tokens_input / tokens_output split (not just total)
- Count tavily_searches across iterations
- _synthesize now returns (result, synth_in, synth_out) so the
  caller can attribute synthesis tokens to the running counters
- Ledger.record() called after research_completed log; failures
  are caught and warn-logged so a ledger write can never poison
  a successful research call

Tests cover: price table seeding, no-overwrite of existing files,
cost estimation for known/unknown models, tavily-only cost,
ledger appends, question truncation, env var override.
End-to-end verified with a real Anthropic+Tavily call:
9107 input + 1140 output tokens, 1 tavily search, $0.049 estimated.

104/104 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:52:25 -06:00
Jeff Smith
8a62f6b014 M2.5.1: Structured application logger via structlog (#24)
Adds an operational logging layer separate from the JSONL trace
audit logs. Operational logs cover system events (startup, errors,
MCP transport, research lifecycle); JSONL traces remain the
researcher provenance audit trail.

Backend: structlog with two renderers selectable via
MARCHWARDEN_LOG_FORMAT (json|console). Defaults to console when
stderr is a TTY, json otherwise — so dev runs are human-readable
and shipped runs (containers, automation) emit OpenSearch-ready
JSON without configuration.
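
The format selection described above can be sketched with the standard
library alone. `pick_log_format` is a hypothetical name, and falling back to
the TTY check when the variable holds an unrecognised value is an assumption.

```python
import os
import sys


def pick_log_format(env=None, stream=None) -> str:
    """Return 'json' or 'console' following the rule in the commit message."""
    env = os.environ if env is None else env
    stream = sys.stderr if stream is None else stream
    explicit = env.get("MARCHWARDEN_LOG_FORMAT", "").strip().lower()
    if explicit in ("json", "console"):
        # An explicit setting always wins.
        return explicit
    # No (valid) setting: human-readable on a terminal, JSON everywhere else.
    return "console" if stream.isatty() else "json"
```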

Key features:
- Named loggers per component: marchwarden.cli,
  marchwarden.mcp, marchwarden.researcher.web
- MARCHWARDEN_LOG_LEVEL controls global level (default INFO)
- MARCHWARDEN_LOG_FILE=1 enables a 10MB-rotating file at
  ~/.marchwarden/logs/marchwarden.log
- structlog contextvars bind trace_id + researcher at the start
  of each research() call so every downstream log line carries
  them automatically; cleared on completion
- stdlib logging is funneled through the same pipeline so noisy
  third-party loggers (httpx, anthropic) get the same formatting
  and are quieted to WARN unless DEBUG is requested
- Logs to stderr to keep MCP stdio stdout clean

Wired into:
- cli.main.cli — configures logging on startup, logs ask_started/
  ask_completed/ask_failed
- researchers.web.server.main — configures logging on startup,
  logs mcp_server_starting
- researchers.web.agent.research — binds trace context, logs
  research_started/research_completed

Tests verify JSON and console formats, contextvar propagation,
level filtering, idempotency, and auto-configure-on-first-use.
94/94 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:46:51 -06:00
Jeff Smith
273d144381 M2.2: marchwarden replay CLI command (#9)
Adds `marchwarden replay <trace_id>` to pretty-print a prior research
run from its JSONL trace file. Resolves the trace under
~/.marchwarden/traces/ by default; --trace-dir overrides for tests and
custom locations. Renders each step as a row with action, decision,
extra fields, and content_hash. Friendly errors for unknown trace_id
and malformed JSON lines.
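
A rough sketch of the trace loading behind the friendly errors, assuming the
`{trace_id}.jsonl` naming from M1.2. The function name `load_trace` and the
wording of the messages are hypothetical.

```python
import json
from pathlib import Path


def load_trace(trace_id: str, trace_dir: str):
    """Read a JSONL trace, collecting malformed lines instead of aborting."""
    path = Path(trace_dir) / f"{trace_id}.jsonl"
    if not path.exists():
        raise FileNotFoundError(
            f"No trace found for trace_id {trace_id!r} in {trace_dir}"
        )
    entries, problems = [], []
    with path.open(encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Remember where the bad line was so the CLI can report it.
                problems.append(f"line {lineno}: {exc.msg}")
    return entries, problems
```

Returning the problems alongside the entries lets the caller render the
valid steps and still tell the operator which lines were skipped.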

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:57:37 -06:00
Jeff Smith
87a34c60d1 M2.1: marchwarden ask CLI command (#8)
Click app with `ask` subcommand that spawns the web researcher MCP
server over stdio, calls the research tool, and pretty-prints the
ResearchResult contract using rich (panels for answer/confidence/cost,
tables for citations, gaps, discovery events, and open questions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:51:40 -06:00
Jeff Smith
5d894d9e10 M1.4: MCP server wrapping web researcher
FastMCP server exposing a single 'research' tool:
- Delegates to WebResearcher with keys from ~/secrets
- Accepts question, context, depth, max_iterations, token_budget
- Returns full ResearchResult as JSON
- Configurable model via MARCHWARDEN_MODEL env var
- Runnable as: python -m researchers.web

4 tests: secret reading, JSON response validation, default parameters.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:41:13 -06:00
Jeff Smith
ae9c11a79b Add OpenQuestion to research contract
New field on ResearchResult: open_questions, the follow-up questions that
emerged from the research itself. Distinct from gaps (backward: what
failed) and discovery_events (sideways: lateral findings). Open questions
look forward: 'based on what I found, this needs deeper investigation.'

- OpenQuestion model: question, context, priority (high/medium/low),
  source_locator
- Updated agent synthesis prompt to produce open_questions
- Updated agent result builder to parse open_questions from JSON
- 3 new tests for OpenQuestion model
- Updated existing tests for new field

77 tests passing.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:37:30 -06:00
Jeff Smith
7cb3fde90e M1.3: Inner agent loop with tests
WebResearcher — the core agentic research loop:
- Tool-use loop: Claude decides when to search (Tavily) and fetch (httpx)
- Budget enforcement: stops at max_iterations or token_budget
- Synthesis step: separate LLM call produces structured ResearchResult JSON
- Fallback: valid ResearchResult even when synthesis JSON is unparseable
- Full trace logging at every step (start, search, fetch, synthesis, complete)
- Populates all contract fields: raw_excerpt, categorized gaps,
  discovery_events, confidence_factors, cost_metadata with model_id
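
The budget enforcement in the list above can be illustrated with a stripped
down loop. This is a sketch under assumptions, not the WebResearcher code:
`run_loop`, `LoopOutcome`, and the `step` callback are hypothetical, and
`step` stands in for one model/tool round.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LoopOutcome:
    iterations_run: int
    tokens_used: int
    budget_exhausted: bool


def run_loop(
    step: Callable[[int], Tuple[int, bool]],
    max_iterations: int,
    token_budget: int,
) -> LoopOutcome:
    """Run step(i) -> (tokens_spent, done) until done or a budget is hit."""
    tokens_used = 0
    iterations = 0
    while iterations < max_iterations and tokens_used < token_budget:
        spent, done = step(iterations)
        iterations += 1
        tokens_used += spent
        if done:
            # The model finished on its own before either limit was reached.
            return LoopOutcome(iterations, tokens_used, budget_exhausted=False)
    # Falling out of the loop means one of the two limits stopped it.
    return LoopOutcome(iterations, tokens_used, budget_exhausted=True)
```

Note that the token check runs before each round, so the final round may
overshoot the budget; whether the real loop behaves this way is not stated.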

9 tests: complete research loop, budget exhaustion, synthesis failure
fallback, trace file creation, fetch_url tool integration, search
result formatting.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:29:27 -06:00
Jeff Smith
cef08c8984 M1.2: Trace logger with tests
TraceLogger produces JSONL audit logs per research() call:
- One file per trace_id at ~/.marchwarden/traces/{trace_id}.jsonl
- Each line is a self-contained JSON object (step, action, timestamp, decision)
- Supports arbitrary kwargs (url, content_hash, query, etc.)
- Lazy file handle, flush after each write, context manager support
- read_entries() for replay and testing
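
The properties listed above (lazy handle, flush per write, context manager)
might be combined as in this sketch. The class name `TraceLoggerSketch`, the
method name `log_step`, and the constructor arguments are hypothetical; only
the field names and `read_entries()` come from the message.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class TraceLoggerSketch:
    """Illustrative JSONL trace writer; not the TraceLogger from this commit."""

    def __init__(self, trace_id: str, trace_dir: str):
        self.path = Path(trace_dir) / f"{trace_id}.jsonl"
        self._handle = None  # opened lazily on the first write
        self._step = 0

    def log_step(self, action: str, decision: str = "", **extra) -> dict:
        if self._handle is None:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self._handle = self.path.open("a", encoding="utf-8")
        self._step += 1
        entry = {
            "step": self._step,
            "action": action,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "decision": decision,
            **extra,  # arbitrary kwargs such as url, content_hash, query
        }
        self._handle.write(json.dumps(entry) + "\n")
        self._handle.flush()  # each line reaches disk before the next step
        return entry

    def read_entries(self) -> list:
        with self.path.open(encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]

    def close(self) -> None:
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __enter__(self):
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()
```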

15 tests: file creation, step counting, JSONL validity, kwargs,
timestamps, flush behavior, multiple independent traces.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:21:10 -06:00
Jeff Smith
a5bc93e275 M1.1: Search and fetch tools with tests
- tavily_search(): Tavily API wrapper returning SearchResult dataclasses
  with content hashing (raw_content preferred, falls back to summary)
- fetch_url(): async URL fetch with HTML text extraction, content hashing,
  and graceful error handling (timeout, HTTP errors, connection errors)
- _extract_text(): simple HTML → clean text (strip scripts/styles/tags,
  decode entities, collapse whitespace)
- _sha256(): SHA-256 content hashing with 'sha256:' prefix for traces
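
For illustration, the hashing and text-extraction helpers could be
approximated as below. The function names are hypothetical stand-ins for
`_sha256()` and `_extract_text()`; UTF-8 encoding and the regex-based tag
stripping are assumptions about details the message does not give.

```python
import hashlib
import html
import re


def sha256_prefixed(content: str) -> str:
    """Hash text and add the 'sha256:' prefix used in traces."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


def extract_text(markup: str) -> str:
    """Reduce HTML to plain text: drop scripts/styles/tags, tidy whitespace."""
    # Remove <script> and <style> elements together with their contents.
    without_blocks = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", markup)
    # Replace any remaining tags with a space so words do not run together.
    without_tags = re.sub(r"(?s)<[^>]+>", " ", without_blocks)
    # Decode entities such as &amp; and collapse runs of whitespace.
    return re.sub(r"\s+", " ", html.unescape(without_tags)).strip()
```

A regex pass like this is only adequate for simple pages; a real HTML parser
would be the safer choice for malformed markup.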

18 tests: hashing, HTML extraction, mocked Tavily search, mocked async
fetch (success, timeout, HTTP error, hash consistency).

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:17:18 -06:00
Jeff Smith
1b0f86399a M0.3: Implement contract v1 Pydantic models with tests
All Research Contract types as Pydantic models:
- ResearchConstraints (input)
- Citation with raw_excerpt (output)
- GapCategory enum (5 categories)
- Gap with structured category (output)
- DiscoveryEvent (lateral findings)
- ConfidenceFactors (auditable scoring inputs)
- CostMetadata with model_id (resource tracking)
- ResearchResult (top-level contract)

32 tests: validation, bounds checking, serialization roundtrips,
JSON structure verification against contract spec.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:00:45 -06:00