M2.5.2: Cost ledger with price table #28

Merged

archeious merged 1 commit from feat/cost-ledger into main

2026-04-08 15:54:24 -06:00

claude-code commented

2026-04-08 15:52:40 -06:00

Collaborator

Closes Issue #25

Summary

Append-only JSONL ledger of every research() call at ~/.marchwarden/costs.jsonl. Supplements (does not replace) the per-call cost_metadata field on ResearchResult. Operators query it via the upcoming marchwarden costs command (M2.5.3 / Issue #26).

Ledger entry fields

timestamp, trace_id, question (truncated to 200 chars), model_id, tokens_used, tokens_input, tokens_output, iterations_run, wall_time_sec, tavily_searches, estimated_cost_usd, budget_exhausted, confidence.
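As a rough illustration of the append-only shape described above, the following sketch writes one JSON object per line and truncates `question` to 200 characters. The helper names `append_entry` and `read_entries` are hypothetical and are not taken from this PR; the real ledger class may be organized differently.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

MAX_QUESTION_CHARS = 200  # truncation limit stated in the field list


def append_entry(ledger_path: Path, entry: dict) -> dict:
    """Append one entry as a single JSON line; earlier lines are never rewritten."""
    record = dict(entry)
    # Assumption: timestamps are UTC ISO 8601 strings; the PR does not state the format.
    record.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    record["question"] = record.get("question", "")[:MAX_QUESTION_CHARS]
    ledger_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file.
    with ledger_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def read_entries(ledger_path: Path) -> list[dict]:
    """Parse the ledger back into a list of dicts, skipping blank lines."""
    with ledger_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

One object per line means a reader can process the file incrementally, and a partially written final line affects only that entry.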

Price table

- TOML at ~/.marchwarden/prices.toml, auto-seeded with current Anthropic (Sonnet 4.6, Opus 4.6, Haiku 4.5) + Tavily rates on first run
- Existing files are never overwritten
- Unknown models log a WARN and record estimated_cost_usd: null instead of crashing
- Operators update rates manually; there is no automatic fetching

Integration

- Each ledger write also emits a structured cost_recorded log line via the M2.5.1 logger, so cost data ships to OpenSearch alongside the file
- agent.py now tracks tokens_input / tokens_output separately (not just total) and counts tavily_searches across iterations
- _synthesize returns (result, synth_in, synth_out) so the caller can attribute synthesis tokens to the running counters
- Ledger write failures are caught and warn-logged, so a broken ledger can never poison a successful research call
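The integration points above can be sketched as a small set of running counters plus a guarded ledger call. `Usage` and `record_safely` are hypothetical names; the only interface assumed from the PR is a ledger object with a `record()` method, as named in the commit message.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("marchwarden.agent")  # logger name is an assumption


@dataclass
class Usage:
    """Running counters accumulated across the iterations of one research() call."""

    tokens_input: int = 0
    tokens_output: int = 0
    tavily_searches: int = 0

    def add_tokens(self, tokens_in: int, tokens_out: int) -> None:
        self.tokens_input += tokens_in
        self.tokens_output += tokens_out

    @property
    def tokens_used(self) -> int:
        # The total is derived from the split rather than tracked separately.
        return self.tokens_input + self.tokens_output


def record_safely(ledger, entry: dict) -> bool:
    """Call ledger.record(entry); warn-log any failure instead of propagating it."""
    try:
        ledger.record(entry)
        return True
    except Exception as exc:  # deliberately broad: the research result must survive
        log.warning("cost ledger write failed: %s", exc)
        return False
```

Under this sketch, a caller would unpack `result, synth_in, synth_out` from `_synthesize`, pass the two counts to `usage.add_tokens(synth_in, synth_out)`, and call `record_safely` only after the result is ready to return.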

End-to-end verified

Real Anthropic+Tavily call from inside the docker test env:

{
  "trace_id": "69bc298b-d28d-4634-8f43-bd35513b7d50",
  "question": "What is the capital of Utah?",
  "model_id": "claude-sonnet-4-6",
  "tokens_used": 10247,
  "tokens_input": 9107,
  "tokens_output": 1140,
  "iterations_run": 2,
  "wall_time_sec": 25.285,
  "tavily_searches": 1,
  "estimated_cost_usd": 0.049421,
  "budget_exhausted": false,
  "confidence": 0.99
}
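As a sanity check on the entry above, the arithmetic can be reproduced by hand. The three rates below are assumptions, since the PR does not state the seeded prices; they are used here because together they reproduce the recorded estimated_cost_usd exactly.

```python
# Values copied from the ledger entry above.
entry = {
    "tokens_used": 10247,
    "tokens_input": 9107,
    "tokens_output": 1140,
    "tavily_searches": 1,
    "estimated_cost_usd": 0.049421,
}

# Assumed rates (not stated in this PR).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00
TAVILY_USD_PER_SEARCH = 0.005

input_cost = entry["tokens_input"] * INPUT_USD_PER_MTOK / 1_000_000    # 0.027321
output_cost = entry["tokens_output"] * OUTPUT_USD_PER_MTOK / 1_000_000  # 0.017100
search_cost = entry["tavily_searches"] * TAVILY_USD_PER_SEARCH          # 0.005000
total = round(input_cost + output_cost + search_cost, 6)

# The token split adds up to the recorded total.
assert entry["tokens_used"] == entry["tokens_input"] + entry["tokens_output"]
# Under the assumed rates, the components sum to the recorded estimate.
assert abs(total - entry["estimated_cost_usd"]) < 1e-9
```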

Tests

10 new tests covering price table seeding, no-overwrite of existing files, cost estimation for known/unknown models, tavily-only cost, ledger appends, question truncation, and env var override. 104/104 passing.

Out of scope

marchwarden costs CLI command (M2.5.3 / Issue #26).


claude-code added 1 commit 2026-04-08 15:52:41 -06:00

M2.5.2: Cost ledger with price table (#25) 0d957336f5

Adds an append-only JSONL ledger of every research() call at
~/.marchwarden/costs.jsonl, supplementing (not replacing) the
per-call cost_metadata field returned to callers. The ledger is
the operator-facing source of truth for spend tracking, queryable
via the upcoming `marchwarden costs` command (M2.5.3).

Fields per entry: timestamp, trace_id, question (truncated 200ch),
model_id, tokens_used, tokens_input, tokens_output, iterations_run,
wall_time_sec, tavily_searches, estimated_cost_usd, budget_exhausted,
confidence.

Cost estimation reads ~/.marchwarden/prices.toml, which is
auto-created with seed values for current Anthropic + Tavily rates
on first run. Operators are expected to update prices.toml
manually when upstream rates change — there is no automatic
fetching. Existing files are never overwritten. Unknown models
log a WARN and record estimated_cost_usd: null instead of
crashing.

Each ledger write also emits a structured `cost_recorded` log line
via the M2.5.1 logger, so cost data ships to OpenSearch alongside
the ledger file with no extra plumbing.

Tracking changes in agent.py:
- Track tokens_input / tokens_output split (not just total)
- Count tavily_searches across iterations
- _synthesize now returns (result, synth_in, synth_out) so the
  caller can attribute synthesis tokens to the running counters
- Ledger.record() called after research_completed log; failures
  are caught and warn-logged so a ledger write can never poison
  a successful research call

Tests cover: price table seeding, no-overwrite of existing files,
cost estimation for known/unknown models, tavily-only cost,
ledger appends, question truncation, env var override.
End-to-end verified with a real Anthropic+Tavily call:
9107 input + 1140 output tokens, 1 tavily search, $0.049 estimated.

104/104 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>