M2.5.2: Cost ledger with price table #28

Merged
archeious merged 1 commit from feat/cost-ledger into main 2026-04-08 21:54:24 +00:00

Closes Issue #25

Summary

Append-only JSONL ledger of every research() call at ~/.marchwarden/costs.jsonl. Supplements (does not replace) the per-call cost_metadata field on ResearchResult. Operators query it via the upcoming marchwarden costs command (M2.5.3 / Issue #26).

Ledger entry fields

timestamp, trace_id, question (truncated to 200 chars), model_id, tokens_used, tokens_input, tokens_output, iterations_run, wall_time_sec, tavily_searches, estimated_cost_usd, budget_exhausted, confidence.
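As a rough illustration of the entry shape listed above, the following sketch models one ledger line as a dataclass and appends it as JSONL. The field names come from this PR; the types, the ISO 8601 timestamp format, and the names `LedgerEntry` and `append_entry` are assumptions made for illustration, not the merged implementation.

```python
import json
import tempfile
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

QUESTION_MAX_CHARS = 200  # truncation limit stated in the PR


@dataclass
class LedgerEntry:
    # Field names follow the PR; the types are assumptions.
    timestamp: str
    trace_id: str
    question: str
    model_id: str
    tokens_used: int
    tokens_input: int
    tokens_output: int
    iterations_run: int
    wall_time_sec: float
    tavily_searches: int
    estimated_cost_usd: Optional[float]  # None serializes to JSON null
    budget_exhausted: bool
    confidence: float


def append_entry(path: Path, entry: LedgerEntry) -> None:
    """Append one entry as a single JSON line (hypothetical helper)."""
    record = asdict(entry)
    record["question"] = record["question"][:QUESTION_MAX_CHARS]
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds lines; existing lines are left untouched.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


# Demo against a temporary directory rather than ~/.marchwarden.
with tempfile.TemporaryDirectory() as tmp:
    ledger_path = Path(tmp) / "costs.jsonl"
    for question in ("What is the capital of Utah?", "x" * 500):
        append_entry(
            ledger_path,
            LedgerEntry(
                timestamp=datetime.now(timezone.utc).isoformat(),
                trace_id=str(uuid.uuid4()),
                question=question,
                model_id="example-model",  # placeholder model id
                tokens_used=150,
                tokens_input=100,
                tokens_output=50,
                iterations_run=1,
                wall_time_sec=1.5,
                tavily_searches=1,
                estimated_cost_usd=None,
                budget_exhausted=False,
                confidence=0.9,
            ),
        )
    text = ledger_path.read_text(encoding="utf-8")
    rows = [json.loads(line) for line in text.splitlines()]
```

One JSON object per line keeps each write independent, so a reader can process the file line by line without parsing it as a whole.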

Price table

  • TOML at ~/.marchwarden/prices.toml, auto-seeded with current Anthropic (Sonnet 4.6, Opus 4.6, Haiku 4.5) + Tavily rates on first run
  • Existing files are never overwritten
  • Unknown models log a WARN and record estimated_cost_usd: null instead of crashing
  • Operators update rates manually — no automatic fetching

Integration

  • Each ledger write also emits a structured cost_recorded log line via the M2.5.1 logger, so cost data ships to OpenSearch alongside the file
  • agent.py now tracks tokens_input / tokens_output separately (not just total) and counts tavily_searches across iterations
  • _synthesize returns (result, synth_in, synth_out) so the caller can attribute synthesis tokens to the running counters
  • Ledger write failures are caught and warn-logged — a broken ledger can never poison a successful research call
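A minimal sketch of the two agent-side ideas above: adding synthesis tokens to the running counters, and isolating ledger failures from the research result. The stub `_synthesize_stub`, the class `BrokenLedger`, and the function `finish_research` are hypothetical stand-ins; only the returned tuple shape `(result, synth_in, synth_out)` and the catch-and-warn behaviour are taken from this PR.

```python
import logging

log = logging.getLogger("agent_sketch")


def _synthesize_stub():
    """Stand-in for _synthesize: returns (result, synth_in, synth_out)."""
    return "Salt Lake City", 120, 30


class BrokenLedger:
    """Hypothetical ledger whose write always fails."""

    def record(self, entry):
        raise OSError("disk full")


def finish_research(result, ledger, entry):
    """Record cost, but never let a ledger failure affect the result."""
    try:
        ledger.record(entry)
    except Exception as exc:  # deliberately broad: the ledger is best-effort
        log.warning("cost ledger write failed: %s", exc)
    return result


# Running counters accumulated over earlier iterations (illustrative values).
tokens_input, tokens_output = 900, 100

# Attribute the synthesis call to the same counters.
result, synth_in, synth_out = _synthesize_stub()
tokens_input += synth_in
tokens_output += synth_out

entry = {"tokens_input": tokens_input, "tokens_output": tokens_output}
returned = finish_research(result, BrokenLedger(), entry)
```

The broad `except` is a deliberate trade-off here: a lost ledger line is cheaper than failing a research call that has already succeeded.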

End-to-end verified

Ledger entry from a real Anthropic + Tavily call made inside the Docker test environment:

{
  "trace_id": "69bc298b-d28d-4634-8f43-bd35513b7d50",
  "question": "What is the capital of Utah?",
  "model_id": "claude-sonnet-4-6",
  "tokens_used": 10247,
  "tokens_input": 9107,
  "tokens_output": 1140,
  "iterations_run": 2,
  "wall_time_sec": 25.285,
  "tavily_searches": 1,
  "estimated_cost_usd": 0.049421,
  "budget_exhausted": false,
  "confidence": 0.99
}
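The figures in this entry can be cross-checked by hand. The sketch below assumes rates of $3.00 per million input tokens and $15.00 per million output tokens for claude-sonnet-4-6, plus $0.005 per Tavily search. These rates are an assumption, not read from the seeded prices.toml, but they reproduce the recorded estimate exactly.

```python
from decimal import Decimal

# Values copied from the ledger entry above.
tokens_input = 9107
tokens_output = 1140
tavily_searches = 1

# Assumed rates (not taken from the PR's prices.toml).
input_usd_per_mtok = Decimal("3.00")
output_usd_per_mtok = Decimal("15.00")
tavily_usd_per_search = Decimal("0.005")

tokens_used = tokens_input + tokens_output
token_cost = (tokens_input * input_usd_per_mtok
              + tokens_output * output_usd_per_mtok) / Decimal(1_000_000)
estimated_cost_usd = token_cost + tavily_searches * tavily_usd_per_search
```

Under these assumed rates the token portion is $0.044421 and the single search adds $0.005, which matches the `estimated_cost_usd` of 0.049421 and the `tokens_used` total of 10247.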

Tests

10 new tests covering price table seeding, no-overwrite of existing files, cost estimation for known/unknown models, tavily-only cost, ledger appends, question truncation, and env var override. 104/104 passing.

Out of scope

marchwarden costs CLI command (M2.5.3 / Issue #26).

claude-code added 1 commit 2026-04-08 21:52:41 +00:00
Adds an append-only JSONL ledger of every research() call at
~/.marchwarden/costs.jsonl, supplementing (not replacing) the
per-call cost_metadata field returned to callers. The ledger is
the operator-facing source of truth for spend tracking, queryable
via the upcoming `marchwarden costs` command (M2.5.3).

Fields per entry: timestamp, trace_id, question (truncated 200ch),
model_id, tokens_used, tokens_input, tokens_output, iterations_run,
wall_time_sec, tavily_searches, estimated_cost_usd, budget_exhausted,
confidence.

Cost estimation reads ~/.marchwarden/prices.toml, which is
auto-created with seed values for current Anthropic + Tavily rates
on first run. Operators are expected to update prices.toml
manually when upstream rates change — there is no automatic
fetching. Existing files are never overwritten. Unknown models
log a WARN and record estimated_cost_usd: null instead of
crashing.

Each ledger write also emits a structured `cost_recorded` log line
via the M2.5.1 logger, so cost data ships to OpenSearch alongside
the ledger file with no extra plumbing.

Tracking changes in agent.py:
- Track tokens_input / tokens_output split (not just total)
- Count tavily_searches across iterations
- _synthesize now returns (result, synth_in, synth_out) so the
  caller can attribute synthesis tokens to the running counters
- Ledger.record() called after research_completed log; failures
  are caught and warn-logged so a ledger write can never poison
  a successful research call

Tests cover: price table seeding, no-overwrite of existing files,
cost estimation for known/unknown models, tavily-only cost,
ledger appends, question truncation, env var override.
End-to-end verified with a real Anthropic+Tavily call:
9107 input + 1140 output tokens, 1 tavily search, $0.049 estimated.

104/104 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
archeious merged commit 5a0ca73e2a into main 2026-04-08 21:54:24 +00:00
Reference: archeious/marchwarden#28