Budget cap lags one iteration behind tool payload growth #53

Open
opened 2026-04-09 01:12:50 +00:00 by claude-code · 0 comments

Discovered during M3.1 stress testing (Issue #44, Q4).

Symptom

Ran ask "Comprehensive history of AI 1950 to 2026" --budget 5000 --max-iterations 2. Result:

  • Tokens consumed in research loop: 10606 (2.1x over 5000 cap)
  • Total with synthesis: 29304
  • budget_exhausted: False
  • Budget status: under cap
  • No BUDGET_EXHAUSTED gap surfaced.

Trace 38235720-6efc-4d7d-b284-6e21b1c83d46:

iteration_start  tokens_so_far=0
iteration_start  tokens_so_far=1145
synthesis_start  tokens_used=10606

Root cause

researchers/web/agent.py:260-270. The loop checks total_tokens >= constraints.token_budget at the top of each iteration, but total_tokens is only incremented from response.usage after each model call returns. So the check at the top of iteration N only ever sees the cumulative usage through iteration N-1.

Iter 1's input is tiny (just the user question + system prompt) → ~1145 tokens. The check at the top of iter 2 sees 1145 < 5000, lets iter 2 run. Iter 2's model call has a huge input (all the fetched tool results from iter 1's tool calls), pushing total to 10606. The loop then exits naturally because iterations < max_iterations is False (2 < 2). The budget check never sees the inflated count.

For small max_iterations, the cap may never trip even when actual usage is multiples over budget.
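The lag can be reproduced with a minimal standalone sketch (a hypothetical simplification of the loop; the per-iteration usage numbers are taken from the trace above, and all names are illustrative, not the project's actual API):

```python
# Sketch of the lagging budget check. The cap is tested BEFORE an iteration's
# cost is known, and the counter is updated AFTER the model call returns.

TOKEN_BUDGET = 5000

# Simulated per-iteration usage: iter 1 is cheap (question + system prompt),
# iter 2 is expensive (it carries all of iter 1's fetched tool results).
simulated_usage = [1145, 9461]

total_tokens = 0
budget_exhausted = False

for usage in simulated_usage:
    # Check runs against the PREVIOUS iterations' total only.
    if total_tokens >= TOKEN_BUDGET:
        budget_exhausted = True
        break
    # Counter catches up only after this iteration's call completes.
    total_tokens += usage

print(total_tokens)      # 10606 -- 2.1x over the 5000 cap
print(budget_exhausted)  # False -- the check never saw the inflated count
```

The loop then ends because max_iterations is reached, so the inflated total is never re-checked.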

Out of scope

Synthesis being uncapped is by design (agent.py:254-259) — not part of this bug.

Suggested fix

Either:

  1. Estimate next iteration's input cost from current messages length before deciding to enter the next iteration, or
  2. Track tool result payload sizes as they accumulate (count toward the soft cap immediately, not on next API call).

Option 1 is simpler and matches the "soft cap" semantic.
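A minimal sketch of option 1, under stated assumptions: the message shapes, the should_continue helper, and the ~4 chars/token heuristic are all hypothetical, not the project's actual API (a real tokenizer would estimate more accurately):

```python
# Option 1 sketch: before entering the next iteration, project its input cost
# from the accumulated messages (tool results included) and count that
# projection against the soft cap immediately.

CHARS_PER_TOKEN = 4  # rough heuristic; swap in a real tokenizer if available

def estimate_input_tokens(messages: list[dict]) -> int:
    """Approximate the prompt size of the next model call from message text."""
    return sum(len(m.get("content", "")) for m in messages) // CHARS_PER_TOKEN

def should_continue(total_tokens: int, messages: list[dict],
                    token_budget: int) -> bool:
    """Enter the next iteration only if the projected total stays under cap."""
    projected = total_tokens + estimate_input_tokens(messages)
    return projected < token_budget

messages = [
    {"role": "user", "content": "Comprehensive history of AI 1950 to 2026"},
    {"role": "tool", "content": "x" * 36_000},  # a large fetched page, ~9k tokens
]
# With 1145 tokens already spent, the projection (~10155) trips the 5000 cap,
# so the loop stops BEFORE iteration 2 instead of after it.
print(should_continue(1145, messages, 5000))  # False
```

Because the decision uses a projection rather than actual usage, overruns shrink from "entire next iteration" to "estimation error", which fits the soft-cap semantic.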

Acceptance

A query with --budget 5000 --max-iterations 2 that fetches large pages should set budget_exhausted=True and surface a BUDGET_EXHAUSTED gap.

Reference: archeious/marchwarden#53