Bug: token_budget is not actually enforced #17

Closed

opened 2026-04-08 15:12:47 -06:00 by claude-code · 0 comments

claude-code (Collaborator) commented 2026-04-08 15:12:47 -06:00

Discovered during M2.3 smoke test (trace `1a8711c4-a65b-49fd-853e-50fde79c755f`).

Default `token_budget` is 20000, but the smoke run consumed **20,623 tokens** before the agent caught the overshoot. Looking at the trace, the agent does check the budget, but only *between* iterations:

- Step 14: `iteration_start`, `tokens_so_far: 8268` (under budget, proceed)
- Steps 15–20: iteration 3 makes three more `web_search` calls
- Step 21: `budget_exhausted`, `20623/20000` (caught, but already over)

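So enforcement is post-hoc within an iteration: by the time the check runs, the iteration's LLM calls have already happened. Between the check at step 14 and the one at step 21, usage grew by 12,355 tokens (8,268 to 20,623). The 623-token overshoot is small only because that iteration happened to start well under the cap. An iteration that starts just below 20000 could overshoot by nearly its entire cost.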
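To make the failure mode concrete, here is a minimal sketch of a boundary-only check. It is not the agent's actual code: the function name and the per-call token costs are invented for illustration, and only the start and end totals (8268 and 20623) come from the trace.

```python
TOKEN_BUDGET = 20_000  # default token_budget cited in this issue


def run_iteration_boundary_check(tokens_so_far, call_costs, budget=TOKEN_BUDGET):
    """Hypothetical model of the observed behaviour.

    The budget is checked once at iteration start; every call in the
    iteration then runs without any further check.
    """
    if tokens_so_far >= budget:
        return tokens_so_far, "budget_exhausted"
    for cost in call_costs:
        tokens_so_far += cost  # no check between calls
    status = "budget_exhausted" if tokens_so_far >= budget else "ok"
    return tokens_so_far, status


# Totals match the trace; the split across three calls is an assumption.
total, status = run_iteration_boundary_check(8268, [4000, 4000, 4355])
overshoot = total - TOKEN_BUDGET
print(total, status, overshoot)
```

Under these assumed costs the iteration ends at 20623 tokens, 623 over the cap, which matches what the trace reports at step 21.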

(Note: my earlier report incorrectly cited 48,521 tokens — I was reading the cli.main render output, which appears to include numbers from a different run. The trace is authoritative: 20,623.)

## Suggested fix

- Check the budget *before* each LLM call inside an iteration, not just at iteration boundaries
- Or: estimate the iteration's max cost up-front and skip if it would push over the cap
- Consider a hard cap (kill the call) vs soft cap (let in-flight work finish, no new work); pick one and document it
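A rough sketch of the per-call check with soft-cap semantics might look like the following. Everything here is hypothetical (`TokenBudget`, `BudgetExceeded`, `reserve`, `record`, `run_iteration`), and each call is stood in for by an `(estimated_cost, actual_cost)` pair rather than a real LLM request. How to produce the estimate is left open.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Hypothetical error: the next call's estimate would cross the cap."""


@dataclass
class TokenBudget:
    limit: int = 20_000  # default token_budget cited in this issue
    used: int = 0

    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def reserve(self, estimated_cost: int) -> None:
        """Pre-call check: refuse to start a call that would exceed the cap."""
        if self.used + estimated_cost > self.limit:
            raise BudgetExceeded(
                f"{self.used} + {estimated_cost} > {self.limit}"
            )

    def record(self, actual_cost: int) -> None:
        """Account for what a finished call really consumed."""
        self.used += actual_cost


def run_iteration(budget, calls):
    """Run calls until one is refused; return how many completed.

    Soft cap: a call that has started is allowed to finish, but no new
    call starts once its estimate no longer fits in the budget.
    """
    completed = 0
    for estimated, actual in calls:
        try:
            budget.reserve(estimated)
        except BudgetExceeded:
            break
        budget.record(actual)
        completed += 1
    return completed


# Same assumed per-call costs as the smoke-run scenario: start at 8268.
budget = TokenBudget(used=8268)
done = run_iteration(budget, [(4000, 4000), (4000, 4000), (4355, 4355)])
print(done, budget.used, budget.remaining())
```

With these assumed costs the third call is refused, so the run stops at 16268 tokens instead of 20623. A soft cap like this can still overshoot when a call's actual cost exceeds its estimate. A hard cap would additionally need to bound the call itself, for example by passing `budget.remaining()` as the request's output-token limit if the client supports one.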