Bug: token_budget is not actually enforced #17

Closed
opened 2026-04-08 21:12:47 +00:00 by claude-code · 0 comments
Collaborator

Discovered during M2.3 smoke test (trace 1a8711c4-a65b-49fd-853e-50fde79c755f).

Default token_budget is 20000, but the smoke run consumed 20,623 tokens before the agent caught the overshoot. Looking at the trace, the agent does check the budget — but only between iterations:

  • Step 14: iteration_start, tokens_so_far: 8268 (under budget, proceed)
  • Steps 15–20: iteration 3 makes three more web_search calls
  • Step 21: budget_exhausted, 20623/20000 (caught, but already over)

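So enforcement is post-hoc within an iteration. By the time the check runs, the iteration's LLM calls have already happened. The 623-token overshoot (about 3% over the cap) is small here, but the overshoot is bounded only by the cost of a single iteration: iteration 3 alone consumed 12,355 tokens (20,623 minus 8,268), so an iteration that started just under the cap could blow past it by much more.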
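To make the failure mode concrete, here is a minimal sketch of a boundary-only check. It is not the project's actual loop: `run_with_boundary_check` and the per-iteration cost list are hypothetical, and the costs are chosen only so the totals line up with the trace (8,268 before iteration 3, 20,623 after).

```python
TOKEN_BUDGET = 20_000  # default token_budget from the report


def run_with_boundary_check(iteration_costs, budget=TOKEN_BUDGET):
    """Simulate an agent that checks the budget only at iteration_start.

    iteration_costs is a hypothetical list of tokens consumed per iteration.
    """
    tokens_so_far = 0
    for cost in iteration_costs:
        if tokens_so_far >= budget:
            break  # budget_exhausted: detected only at the boundary
        # Every call inside the iteration runs without a further check.
        tokens_so_far += cost
    return tokens_so_far


# Everything before iteration 3 is lumped into one entry (8,268 tokens);
# iteration 3 costs 12,355; a fourth iteration is queued but never runs.
total = run_with_boundary_check([8_268, 12_355, 5_000])
print(total, total - TOKEN_BUDGET)  # 20623 623
```

The check before iteration 3 passes because 8,268 is under the cap, so the whole iteration runs and the overshoot is only noticed at the next boundary.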

(Note: my earlier report incorrectly cited 48,521 tokens — I was reading the cli.main render output, which appears to include numbers from a different run. The trace is authoritative: 20,623.)

Suggested fix

  • Check the budget before each LLM call inside an iteration, not just at iteration boundaries
  • Or: estimate the iteration's max cost up-front and skip if it would push over the cap
  • Consider a hard cap (kill the call) vs soft cap (let in-flight work finish, no new work) — pick one and document it
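The first two options could be combined roughly as follows. This is a sketch under assumptions, not a patch against the real code: `TokenBudget`, `BudgetExceeded`, `reserve`, `record`, and `run_iteration` are all hypothetical names, and each LLM call is stood in for by an (estimated, actual) token pair. It implements the soft-cap variant: a call already made is counted in full, but no new call starts if its estimate would not fit.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a call's estimated cost does not fit in the remaining budget."""


@dataclass
class TokenBudget:
    limit: int = 20_000
    used: int = 0

    def reserve(self, estimated_tokens: int) -> None:
        """Pre-call check: refuse to start a call that is estimated not to fit."""
        if self.used + estimated_tokens > self.limit:
            raise BudgetExceeded(
                f"{self.used} used + {estimated_tokens} estimated > {self.limit}"
            )

    def record(self, actual_tokens: int) -> None:
        """Post-call accounting with the real usage reported for the call."""
        self.used += actual_tokens


def run_iteration(budget: TokenBudget, calls) -> int:
    """Run calls until one would not fit; return how many completed.

    calls is a list of hypothetical (estimated, actual) token pairs.
    """
    completed = 0
    for estimated, actual in calls:
        try:
            budget.reserve(estimated)
        except BudgetExceeded:
            break  # soft cap: finished work stands, no new work starts
        budget.record(actual)  # the real LLM call would happen before this
        completed += 1
    return completed


# Start where the trace's iteration 3 started and replay three made-up calls.
budget = TokenBudget(used=8_268)
done = run_iteration(budget, [(4_500, 4_100), (4_500, 4_100), (4_500, 4_155)])
print(done, budget.used)  # 2 16468
```

With a per-call check, the third call is refused and the run stops at 16,468 tokens instead of 20,623. The remaining overshoot risk is the gap between a call's estimate and its actual usage; a hard cap would additionally need a way to bound or abort the call itself, which this sketch does not attempt.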
Reference: archeious/marchwarden#17