The loop previously checked the token budget at the *bottom* of each iteration, after the LLM call and tool work had already happened. By the time the cap was caught the budget had been exceeded and the overshoot was unbounded by the iteration's cost. Move the check to the *top* of the loop so a new iteration is never started past the budget. Document the policy explicitly: token_budget is a soft cap on the tool-use loop only; the synthesis call is always allowed to complete so callers get a structured ResearchResult rather than a fallback stub. Capping synthesis is a separate, larger design question (would require splitting the budget between loop and synthesis up-front). Verified: token_budget=5000, max_iterations=10 now stops after 2 iterations with budget_exhausted=True and a complete answer with 10 citations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| __main__.py | ||
| agent.py | ||
| models.py | ||
| server.py | ||
| tools.py | ||
| trace.py | ||