The dir loop was exiting early on small targets (a 13-file Python lib
hit the budget at 92k–139k cumulative tokens) because _TokenTracker
compared the SUM of input_tokens across all turns to the context
window size. input_tokens from each API response is the size of the
full prompt sent on that turn (system + every prior message + new
tool results), so summing across turns multi-counts everything. The
real per-call context size never approached the limit.
Verified empirically: on luminos_lib pre-fix, the loop bailed when
the most recent call's input_tokens was 20,535 (~10% of Sonnet's
200k window) but the cumulative sum was 134,983.
Changes:
- _TokenTracker now tracks last_input (the most recent call's
input_tokens), separate from the cumulative loop_input/total_input
used for cost reporting.
- budget_exceeded() returns last_input > CONTEXT_BUDGET, not the
cumulative sum.
- MAX_CONTEXT bumped from 180_000 to 200_000 (Sonnet 4's real
context window). CONTEXT_BUDGET stays at 70% = 140,000.
- Early-exit message now shows context size, threshold, AND
cumulative spend separately so future debugging is unambiguous.
Smoke test on luminos_lib: investigation completes without early
exit (~$0.37). 6 unit tests added covering the new semantics,
including the key regression: a sequence of small calls whose sum
exceeds the budget must NOT trip the check.
Wiki Architecture page updated.
#51 filed for the separate message-history-growth issue.