fix(ai): correct context budget metric (#44) #52

Closed
archeious wants to merge 0 commits from feat/issue-44-context-budget into main
Owner

Closes #44

The budget metric was a cumulative sum of per-call input_tokens, which re-counts every earlier turn on each new call. It now uses the latest call's input_tokens (the actual context size). Verified empirically: the loop was bailing when real context was at about 20k of a 200k window. MAX_CONTEXT bumped to 200k (Sonnet 4's actual context window). #51 filed for the separate message-history-growth issue.

archeious added 1 commit 2026-04-07 04:49:43 +00:00
The dir loop was exiting early on small targets (a 13-file Python lib
hit the budget at 92k–139k cumulative tokens) because _TokenTracker
compared the SUM of input_tokens across all turns to the context
window size. input_tokens from each API response is the size of the
full prompt sent on that turn (system + every prior message + new
tool results), so summing across turns multi-counts everything. The
real per-call context size never approached the limit.
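
To make the multi-counting concrete, here is a small sketch with made-up numbers. SYSTEM_TOKENS, TOKENS_ADDED_PER_TURN, and simulate() are hypothetical and exist only for illustration; they are not taken from luminos or from any measured run.

```python
# Hypothetical sizes chosen for illustration; not measurements from luminos.
SYSTEM_TOKENS = 2_000          # assumed size of the system prompt
TOKENS_ADDED_PER_TURN = 3_000  # assumed growth of the history per turn


def simulate(turns):
    """Return (last_input, cumulative_input) after `turns` API calls."""
    cumulative = 0
    last_input = 0
    for turn in range(1, turns + 1):
        # Each call re-sends the system prompt plus everything added so far,
        # so this turn's input_tokens already includes all earlier turns.
        last_input = SYSTEM_TOKENS + TOKENS_ADDED_PER_TURN * turn
        cumulative += last_input
    return last_input, cumulative


last_input, cumulative = simulate(8)
print(f"latest call: {last_input:,} tokens, cumulative sum: {cumulative:,} tokens")
```

Under these assumptions the latest call carries 26,000 tokens while the running sum has reached 124,000, the same kind of gap as the 20,535 vs 134,983 observed on luminos_lib.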

Verified empirically: on luminos_lib pre-fix, the loop bailed when
the most recent call's input_tokens was 20,535 (~10% of Sonnet's
200k window) but the cumulative sum was 134,983.

Changes:
- _TokenTracker now tracks last_input (the most recent call's
  input_tokens), separate from the cumulative loop_input/total_input
  used for cost reporting.
- budget_exceeded() returns last_input > CONTEXT_BUDGET, not the
  cumulative sum.
- MAX_CONTEXT bumped from 180_000 to 200_000 (Sonnet 4's real
  context window). CONTEXT_BUDGET stays at 70% of MAX_CONTEXT,
  which is now 140,000.
- Early-exit message now shows context size, threshold, AND
  cumulative spend separately so future debugging is unambiguous.

Smoke test on luminos_lib: investigation completes without early
exit (~$0.37). 6 unit tests added covering the new semantics,
including the key regression: a sequence of small calls whose sum
exceeds the budget must NOT trip the check.

Wiki Architecture page updated.

#51 filed for the separate message-history-growth issue.
archeious closed this pull request 2026-04-07 04:49:49 +00:00

Reference: archeious/luminos#52