fix(ai): correct context budget metric (#44) #52

Closed

archeious wants to merge 0 commits from feat/issue-44-context-budget into main

archeious commented

2026-04-06 22:49:43 -06:00

Owner

Closes #44

The budget metric was a cumulative sum of per-call input_tokens, which re-counts the whole prompt on every turn. Fixed to use the latest call's input_tokens (the actual context size). Verified empirically: the loop was bailing when the real context was 20k of a 200k window. MAX_CONTEXT bumped to 200k (Sonnet 4's real context window). #51 filed for the separate message-history-growth issue.

archeious added 1 commit 2026-04-06 22:49:43 -06:00

fix(ai): correct context budget metric — track per-call, not sum (#44) 036c3a934a

The dir loop was exiting early on small targets (a 13-file Python lib
hit the budget at 92k–139k cumulative tokens) because _TokenTracker
compared the SUM of input_tokens across all turns to the context
window size. input_tokens from each API response is the size of the
full prompt sent on that turn (system + every prior message + new
tool results), so summing across turns counts the system prompt and
every earlier message again on every turn. The real per-call context
size never approached the limit.
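A toy calculation makes the multi-counting concrete. The sizes below are assumptions chosen for illustration, not measurements from luminos_lib: each call re-sends a fixed system prompt plus the whole history, and the history grows by a fixed amount per turn. The per-call size grows linearly with the number of turns, while the cumulative sum grows roughly quadratically.

```python
SYSTEM_TOKENS = 2_000     # assumed system prompt size (illustrative)
GROWTH_PER_TURN = 1_800   # assumed tokens added per turn: tool results + reply (illustrative)


def per_call_input_tokens(turns: int) -> list[int]:
    """input_tokens each call would report: the full prompt re-sent on that turn."""
    return [SYSTEM_TOKENS + GROWTH_PER_TURN * (turn + 1) for turn in range(turns)]


calls = per_call_input_tokens(12)
actual_context = calls[-1]   # what the latest call really sent
cumulative = sum(calls)      # what the pre-fix check compared against the window

print(f"latest call: {actual_context:,}  cumulative sum: {cumulative:,}")
# prints "latest call: 23,600  cumulative sum: 164,400"
```

Under these assumed sizes the cumulative sum passes a 140,000 budget after 12 turns even though the real context is about 12% of a 200k window, the same shape as the empirical result below.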

Verified empirically: on luminos_lib pre-fix, the loop bailed when
the most recent call's input_tokens was 20,535 (~10% of Sonnet's
200k window) but the cumulative sum was 134,983.

Changes:
- _TokenTracker now tracks last_input (the most recent call's
  input_tokens), separate from the cumulative loop_input/total_input
  used for cost reporting.
- budget_exceeded() returns last_input > CONTEXT_BUDGET, not the
  cumulative sum.
- MAX_CONTEXT bumped from 180_000 to 200_000 (Sonnet 4's real
  context window). CONTEXT_BUDGET stays at 70% of MAX_CONTEXT, now 140,000.
- Early-exit message now shows context size, threshold, AND
  cumulative spend separately so future debugging is unambiguous.
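The semantics in the list above can be sketched as follows. This is a minimal illustration, not the actual _TokenTracker: the class name TokenTrackerSketch, the record() method, and the exit-message wording are hypothetical; only the field names last_input, loop_input and total_input and the constants come from the commit text. Both cumulative fields simply accumulate here; any per-loop reset the real code may do is not shown.

```python
from dataclasses import dataclass

MAX_CONTEXT = 200_000                      # Sonnet 4 context window
CONTEXT_BUDGET = MAX_CONTEXT * 7 // 10     # 70% of MAX_CONTEXT = 140,000


@dataclass
class TokenTrackerSketch:
    """Hypothetical stand-in for _TokenTracker; field names follow the commit."""
    last_input: int = 0    # input_tokens of the most recent call = current context size
    loop_input: int = 0    # cumulative input for the loop (cost reporting only)
    total_input: int = 0   # cumulative input for the run (cost reporting only)

    def record(self, input_tokens: int) -> None:
        """Hypothetical hook called once per API response."""
        self.last_input = input_tokens     # overwrite: this is the context size
        self.loop_input += input_tokens    # sums are kept for cost, not for the budget
        self.total_input += input_tokens

    def budget_exceeded(self) -> bool:
        # Compare the latest call's context size, never the cumulative sum.
        return self.last_input > CONTEXT_BUDGET

    def exit_message(self) -> str:
        # Hypothetical wording; reports context, threshold and spend separately.
        return (f"context {self.last_input:,} > budget {CONTEXT_BUDGET:,} "
                f"(cumulative input spend {self.loop_input:,})")


# Regression shape: many small calls whose SUM exceeds the budget must not trip.
tracker = TokenTrackerSketch()
for _ in range(6):
    tracker.record(30_000)                 # each call well under 140,000
assert tracker.loop_input > CONTEXT_BUDGET # the pre-fix metric would have tripped
assert not tracker.budget_exceeded()       # the per-call metric does not
```

Keeping last_input separate from the running sums lets the same object serve both cost reporting and the budget check without either contaminating the other.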

Smoke test on luminos_lib: investigation completes without early
exit (~$0.37). 6 unit tests added covering the new semantics,
including the key regression: a sequence of small calls whose sum
exceeds the budget must NOT trip the check.

Wiki Architecture page updated.

#51 filed for the separate message-history-growth issue.