luminos/tests
Jeff Smith 036c3a934a fix(ai): correct context budget metric — track per-call, not sum (#44)
The dir loop was exiting early on small targets (a 13-file Python lib
hit the budget at 92k–139k cumulative tokens) because _TokenTracker
compared the SUM of input_tokens across all turns to the context
window size. input_tokens from each API response is the size of the
full prompt sent on that turn (system + every prior message + new
tool results), so summing across turns multi-counts everything. The
real per-call context size never approached the limit.

Verified empirically: on luminos_lib pre-fix, the loop bailed when
the most recent call's input_tokens was 20,535 (~10% of Sonnet's
200k window) but the cumulative sum was 134,983.
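
To make the multi-counting concrete, here is a minimal sketch with
made-up numbers (the prompt size and per-turn growth are assumptions for
illustration, not measurements from luminos). Each turn resends the whole
history, so the reported input_tokens grows per call while the sum grows
far faster:

```python
# Hypothetical sizes, chosen only to illustrate the arithmetic.
SYSTEM_PROMPT_TOKENS = 2_000   # assumed size of the system prompt
TOKENS_ADDED_PER_TURN = 3_000  # assumed history growth per turn


def per_call_input_tokens(turns: int) -> list[int]:
    """Return the input_tokens an API would report on each turn.

    Every call carries the full prompt so far, so turn i reports the
    system prompt plus everything appended on the i earlier turns.
    """
    return [SYSTEM_PROMPT_TOKENS + TOKENS_ADDED_PER_TURN * i for i in range(turns)]


calls = per_call_input_tokens(8)
cumulative = sum(calls)  # what the old check compared to the window
last = calls[-1]         # the actual size of the most recent prompt

print(f"per-call sizes: {calls}")
print(f"last call: {last:,} tokens, cumulative sum: {cumulative:,} tokens")
```

In this toy run the last prompt is 23,000 tokens while the sum is
100,000, the same shape as the 20,535 vs 134,983 gap observed above.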

Changes:
- _TokenTracker now tracks last_input (the most recent call's
  input_tokens), separate from the cumulative loop_input/total_input
  used for cost reporting.
- budget_exceeded() returns last_input > CONTEXT_BUDGET, not the
  cumulative sum.
- MAX_CONTEXT bumped from 180_000 to 200_000 (Sonnet 4's real
  context window). CONTEXT_BUDGET stays at 70% of MAX_CONTEXT,
  which is now 140,000.
- Early-exit message now shows context size, threshold, AND
  cumulative spend separately so future debugging is unambiguous.
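
The changes above could look roughly like the sketch below. It is not
the actual luminos code: the class name TokenTrackerSketch, the record()
method, and the wording of the message are assumptions; only last_input,
loop_input, total_input, budget_exceeded(), MAX_CONTEXT and
CONTEXT_BUDGET are names taken from this commit message.

```python
from dataclasses import dataclass

MAX_CONTEXT = 200_000
CONTEXT_BUDGET = MAX_CONTEXT * 70 // 100  # 70% of the window = 140_000


@dataclass
class TokenTrackerSketch:
    """Illustrative stand-in for _TokenTracker (structure assumed)."""

    last_input: int = 0   # input_tokens of the most recent call
    loop_input: int = 0   # cumulative, kept only for cost reporting
    total_input: int = 0  # cumulative across the whole run

    def record(self, input_tokens: int) -> None:
        # Hypothetical method name: store one API response's usage.
        self.last_input = input_tokens
        self.loop_input += input_tokens
        self.total_input += input_tokens

    def budget_exceeded(self) -> bool:
        # Compare the current prompt size, not the running sum.
        return self.last_input > CONTEXT_BUDGET

    def early_exit_message(self) -> str:
        # Hypothetical wording; reports all three figures separately.
        return (
            f"context {self.last_input:,} > budget {CONTEXT_BUDGET:,} "
            f"(cumulative input: {self.total_input:,})"
        )


tracker = TokenTrackerSketch()
for _ in range(8):
    tracker.record(20_000)  # small calls whose sum exceeds the budget

print(tracker.total_input, tracker.budget_exceeded())
```

Eight calls of 20,000 tokens sum to 160,000, above the 140,000 budget,
yet budget_exceeded() stays False because no single call came close.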

Smoke test on luminos_lib: investigation completes without early
exit (~$0.37). 6 unit tests added covering the new semantics,
including the key regression: a sequence of small calls whose sum
exceeds the budget must NOT trip the check.

Wiki Architecture page updated.

#51 filed for the separate message-history-growth issue.
2026-04-06 22:49:25 -06:00
__init__.py feat(tests): add unit test coverage for all testable modules (#37) 2026-04-06 16:57:26 -06:00
test_ai_filter.py fix(ai): correct context budget metric — track per-call, not sum (#44) 2026-04-06 22:49:25 -06:00
test_cache.py feat(cache): add low_confidence_entries() query to CacheManager (#3) 2026-04-06 21:13:58 -06:00
test_capabilities.py feat(tests): add unit test coverage for all testable modules (#37) 2026-04-06 16:57:26 -06:00
test_code.py feat(tests): add unit test coverage for all testable modules (#37) 2026-04-06 16:57:26 -06:00
test_disk.py feat(tests): add unit test coverage for all testable modules (#37) 2026-04-06 16:57:26 -06:00
test_filetypes.py feat(filetypes): expose raw signals to survey, remove classifier bias (#42) 2026-04-06 22:36:14 -06:00
test_recency.py feat(tests): add unit test coverage for all testable modules (#37) 2026-04-06 16:57:26 -06:00
test_report.py feat(tests): add unit test coverage for all testable modules (#37) 2026-04-06 16:57:26 -06:00
test_tree.py feat(tests): add unit test coverage for all testable modules (#37) 2026-04-06 16:57:26 -06:00