Dir loop exhausts context budget on small targets #44

Closed
opened 2026-04-07 03:58:50 +00:00 by archeious · 1 comment

Observation

While smoke testing #5 on the `luminos_lib` directory (13 Python source files, ~2800 LOC, ~200 KB on disk), the dir loop hit the context-budget early exit and bailed:

```
Context budget reached — exiting early (133,474 tokens used)
```

`MAX_CONTEXT = 180_000` and `CONTEXT_BUDGET = 0.70 * MAX_CONTEXT = 126_000`. The agent burned through that on a single small Python library — well before finishing the investigation. Final cost for the run: ~$0.46.

Why this is surprising

13 source files totaling 2800 LOC is a small target. Reading every file in full would be on the order of 30k–50k tokens. The agent should have plenty of headroom. The fact that 133k was consumed suggests one or more of:

  • Tool result re-injection — every `tool_result` block stays in the message history for the rest of the loop, so a `read_file` result from early in the loop keeps consuming tokens on every subsequent turn
  • Verbose `parse_structure` output — the agent called `parse_structure` on all 13 files in a single turn; structured AST output may be much larger than expected
  • Redundant reads — the agent called `read_file("ai.py", max_bytes=8192)` and later `read_file("ai.py", max_bytes=16384)`, doubling that content in context
  • Large system prompt — `_DIR_SYSTEM_PROMPT` plus survey injection (after #6) plus child summaries grows non-trivially

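The first hypothesis compounds fastest: a stateless chat API resends the full message history on every call, so each retained tool result is billed again on every subsequent turn. A back-of-envelope sketch (illustrative numbers, not measured from this run):

```python
# Hypothetical illustration (not luminos code): with a stateless chat API,
# the whole history is resent each turn, so one large tool_result is
# re-billed on every remaining turn of the loop.
def tokens_rebilled(result_tokens: int, turns_remaining: int) -> int:
    """Total input tokens one tool result costs over the rest of the loop."""
    return result_tokens * turns_remaining

# A 5k-token parse_structure result returned on turn 3 of a 20-turn loop:
print(tokens_rebilled(5_000, 20 - 3))  # 85000 tokens from a single result
```

This is why a "small" target can still exhaust the budget: the cost of a result is proportional not just to its size but to how many turns follow it.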
Why this matters

If a 13-file Python library exhausts the budget, then realistic targets (hundreds of files, multi-language, deep trees) will be unusable without intervention. The current behavior — early exit with whatever was cached so far — produces an incomplete report and a confusing user experience.

Possible directions (not a decision)

  1. Trim tool results from history once they are more than N turns old, replacing them with a `[result elided]` placeholder
  2. Cap `parse_structure` output size explicitly and force the agent to use targeted queries
  3. Per-file budget rather than per-loop — fail individual file reads instead of the whole loop
  4. Streaming summarization — periodically compress old turns into a shorter narrative
  5. Raise `MAX_CONTEXT` toward the model's real limit (200k for Sonnet, 1M for Sonnet 4.6) — the cheapest option, but it does not address the root cause
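A rough sketch of direction 1 (hypothetical code, assuming Anthropic-style message dicts with `tool_result` content blocks, plus a `turn` tag that we would have to add ourselves):

```python
# Sketch only: elide tool_result bodies older than N turns, keeping a short
# placeholder so the model still sees that the call happened.
ELIDE_AFTER_TURNS = 3  # hypothetical threshold

def trim_tool_results(messages: list[dict], current_turn: int) -> list[dict]:
    trimmed = []
    for msg in messages:
        # Assumes each message was tagged with the turn it was produced on.
        turn = msg.get("turn", current_turn)
        if msg["role"] == "user" and current_turn - turn > ELIDE_AFTER_TURNS:
            content = [
                {**block, "content": "[result elided]"}
                if isinstance(block, dict) and block.get("type") == "tool_result"
                else block
                for block in msg.get("content", [])
            ]
            msg = {**msg, "content": content}
        trimmed.append(msg)
    return trimmed
```

The placeholder keeps the tool_use/tool_result pairing intact, which matters if the API validates that every tool call has a matching result.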

Acceptance

  • A run of `--ai luminos_lib` completes without hitting the context budget
  • A run on a target 5x larger still exits gracefully (early exit is acceptable, but should not silently truncate the report)
  • Token usage instrumentation makes it clear WHERE the tokens went (which tool calls, which files), so future regressions are visible
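For the instrumentation criterion, one possible shape — all names and the 4-chars-per-token heuristic are assumptions, not existing luminos code:

```python
# Hypothetical per-tool-call token ledger. A crude chars/4 estimate is
# enough to show WHERE the context went, without an exact tokenizer.
from collections import defaultdict

class TokenLedger:
    def __init__(self):
        self.by_call = defaultdict(int)

    def record(self, tool_name: str, arg_summary: str, result_text: str):
        # ~4 characters per token is a common rough heuristic for English/code.
        self.by_call[f"{tool_name}({arg_summary})"] += len(result_text) // 4

    def report(self):
        # Largest consumers first, so regressions jump out in the logs.
        for call, tokens in sorted(self.by_call.items(), key=lambda kv: -kv[1]):
            print(f"{tokens:>8,}  {call}")

ledger = TokenLedger()
ledger.record("read_file", "ai.py", "x" * 32_768)
ledger.record("parse_structure", "ai.py", "y" * 8_000)
ledger.report()
```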

Sequencing

This issue should be triaged after Phase 2 ships (#4–#7 plus #42), before any phase that adds more tools or longer prompts (Phase 3 planning, Phase 4 external knowledge tools). Adding more capability before fixing the budget will only make this worse.

Reproduction

```bash
python3 luminos.py --ai luminos_lib
```

Look for: `Context budget reached — exiting early`


Shipped in #52, merged to main. Closing manually — Forgejo's `Closes` keyword didn't auto-close this from the PR body.

Reference: archeious/luminos#44