Dir loop message history grows monotonically — no eviction #51

Open
opened 2026-04-07 04:46:04 +00:00 by archeious · 0 comments

Background

#44 fixes the broken context-budget metric (cumulative sum vs actual context size). With that fix, the budget no longer trips falsely. But the underlying dynamic that would eventually trip a true budget remains: dir loop message history grows monotonically with no eviction.

The growth pattern

Every turn appends:

  • The assistant's content blocks (text + tool_use)
  • A user message containing every tool_result
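The append-only pattern above can be sketched as follows. This is an illustrative model, not the actual Luminos loop code — the message shapes are assumptions:

```python
# Hypothetical sketch of the append-only turn loop. Block shapes mirror
# the Anthropic Messages API convention but are illustrative only.
messages = []

def run_turn(assistant_blocks, tool_results):
    # The assistant's content blocks (text + tool_use) are appended...
    messages.append({"role": "assistant", "content": assistant_blocks})
    # ...followed by one user message carrying every tool_result.
    messages.append({"role": "user", "content": tool_results})
    # Nothing is ever popped — history grows monotonically.
```

Every subsequent API call resends the whole `messages` list, which is why per-turn `input_tokens` tracks the cumulative history size.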

Nothing is ever removed. Per-turn input_tokens from the #44 verification run on luminos_lib:

turn  input_tokens
1     2,654
2     4,041
3     4,308
4     7,370
5    12,793
6    13,988
7    14,853
8    15,697
9    18,460
10   20,284
11   20,535

Roughly linear growth at ~1.5–2k per turn. This stays well under Sonnet's 200k window for the current max_turns=14 cap, so the metric fix in #44 is sufficient for now. But:

  1. If max_turns is raised (Phase 3 dynamic turn allocation could push it higher)
  2. If a single tool result is large (a parse_structure on a big file, or a read_file with large max_bytes)
  3. If a target has many files and the agent tries to read them all in one loop

...the growth becomes a real overflow risk.
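The cost implication is worth spelling out: when per-turn context grows linearly, the total tokens billed across the loop grow quadratically with turn count, since each turn resends the whole history. A minimal sketch using the measured figures above (the `a`, `b` fit constants are hypothetical, not measured):

```python
# Per-turn input_tokens from the #44 verification run (table above).
input_tokens = [2654, 4041, 4308, 7370, 12793, 13988,
                14853, 15697, 18460, 20284, 20535]

# Total tokens billed across the whole loop is the sum of per-turn inputs.
total = sum(input_tokens)
print(total)

# Idealized model: if turn t costs roughly a + b*t tokens, the loop total
# is sum_{t=1..n} (a + b*t) = a*n + b*n*(n+1)/2, i.e. O(n^2) in turns.
a, b = 1000, 1800  # hypothetical fit constants, for illustration only
n = 14             # current max_turns cap
model_total = a * n + b * n * (n + 1) // 2
print(model_total)
```

Doubling `max_turns` under this model roughly quadruples the dominant cost term, which is why raising the cap in Phase 3 makes eviction more urgent.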

Why eviction is non-trivial

Tool results can't just be deleted — the agent reasons about what it has already learned by referring back to earlier results. Naive eviction would break the agent's working memory.

Possible directions

  1. Compress old tool results. After a tool result is N turns old, replace it with a one-line placeholder: [result elided — N chars, was: read_file(path='ai.py')]. The fact of the call survives; the bytes don't.
  2. Summarize and squash. When history exceeds a threshold, run a one-shot summarization call that produces a "what I've learned so far" paragraph and replaces the old turns with that paragraph. Costs an API call per squash but bounds growth.
  3. Cache-as-memory pattern. Force the agent to write_cache after every tool result, then evict the result from history immediately — the agent reads from cache instead of scrolling back. Requires prompt changes to make this discipline natural.
  4. Per-tool output caps. Cap parse_structure / read_file output sizes more aggressively. Doesn't solve growth but reduces per-turn delta.
  5. Sliding window. Keep the last K turns in full, drop older turns entirely. Brittle — drops information the agent might still need.
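Direction 1 can be sketched concretely. This is a hypothetical implementation — the message/block shapes, the `turn` field, and the `call` field are assumptions about the Luminos data model, not its actual API:

```python
# Hypothetical sketch of direction 1 (compress old tool results).
ELIDE_AFTER_TURNS = 3  # hypothetical threshold

def elide_old_tool_results(messages, current_turn):
    """Return a copy of history where tool_result bodies older than
    ELIDE_AFTER_TURNS turns are replaced by a one-line placeholder.
    The fact of the call survives; the bytes don't."""
    out = []
    for msg in messages:
        if msg["role"] == "user" and current_turn - msg["turn"] > ELIDE_AFTER_TURNS:
            blocks = []
            for block in msg["content"]:
                if block.get("type") == "tool_result":
                    body = block["content"]
                    placeholder = (
                        f"[result elided — {len(body)} chars, "
                        f"was: {block.get('call', '?')}]"
                    )
                    block = {**block, "content": placeholder}  # copy, don't mutate
                blocks.append(block)
            msg = {**msg, "content": blocks}
        out.append(msg)
    return out
```

Because the function builds copies rather than mutating in place, the full results stay available (e.g. for logging) even after the context sent to the model is compressed.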

Direction 3 (cache-as-memory) is the most aligned with how Luminos already works — the cache is already the persistent memory layer, the message history is just scratch space. Worth designing toward.
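A rough shape for direction 3, purely as a design sketch — `write_cache`/`read_cache` and the `cache_key` field here are assumptions about the Luminos tool surface, not its real interface:

```python
# Hypothetical sketch of direction 3 (cache-as-memory): once a tool
# result is persisted to the cache, evict the raw bytes from history
# and leave a pointer the agent can follow via read_cache.
class CacheAsMemory:
    def __init__(self):
        self.cache = {}

    def write_cache(self, key, value):
        self.cache[key] = value

    def read_cache(self, key):
        return self.cache.get(key)

    def evict_result(self, messages, key):
        """Replace the cached tool_result's body in history with a
        one-line pointer to the cache entry. Mutates in place."""
        for msg in messages:
            for block in msg.get("content", []):
                if (block.get("type") == "tool_result"
                        and block.get("cache_key") == key):
                    block["content"] = f"[cached under '{key}' — use read_cache]"
        return messages
```

The prompt-side work (making the write-then-evict discipline natural for the agent) is the hard part; the mechanics above are the easy part.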

Acceptance

  • A dir loop running for max_turns turns on a target with many files keeps per-turn input_tokens bounded (does not grow linearly to overflow)
  • The agent's investigation quality is not measurably degraded by the eviction strategy
  • Token cost per dir loop drops noticeably (linear-growth contexts are quadratic in cost across turns)
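The first acceptance criterion could be verified mechanically by asserting that per-turn `input_tokens` stays under a cap instead of growing without bound. A minimal sketch — the cap value and the post-eviction series are hypothetical, for illustration only:

```python
# Hypothetical acceptance check: per-turn input_tokens should plateau
# under a fixed cap once eviction is in place.
def is_bounded(series, cap):
    """True if every per-turn input_tokens value stays at or under cap."""
    return max(series) <= cap

# Measured pre-eviction series (from the #44 run) vs. a hypothetical
# post-eviction series that plateaus.
before = [2654, 4041, 4308, 7370, 12793, 13988,
          14853, 15697, 18460, 20284, 20535]
after_hypothetical = [2654, 4100, 4300, 4500, 4600, 4550,
                      4700, 4650, 4600, 4700, 4680]

print(is_bounded(before, 5000))            # False — grows past the cap
print(is_bounded(after_hypothetical, 5000))  # True — plateaus under it
```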

Sequencing

Not blocking. Revisit after Phase 3 (planning) ships and we have data on how dynamic turn allocation interacts with the growth pattern. If Phase 3 raises max_turns significantly, this becomes blocking before Phase 4.

Reference: archeious/luminos#51