Dir loop message history grows monotonically — no eviction #51

Open
opened 2026-04-07 04:46:04 +00:00 by archeious · 0 comments

Background

#44 fixes the broken context-budget metric (cumulative sum vs actual context size). With that fix, the budget no longer trips falsely. But the underlying dynamic that would eventually trip a true budget remains: dir loop message history grows monotonically with no eviction.

The growth pattern

Every turn appends:

  • The assistant's content blocks (text + tool_use)
  • A user message containing every tool_result
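The append-only pattern above can be sketched as follows. This is an illustrative model, not the actual Luminos loop code — the message shapes are assumptions:

```python
# Hypothetical sketch of the append-only turn loop. Block shapes mirror
# the Anthropic Messages API convention but are illustrative only.
messages = []

def run_turn(assistant_blocks, tool_results):
    # The assistant's content blocks (text + tool_use) are appended...
    messages.append({"role": "assistant", "content": assistant_blocks})
    # ...followed by one user message carrying every tool_result.
    messages.append({"role": "user", "content": tool_results})
    # Nothing is ever popped — history grows monotonically.
```

Every subsequent API call resends the whole `messages` list, which is why per-turn `input_tokens` tracks the cumulative history size.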

Nothing is ever removed. Per-turn input_tokens from the #44 verification run on luminos_lib:

turn  input_tokens
1     2,654
2     4,041
3     4,308
4     7,370
5    12,793
6    13,988
7    14,853
8    15,697
9    18,460
10   20,284
11   20,535

Roughly linear growth at ~1.5–2k per turn. This stays well under Sonnet's 200k window for the current max_turns=14 cap, so the metric fix in #44 is sufficient for now. But:

  1. If max_turns is raised (Phase 3 dynamic turn allocation could push it higher)
  2. If a single tool result is large (a parse_structure on a big file, or a read_file with large max_bytes)
  3. If a target has many files and the agent tries to read them all in one loop

...the growth becomes a real overflow risk.
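The cost implication is worth spelling out: when per-turn context grows linearly, the total tokens billed across the loop grow quadratically with turn count, since each turn resends the whole history. A minimal sketch using the measured figures above (the `a`, `b` fit constants are hypothetical, not measured):

```python
# Per-turn input_tokens from the #44 verification run (table above).
input_tokens = [2654, 4041, 4308, 7370, 12793, 13988,
                14853, 15697, 18460, 20284, 20535]

# Total tokens billed across the whole loop is the sum of per-turn inputs.
total = sum(input_tokens)
print(total)

# Idealized model: if turn t costs roughly a + b*t tokens, the loop total
# is sum_{t=1..n} (a + b*t) = a*n + b*n*(n+1)/2, i.e. O(n^2) in turns.
a, b = 1000, 1800  # hypothetical fit constants, for illustration only
n = 14             # current max_turns cap
model_total = a * n + b * n * (n + 1) // 2
print(model_total)
```

Doubling `max_turns` under this model roughly quadruples the dominant cost term, which is why raising the cap in Phase 3 makes eviction more urgent.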

Why eviction is non-trivial

Tool results can't just be deleted — the agent reasons about what it has already learned by referring back to earlier results. Naive eviction would break the agent's working memory.

Possible directions

  1. Compress old tool results. After a tool result is N turns old, replace it with a one-line placeholder: [result elided — N chars, was: read_file(path='ai.py')]. The fact of the call survives; the bytes don't.
  2. Summarize and squash. When history exceeds a threshold, run a one-shot summarization call that produces a "what I've learned so far" paragraph and replaces the old turns with that paragraph. Costs an API call per squash but bounds growth.
  3. Cache-as-memory pattern. Force the agent to write_cache after every tool result, then evict the result from history immediately — the agent reads from cache instead of scrolling back. Requires prompt changes to make this discipline natural.
  4. Per-tool output caps. Cap parse_structure / read_file output sizes more aggressively. Doesn't solve growth but reduces per-turn delta.
  5. Sliding window. Keep the last K turns in full, drop older turns entirely. Brittle — drops information the agent might still need.
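Direction 1 can be sketched concretely. This is a hypothetical implementation — the message/block shapes, the `turn` field, and the `call` field are assumptions about the Luminos data model, not its actual API:

```python
# Hypothetical sketch of direction 1 (compress old tool results).
ELIDE_AFTER_TURNS = 3  # hypothetical threshold

def elide_old_tool_results(messages, current_turn):
    """Return a copy of history where tool_result bodies older than
    ELIDE_AFTER_TURNS turns are replaced by a one-line placeholder.
    The fact of the call survives; the bytes don't."""
    out = []
    for msg in messages:
        if msg["role"] == "user" and current_turn - msg["turn"] > ELIDE_AFTER_TURNS:
            blocks = []
            for block in msg["content"]:
                if block.get("type") == "tool_result":
                    body = block["content"]
                    placeholder = (
                        f"[result elided — {len(body)} chars, "
                        f"was: {block.get('call', '?')}]"
                    )
                    block = {**block, "content": placeholder}  # copy, don't mutate
                blocks.append(block)
            msg = {**msg, "content": blocks}
        out.append(msg)
    return out
```

Because the function builds copies rather than mutating in place, the full results stay available (e.g. for logging) even after the context sent to the model is compressed.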

Direction 3 (cache-as-memory) is the most aligned with how Luminos already works — the cache is already the persistent memory layer, the message history is just scratch space. Worth designing toward.
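A rough shape for direction 3, purely as a design sketch — `write_cache`/`read_cache` and the `cache_key` field here are assumptions about the Luminos tool surface, not its real interface:

```python
# Hypothetical sketch of direction 3 (cache-as-memory): once a tool
# result is persisted to the cache, evict the raw bytes from history
# and leave a pointer the agent can follow via read_cache.
class CacheAsMemory:
    def __init__(self):
        self.cache = {}

    def write_cache(self, key, value):
        self.cache[key] = value

    def read_cache(self, key):
        return self.cache.get(key)

    def evict_result(self, messages, key):
        """Replace the cached tool_result's body in history with a
        one-line pointer to the cache entry. Mutates in place."""
        for msg in messages:
            for block in msg.get("content", []):
                if (block.get("type") == "tool_result"
                        and block.get("cache_key") == key):
                    block["content"] = f"[cached under '{key}' — use read_cache]"
        return messages
```

The prompt-side work (making the write-then-evict discipline natural for the agent) is the hard part; the mechanics above are the easy part.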

Acceptance

  • A dir loop running for max_turns turns on a target with many files keeps per-turn input_tokens bounded (does not grow linearly to overflow)
  • The agent's investigation quality is not measurably degraded by the eviction strategy
  • Token cost per dir loop drops noticeably (linear-growth contexts are quadratic in cost across turns)
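The first acceptance criterion could be verified mechanically by asserting that per-turn `input_tokens` stays under a cap instead of growing without bound. A minimal sketch — the cap value and the post-eviction series are hypothetical, for illustration only:

```python
# Hypothetical acceptance check: per-turn input_tokens should plateau
# under a fixed cap once eviction is in place.
def is_bounded(series, cap):
    """True if every per-turn input_tokens value stays at or under cap."""
    return max(series) <= cap

# Measured pre-eviction series (from the #44 run) vs. a hypothetical
# post-eviction series that plateaus.
before = [2654, 4041, 4308, 7370, 12793, 13988,
          14853, 15697, 18460, 20284, 20535]
after_hypothetical = [2654, 4100, 4300, 4500, 4600, 4550,
                      4700, 4650, 4600, 4700, 4680]

print(is_bounded(before, 5000))            # False — grows past the cap
print(is_bounded(after_hypothetical, 5000))  # True — plateaus under it
```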

Sequencing

Not blocking. Revisit after Phase 3 (planning) ships and we have data on how dynamic turn allocation interacts with the growth pattern. If Phase 3 raises max_turns significantly, this becomes blocking before Phase 4.

Reference: archeious/luminos#51