wiki: document leaf-first investigation contract in Internals §4.7 (#72)

Jeff Smith 2026-04-11 10:58:11 -06:00
parent 0a33ec6bd2
commit 508d66cba8

@@ -297,6 +297,49 @@ real-time tool decision printing, which today happens only after the full
response arrives. There's room here to add live progress printing if you
want it.

### 4.7 The leaf-first contract (load-bearing for child summaries)

`_discover_directories()` returns directories sorted leaves-first (deepest
paths first, parents last). This is not a stylistic choice; it is a
load-bearing invariant. **`_get_child_summaries()` depends on it.**

When the dir loop runs on a parent like `src/`,
`_get_child_summaries()` reads the cache for each subdirectory of
`src/` (`src/auth/`, `src/db/`, `src/middleware/`). It then injects their
existing summaries into the parent's system prompt through the
`{child_summaries}` placeholder. This is how the agent gets context
about parts of the project outside the directory it is currently
investigating, without re-reading them. It is the entire payoff of
leaves-first ordering.
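
A minimal sketch of that ordering follows. It assumes the directories are already known as relative POSIX-style paths. This input is hypothetical: the real `_discover_directories()` finds directories on disk and may sort them differently. Sorting by descending path depth is enough, because a child always has strictly more path components than its parent:

```python
from pathlib import PurePosixPath


def leaves_first(dirs: list[str]) -> list[str]:
    """Sort directories deepest-first so every child precedes its parent.

    Hypothetical stand-in for the order _discover_directories() returns.
    Ties at the same depth are broken alphabetically for determinism.
    """
    return sorted(dirs, key=lambda d: (-len(PurePosixPath(d).parts), d))


dirs = ["src/", "src/auth/", "src/db/", "src/middleware/", "docs/"]
print(leaves_first(dirs))
# ['src/auth/', 'src/db/', 'src/middleware/', 'docs/', 'src/']
```
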

The catch: those subdirectory summaries only exist if the children
were investigated *first*. If `src/` runs before `src/auth/`, the
cache lookup at `ai.py:825` returns nothing. The function falls
through to its default at `ai.py:832` and returns the string
`(none — this is a leaf directory)`. The parent's system prompt
silently loses all of its child context, and the agent has no way to
know. The placeholder claims the dir is a leaf, which is a lie when
the children just haven't been investigated yet. The dir summary
degrades, and the synthesis pass inherits the degradation.

**If you change the investigation order**, you must do one of the
following:

1. **Preserve the leaf-first invariant within whatever new order you
   introduce.** A "priority-first" order can still process directories
   leaves-first within each priority band. Children then always run
   before parents, provided no child lands in a later band than its
   parent (for example, promoting a directory to the priority band
   also promotes everything beneath it).
2. **Explicitly handle the missing-child-summaries case in the
   prompt.** Replace the lie ("leaf directory") with the truth
   ("children not yet investigated") so the agent at least knows what
   it doesn't have, and accept that some dirs will run with degraded
   context.
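
Both options can be sketched in a few lines of Python. Everything here is hypothetical and not code from `ai.py`: the names `banded_leaves_first` and `child_summaries_honest`, the set-of-paths `priority` input, and the per-child "not yet investigated" wording. The band rule is an assumption chosen so that option 1's children-before-parents guarantee actually holds: a directory inherits priority from any ancestor. A plain priority sort without that rule can schedule a priority parent ahead of a non-priority child.

```python
from pathlib import PurePosixPath


def banded_leaves_first(dirs: list[str], priority: set[str]) -> list[str]:
    """Option 1: priority band first, leaves-first inside each band.

    Assumed rule: a directory joins the priority band if it OR ANY ANCESTOR
    is a priority dir, so promoting a parent also promotes its children.
    """
    wanted = {PurePosixPath(p) for p in priority}

    def key(d: str):
        p = PurePosixPath(d)
        in_band = p in wanted or any(a in wanted for a in p.parents)
        # (band, deepest first, alphabetical tie-break)
        return (0 if in_band else 1, -len(p.parts), d)

    return sorted(dirs, key=key)


def child_summaries_honest(subdirs: list[str], cache: dict[str, str]) -> str:
    """Option 2: say what is actually missing instead of claiming a leaf."""
    if not subdirs:
        return "(none: this is a leaf directory)"
    return "\n".join(
        f"{d}: {cache[d]}" if d in cache else f"{d}: (not yet investigated)"
        for d in subdirs
    )


dirs = ["src/", "src/auth/", "src/db/", "docs/", "docs/api/"]
print(banded_leaves_first(dirs, priority={"src/"}))
# ['src/auth/', 'src/db/', 'src/', 'docs/api/', 'docs/']
print(child_summaries_honest(["src/auth/", "src/db/"],
                             {"src/auth/": "summary of src/auth/"}))
```

Because a child always has more path components than its parent, the depth term settles the order within a band. The band term only decides which subtree goes first.
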

Phase 3's planning pass introduces the temptation to investigate
priority dirs first. Both alternatives above are open. Whichever is
chosen, the contract has to be addressed *explicitly*. The test class
`TestDiscoverDirectories` (in `tests/test_ai_pure.py`) pins the
current ordering, so any change will fail loudly, but the *reason* the
ordering matters lives here.
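
A test that pins one exact sequence cannot tell a harmless reordering from a broken one. A property check can instead encode the reason directly: no directory may appear before any of its descendants. The sketch below shows one way to write that check. The helper name `children_before_parents` is hypothetical and is not part of `tests/test_ai_pure.py`.

```python
from pathlib import PurePosixPath


def children_before_parents(order: list[str]) -> bool:
    """True if no directory appears before any of its descendants."""
    position = {PurePosixPath(d): i for i, d in enumerate(order)}
    for path, i in position.items():
        for ancestor in path.parents:
            # An ancestor scheduled earlier than this path breaks the contract.
            if ancestor in position and position[ancestor] < i:
                return False
    return True


assert children_before_parents(["src/auth/", "src/db/", "src/"])
assert not children_before_parents(["src/", "src/auth/", "src/db/"])
```
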

---
## 5. The cache model