From 508d66cba8359d69c345c0abd13224dd856c6d41 Mon Sep 17 00:00:00 2001
From: Jeff Smith
Date: Sat, 11 Apr 2026 10:58:11 -0600
Subject: [PATCH] wiki: document leaf-first investigation contract in Internals §4.7 (#72)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 Internals.md | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/Internals.md b/Internals.md
index 8c0b008..4cb7a46 100644
--- a/Internals.md
+++ b/Internals.md
@@ -297,6 +297,49 @@ real-time tool decision printing, which today happens only after the
full response arrives. There's room here to add live progress printing
if you want it.

### 4.7 The leaf-first contract (load-bearing for child summaries)

`_discover_directories()` returns directories sorted leaves-first (the
deepest paths first, parents last). This is not a stylistic choice; it
is a load-bearing invariant. **`_get_child_summaries()` depends on it.**

When the dir loop runs on a parent like `src/`,
`_get_child_summaries()` reads the cache for each subdirectory of
`src/` (`src/auth/`, `src/db/`, `src/middleware/`) and injects their
existing summaries into the parent's system prompt under
`{child_summaries}`. This is how the agent gets context about parts of
the project it isn't currently inside without re-reading them, and it
is the entire payoff of leaves-first ordering.

The catch: those subdirectory summaries only exist if the children
were investigated *first*. If `src/` runs before `src/auth/` and its
siblings, the cache lookup at `ai.py:825` returns nothing, so the
function falls through to its default at `ai.py:832` and returns the
string `(none — this is a leaf directory)`. The parent's system prompt
silently loses all of its child context, and the agent has no way to
know it: the placeholder claims the dir is a leaf, which is a lie when
the children just haven't been investigated yet.
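To make the failure mode concrete, here is a minimal sketch. It assumes a plain `dict` in place of the real cache, and its helpers (`discover_leaves_first`, `child_summaries`, `run`) are hypothetical stand-ins that only approximate `_discover_directories()` and `_get_child_summaries()`; it is not the code in `ai.py`. It orders dirs by path depth, collects cached summaries of each dir's direct subdirectories, and falls through to the leaf placeholder when none are cached.

```python
from pathlib import PurePosixPath

# Placeholder text quoted from the fall-through default described above.
LEAF_PLACEHOLDER = "(none — this is a leaf directory)"


def discover_leaves_first(dirs: list[str]) -> list[str]:
    """Sort dirs deepest-first so every child precedes its parent.

    Hypothetical helper: approximates the ordering _discover_directories()
    guarantees, not its actual implementation.
    """
    # More path components means deeper; ties are broken by name.
    return sorted(dirs, key=lambda d: (-len(PurePosixPath(d).parts), d))


def child_summaries(parent: str, dirs: list[str], cache: dict[str, str]) -> str:
    """Collect cached summaries of parent's direct subdirectories.

    Hypothetical simplification of _get_child_summaries(): a plain dict
    stands in for the real cache.
    """
    p = PurePosixPath(parent)
    lines = [
        f"{d}: {cache[d]}"
        for d in sorted(dirs)
        if PurePosixPath(d).parent == p and d in cache
    ]
    # "Children not cached yet" and "no children at all" look identical
    # here: both fall through to the leaf placeholder.
    return "\n".join(lines) if lines else LEAF_PLACEHOLDER


def run(order: list[str], dirs: list[str]) -> dict[str, str]:
    """Investigate dirs in the given order; record the child context each saw."""
    cache: dict[str, str] = {}
    seen: dict[str, str] = {}
    for d in order:
        seen[d] = child_summaries(d, dirs, cache)
        cache[d] = f"summary of {d}"  # pretend the agent wrote a summary
    return seen


if __name__ == "__main__":
    dirs = ["src", "src/auth", "src/db", "src/middleware"]
    order = discover_leaves_first(dirs)
    print(order)  # ['src/auth', 'src/db', 'src/middleware', 'src']
    print(run(order, dirs)["src"])  # three child summaries
    print(run(dirs, dirs)["src"])   # parent-first: the leaf placeholder
```

Run leaves-first, `src` sees all three child summaries. Run parent-first, it gets the leaf placeholder even though it has three children, and nothing in the returned string distinguishes that from a genuine leaf.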
The dir summary degrades, and the synthesis pass inherits the
degradation.

**If you change the investigation order**, you have to do one of the
following:

1. **Preserve the leaf-first invariant within whatever new order you
   introduce.** A "priority-first" order can still process directories
   leaves-first within each priority band, so children always run
   before parents, provided no child sits in a lower-priority band
   than its parent (promoting a dir has to promote its whole subtree).
   Otherwise a high-priority parent runs before its lower-priority
   children and loses their summaries.
2. **Explicitly handle the missing-child-summaries case in the
   prompt.** Replace the lie ("leaf directory") with the truth
   ("children not yet investigated") so the agent at least knows what
   it doesn't have, and accept that some dirs will run with degraded
   context.

Phase 3's planning pass introduces the temptation to investigate
priority dirs first. Both alternatives above are open, but whichever
is chosen, this contract has to be addressed *explicitly*. The test
class `TestDiscoverDirectories` (in `tests/test_ai_pure.py`) pins the
current ordering, so any change will fail loudly, but the *reason* the
ordering matters lives here.

---

## 5. The cache model