wiki: refresh Internals.md §4 for #57 dir loop refactor

2026-04-11 10:02:49 -06:00 · 2026-04-11 10:02:49 -06:00 · 725ef62edd
commit 725ef62edd
parent ecfae7edba
1 changed files with 47 additions and 36 deletions
--- a/Internals.md
+++ b/Internals.md
@@ -169,23 +169,32 @@ protectable.

 ## 4. The dir loop in depth

-`_run_dir_loop()` is at `ai.py:845`. This is a hand-written agent loop and
-you should expect to read it several times before it clicks. The shape is:
+`_run_dir_loop()` is at `ai.py:1017`. It is a hand-written agent loop, and
+you should expect to read it several times before it clicks. As of #57 the
+loop body itself is a thin coordinator (~25 lines): it calls three helpers
+that own the layers it used to inline.
+
+| Helper | Location | Job |
+|---|---|---|
+| `_build_dir_loop_context()` | `ai.py:855` | Pure setup. Builds dir context, child summaries, survey block, filtered tool list, system prompt, and the seed user message. Returns a `_DirLoopContext` namedtuple. |
+| `_flush_partial_dir_entry()` | `ai.py:896` | Idempotent partial-cache writer for the budget-exceeded path. Synthesizes a summary from already-cached file entries when possible, or writes a "no files processed" stub. Returns the partial summary string. |
+| `_handle_turn_response()` | `ai.py:957` | Per-turn response processing. Prints text blocks and tool decisions to stderr, appends the assistant message, dispatches tools (or nudges the agent to call `submit_report`), appends tool_results. Returns `(done, summary)`. |
+
+The shape of the loop body is now:

 ```
-build system prompt (with survey context, child summaries, dir contents)
-build initial user message ("investigate this directory now")
+ctx = _build_dir_loop_context(...)
 reset per-loop token counter
 for turn in range(max_turns):                    # max_turns = 14
-    if budget exceeded: flush partial cache and break
+    if budget exceeded:
+        print warning
+        partial = _flush_partial_dir_entry(...)
+        if partial: summary = partial
+        break
     call API (streaming)
-    record token usage
-    print text blocks and tool decisions to stderr
-    append assistant response to message history
-    if no tool calls: nudge agent to call submit_report; continue
-    execute each tool call, build tool_result blocks
-    append tool_results to message history as user message
-    if submit_report was called: break
+    done, turn_summary = _handle_turn_response(...)
+    if turn_summary: summary = turn_summary
+    if done: break
 return summary
 ```
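
A minimal Python sketch of the coordinator shape outlined above may help when reading the real function. The helper signatures in `ai.py` are not shown in this section, so the four callables below (`call_api`, `handle_turn`, `budget_exceeded`, `flush_partial`) are hypothetical stand-ins for `_call_api_streaming()`, `_handle_turn_response()`, the budget check, and `_flush_partial_dir_entry()`; only the control flow is taken from the outline.

```python
import sys
from typing import Callable, Optional, Tuple

MAX_TURNS = 14  # the max_turns value noted in the outline


def run_dir_loop_sketch(
    call_api: Callable[[int], dict],
    handle_turn: Callable[[dict], Tuple[bool, Optional[str]]],
    budget_exceeded: Callable[[], bool],
    flush_partial: Callable[[], str],
    max_turns: int = MAX_TURNS,
) -> str:
    """Hypothetical coordinator; every parameter is an illustrative assumption."""
    summary = ""
    for turn in range(max_turns):
        if budget_exceeded():
            print("Context budget reached", file=sys.stderr)
            partial = flush_partial()
            if partial:  # "" means a dir entry already existed
                summary = partial
            break
        response = call_api(turn)
        done, turn_summary = handle_turn(response)
        if turn_summary:
            summary = turn_summary
        if done:
            break
    return summary
```

Passing the helpers in as callables is a choice made for this sketch only: it lets the three exits (report submitted, budget tripped, turns exhausted) be exercised without an API client.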

@@ -202,32 +211,31 @@ budget; raising the cap would expose this. See **#51**.

 ### 4.2 Tool dispatch

-Tools are not class methods. They're plain functions in `ai.py:486–642`,
-registered into `_TOOL_DISPATCH` at `ai.py:645`. `_execute_tool()`
-(`ai.py:659`) is a 16-line function that looks up the handler by name,
-calls it, logs the turn to `investigation.log`, and returns the result
-string. **The two control-flow tools — `submit_report` and `think`/
-`checkpoint` for narration — are NOT in `_TOOL_DISPATCH`** because the
-loop body handles them specially:
-- `submit_report` is recognized in the tool-use scan at `ai.py:977`, sets
-  `done = True`, and doesn't go through dispatch
-- `think`, `checkpoint`, and `flag` *are* in dispatch, but they have side
-  effects that just print to stderr or append to `flags.jsonl` — the
-  return value is always `"ok"`
+Tools are plain functions in `ai.py`, registered into `_TOOL_DISPATCH` at
+`ai.py:650`. `_execute_tool()` (`ai.py:664`) is a small function that
+looks up the handler by name, calls it, logs the turn to
+`investigation.log`, and returns the result string. **The control-flow
+tool `submit_report` is NOT in `_TOOL_DISPATCH`** because
+`_handle_turn_response()` recognizes it specially: it sets `done = True`
+and extracts the summary directly from the tool input.
+
+`think`, `checkpoint`, and `flag` *are* in dispatch, but their handlers
+only print to stderr or append to `flags.jsonl`; the return value is
+always `"ok"`.

 When you add a tool: write the function, add it to `_TOOL_DISPATCH`, add
 its schema to `_DIR_TOOLS`. That's it.
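
The three steps can be sketched as follows. This is an illustration, not the code in `ai.py`: `TOOL_DISPATCH`, `DIR_TOOLS`, `execute_tool_sketch`, and the `word_count` tool are hypothetical names, the schema entry assumes the Anthropic tool-definition shape (`name`, `description`, `input_schema`), and the unknown-tool error string is an assumption. Logging to `investigation.log` is omitted.

```python
import sys
from typing import Callable, Dict, List


def tool_think(tool_input: dict) -> str:
    # Side effect only: narrate to stderr, then return the fixed "ok" result.
    print(f"think: {tool_input.get('thought', '')}", file=sys.stderr)
    return "ok"


def tool_word_count(tool_input: dict) -> str:
    # Step 1: write the function (hypothetical tool, for illustration only).
    return str(len(tool_input.get("text", "").split()))


# Step 2: register the handler under the name the model will call.
TOOL_DISPATCH: Dict[str, Callable[[dict], str]] = {
    "think": tool_think,
    "word_count": tool_word_count,
}

# Step 3: add the schema so the model knows the tool exists.
DIR_TOOLS: List[dict] = [
    {
        "name": "word_count",
        "description": "Count whitespace-separated words in a string.",
        "input_schema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
]


def execute_tool_sketch(name: str, tool_input: dict) -> str:
    """Look up the handler by name and call it; unknown names return an error string."""
    handler = TOOL_DISPATCH.get(name)
    if handler is None:
        return f"error: unknown tool {name!r}"  # assumed behavior, not confirmed
    return handler(tool_input)
```

Because `submit_report` is deliberately absent from the table, a dispatcher like this never sees it; the turn handler has to intercept it first.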

 ### 4.3 Pre-loaded context

-Before the loop starts, two helpers prepare static context that goes into
-the system prompt:
+Before the loop starts, `_build_dir_loop_context()` (`ai.py:855`) calls
+two helpers that prepare static context for the system prompt:

-- `_build_dir_context()` (`ai.py:736`) — `ls`-style listing of the dir
+- `_build_dir_context()` (`ai.py:741`) — `ls`-style listing of the dir
   with sizes and MIME types via `python-magic`. The agent sees this
   *before* it makes any tool calls, so it doesn't waste a turn just
   listing the directory.
-- `_get_child_summaries()` (`ai.py:758`) — looks up each subdirectory in
+- `_get_child_summaries()` (`ai.py:763`) — looks up each subdirectory in
   the cache and pulls its `summary` field. This is how leaves-first
   ordering pays off: by the time the loop runs on `src/`, all of
   `src/auth/`, `src/db/`, `src/middleware/` already have cached summaries
@@ -253,9 +261,11 @@ the next API call.

 When the budget check trips, the loop:
 1. Prints a `Context budget reached` warning to stderr
-2. If no `dir` cache entry exists yet, builds a *partial* one from any
-   `file` cache entries the agent already wrote (`ai.py:889–937`), marks
-   it with `partial: True` and `partial_reason`, and writes it
+2. Calls `_flush_partial_dir_entry()` (`ai.py:896`), which writes a
+   partial dir cache entry from any `file` cache entries the agent
+   already produced, marked with `partial: True` and `partial_reason`.
+   The helper is idempotent — if a dir entry already exists, it returns
+   `""` without writing.
 3. Breaks out of the loop
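
The flush step can be sketched as below, assuming a plain dict keyed by `(kind, path)` in place of the real cache. The function name, the key layout, the `reason` default, and the way file summaries are joined are all assumptions made for illustration; only the `partial` and `partial_reason` fields, the "no files processed" stub, and the empty-string return for an existing entry come from the description above.

```python
from typing import Dict, Tuple

Cache = Dict[Tuple[str, str], dict]  # hypothetical in-memory cache


def flush_partial_dir_entry_sketch(
    cache: Cache, dir_path: str, reason: str = "context budget reached"
) -> str:
    """Write a partial dir entry once; return "" if one already exists."""
    if ("dir", dir_path) in cache:
        return ""  # an entry is already there, so nothing is written
    prefix = dir_path.rstrip("/") + "/"
    lines = [
        f"{path[len(prefix):]}: {cache[(kind, path)]['summary']}"
        for kind, path in sorted(cache)
        if kind == "file" and path.startswith(prefix)
    ]
    # Fall back to a stub when no file under this directory was cached.
    summary = "; ".join(lines) if lines else "no files processed"
    cache[("dir", dir_path)] = {
        "summary": summary,
        "partial": True,
        "partial_reason": reason,
    }
    return summary
```

Calling it twice on the same directory writes once and returns `""` the second time.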

 This means a budget breach doesn't lose work — anything the agent already
@@ -265,14 +275,15 @@ rather than nothing.
 ### 4.5 What the loop returns

 `_run_dir_loop()` returns the `summary` string from `submit_report` (or
-the partial summary if the budget tripped). `_run_investigation()` then
-writes a normal `dir` cache entry from this summary at `ai.py:1363–1375`
-— *unless* the dir loop already wrote one itself via the partial-flush
-path, in which case the `cache.has_entry("dir", dir_path)` check skips it.
+the partial summary returned by `_flush_partial_dir_entry()` if the
+budget tripped). `_run_investigation()` then writes a normal `dir` cache
+entry from this summary, *unless* the dir loop already wrote one itself
+via the partial-flush path, in which case the `cache.has_entry("dir",
+dir_path)` check skips it.
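
The skip can be pictured with a small guard like the one below. `DictCache`, `write_entry`, `get_entry`, and `finalize_dir_entry_sketch` are hypothetical; only the `has_entry("dir", dir_path)` call is taken from the text, and the real cache class may look quite different.

```python
class DictCache:
    """Hypothetical stand-in for the real cache object."""

    def __init__(self) -> None:
        self._entries = {}

    def has_entry(self, kind: str, path: str) -> bool:
        return (kind, path) in self._entries

    def write_entry(self, kind: str, path: str, entry: dict) -> None:
        self._entries[(kind, path)] = entry

    def get_entry(self, kind: str, path: str) -> dict:
        return self._entries[(kind, path)]


def finalize_dir_entry_sketch(cache: DictCache, dir_path: str, summary: str) -> bool:
    """Write a normal dir entry unless one (e.g. a partial) is already cached."""
    if cache.has_entry("dir", dir_path):
        return False  # the partial-flush path got there first; keep its entry
    cache.write_entry("dir", dir_path, {"summary": summary})
    return True
```

Under this reading, a partial entry written by the flush path is never overwritten by the normal write, so its `partial` marker survives.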

 ### 4.6 The streaming API caller

-`_call_api_streaming()` (`ai.py:681`) is a thin wrapper around
+`_call_api_streaming()` (`ai.py:686`) is a thin wrapper around
 `client.messages.stream()`. It currently doesn't print tokens as they
 arrive — it iterates the stream, drops everything, then pulls the final
 message via `stream.get_final_message()`. The streaming API is used for