wiki: refresh Internals.md §4 for #57 dir loop refactor

Jeff Smith 2026-04-11 10:02:49 -06:00
parent ecfae7edba
commit 725ef62edd

@ -169,23 +169,32 @@ protectable.
## 4. The dir loop in depth ## 4. The dir loop in depth
`_run_dir_loop()` is at `ai.py:845`. This is a hand-written agent loop and `_run_dir_loop()` is at `ai.py:1017`. It is a hand-written agent loop, and
you should expect to read it several times before it clicks. The shape is: you should expect to read it several times before it clicks. As of #57 the
loop body itself is a thin coordinator (~25 lines): it calls three helpers
that own the layers it used to inline.
| Helper | Lines | Job |
|---|---|---|
| `_build_dir_loop_context()` | `ai.py:855` | Pure setup. Builds dir context, child summaries, survey block, filtered tool list, system prompt, and the seed user message. Returns a `_DirLoopContext` namedtuple. |
| `_flush_partial_dir_entry()` | `ai.py:896` | Idempotent partial-cache writer for the budget-exceeded path. Synthesizes a summary from already-cached file entries when possible, or writes a "no files processed" stub. Returns the partial summary string. |
| `_handle_turn_response()` | `ai.py:957` | Per-turn response processing. Prints text blocks and tool decisions to stderr, appends the assistant message, dispatches tools (or nudges the agent to call submit_report), appends tool_results. Returns `(done, summary)`. |
The shape of the loop body is now:
``` ```
build system prompt (with survey context, child summaries, dir contents) ctx = _build_dir_loop_context(...)
build initial user message ("investigate this directory now")
reset per-loop token counter reset per-loop token counter
for turn in range(max_turns): # max_turns = 14 for turn in range(max_turns): # max_turns = 14
if budget exceeded: flush partial cache and break if budget exceeded:
print warning
partial = _flush_partial_dir_entry(...)
if partial: summary = partial
break
call API (streaming) call API (streaming)
record token usage done, turn_summary = _handle_turn_response(...)
print text blocks and tool decisions to stderr if turn_summary: summary = turn_summary
append assistant response to message history if done: break
if no tool calls: nudge agent to call submit_report; continue
execute each tool call, build tool_result blocks
append tool_results to message history as user message
if submit_report was called: break
return summary return summary
``` ```
@ -202,32 +211,31 @@ budget; raising the cap would expose this. See **#51**.
### 4.2 Tool dispatch ### 4.2 Tool dispatch
Tools are not class methods. They're plain functions in `ai.py:486642`, Tools are plain functions in `ai.py`, registered into `_TOOL_DISPATCH` at
registered into `_TOOL_DISPATCH` at `ai.py:645`. `_execute_tool()` `ai.py:650`. `_execute_tool()` (`ai.py:664`) is a small function that
(`ai.py:659`) is a 16-line function that looks up the handler by name, looks up the handler by name, calls it, logs the turn to
calls it, logs the turn to `investigation.log`, and returns the result `investigation.log`, and returns the result string. **The control-flow
string. **The two control-flow tools — `submit_report` and `think`/ tool `submit_report` is NOT in `_TOOL_DISPATCH`** because
`checkpoint` for narration — are NOT in `_TOOL_DISPATCH`** because the `_handle_turn_response()` recognizes it specially: it sets `done = True`
loop body handles them specially: and extracts the summary directly from the tool input.
- `submit_report` is recognized in the tool-use scan at `ai.py:977`, sets
`done = True`, and doesn't go through dispatch `think`, `checkpoint`, and `flag` *are* in dispatch, but they have side
- `think`, `checkpoint`, and `flag` *are* in dispatch, but they have side effects that just print to stderr or append to `flags.jsonl` — the return
effects that just print to stderr or append to `flags.jsonl` — the value is always `"ok"`.
return value is always `"ok"`
When you add a tool: write the function, add it to `_TOOL_DISPATCH`, add When you add a tool: write the function, add it to `_TOOL_DISPATCH`, add
its schema to `_DIR_TOOLS`. That's it. its schema to `_DIR_TOOLS`. That's it.
### 4.3 Pre-loaded context ### 4.3 Pre-loaded context
Before the loop starts, two helpers prepare static context that goes into Before the loop starts, `_build_dir_loop_context()` (`ai.py:855`) calls
the system prompt: two helpers that prepare static context for the system prompt:
- `_build_dir_context()` (`ai.py:736`) — `ls`-style listing of the dir - `_build_dir_context()` (`ai.py:741`) — `ls`-style listing of the dir
with sizes and MIME types via `python-magic`. The agent sees this with sizes and MIME types via `python-magic`. The agent sees this
*before* it makes any tool calls, so it doesn't waste a turn just *before* it makes any tool calls, so it doesn't waste a turn just
listing the directory. listing the directory.
- `_get_child_summaries()` (`ai.py:758`) — looks up each subdirectory in - `_get_child_summaries()` (`ai.py:763`) — looks up each subdirectory in
the cache and pulls its `summary` field. This is how leaves-first the cache and pulls its `summary` field. This is how leaves-first
ordering pays off: by the time the loop runs on `src/`, all of ordering pays off: by the time the loop runs on `src/`, all of
`src/auth/`, `src/db/`, `src/middleware/` already have cached summaries `src/auth/`, `src/db/`, `src/middleware/` already have cached summaries
@ -253,9 +261,11 @@ the next API call.
When the budget check trips, the loop: When the budget check trips, the loop:
1. Prints a `Context budget reached` warning to stderr 1. Prints a `Context budget reached` warning to stderr
2. If no `dir` cache entry exists yet, builds a *partial* one from any 2. Calls `_flush_partial_dir_entry()` (`ai.py:896`), which writes a
`file` cache entries the agent already wrote (`ai.py:889937`), marks partial dir cache entry from any `file` cache entries the agent
it with `partial: True` and `partial_reason`, and writes it already produced, marked with `partial: True` and `partial_reason`.
The helper is idempotent — if a dir entry already exists, it returns
`""` without writing.
3. Breaks out of the loop 3. Breaks out of the loop
This means a budget breach doesn't lose work — anything the agent already This means a budget breach doesn't lose work — anything the agent already
@ -265,14 +275,15 @@ rather than nothing.
### 4.5 What the loop returns ### 4.5 What the loop returns
`_run_dir_loop()` returns the `summary` string from `submit_report` (or `_run_dir_loop()` returns the `summary` string from `submit_report` (or
the partial summary if the budget tripped). `_run_investigation()` then the partial summary returned by `_flush_partial_dir_entry()` if the
writes a normal `dir` cache entry from this summary at `ai.py:13631375` budget tripped). `_run_investigation()` then writes a normal `dir` cache
*unless* the dir loop already wrote one itself via the partial-flush entry from this summary, *unless* the dir loop already wrote one itself
path, in which case the `cache.has_entry("dir", dir_path)` check skips it. via the partial-flush path, in which case the `cache.has_entry("dir",
dir_path)` check skips it.
### 4.6 The streaming API caller ### 4.6 The streaming API caller
`_call_api_streaming()` (`ai.py:681`) is a thin wrapper around `_call_api_streaming()` (`ai.py:686`) is a thin wrapper around
`client.messages.stream()`. It currently doesn't print tokens as they `client.messages.stream()`. It currently doesn't print tokens as they
arrive — it iterates the stream, drops everything, then pulls the final arrive — it iterates the stream, drops everything, then pulls the final
message via `stream.get_final_message()`. The streaming API is used for message via `stream.get_final_message()`. The streaming API is used for