From 725ef62edd68b8a664a2cde9480f678d2f424baf Mon Sep 17 00:00:00 2001
From: Jeff Smith
Date: Sat, 11 Apr 2026 10:02:49 -0600
Subject: [PATCH] =?UTF-8?q?wiki:=20refresh=20Internals.md=20=C2=A74=20for?=
 =?UTF-8?q?=20#57=20dir=20loop=20refactor?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 Internals.md | 83 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 36 deletions(-)

diff --git a/Internals.md b/Internals.md
index 5f0915f..21c6380 100644
--- a/Internals.md
+++ b/Internals.md
@@ -169,23 +169,32 @@ protectable.
 
 ## 4. The dir loop in depth
 
-`_run_dir_loop()` is at `ai.py:845`. This is a hand-written agent loop and
-you should expect to read it several times before it clicks. The shape is:
+`_run_dir_loop()` is at `ai.py:1017`. It is a hand-written agent loop, and
+you should expect to read it several times before it clicks. As of #57 the
+loop body itself is a thin coordinator (~25 lines): it calls three helpers
+that own the layers it used to inline.
+
+| Helper | Lines | Job |
+|---|---|---|
+| `_build_dir_loop_context()` | `ai.py:855` | Pure setup. Builds dir context, child summaries, survey block, filtered tool list, system prompt, and the seed user message. Returns a `_DirLoopContext` namedtuple. |
+| `_flush_partial_dir_entry()` | `ai.py:896` | Idempotent partial-cache writer for the budget-exceeded path. Synthesizes a summary from already-cached file entries when possible, or writes a "no files processed" stub. Returns the partial summary string. |
+| `_handle_turn_response()` | `ai.py:957` | Per-turn response processing. Prints text blocks and tool decisions to stderr, appends the assistant message, dispatches tools (or nudges the agent to call submit_report), appends tool_results. Returns `(done, summary)`. |
+
+The shape of the loop body is now:
 
 ```
-build system prompt (with survey context, child summaries, dir contents)
-build initial user message ("investigate this directory now")
+ctx = _build_dir_loop_context(...)
 reset per-loop token counter
 for turn in range(max_turns): # max_turns = 14
-    if budget exceeded: flush partial cache and break
+    if budget exceeded:
+        print warning
+        partial = _flush_partial_dir_entry(...)
+        if partial: summary = partial
+        break
     call API (streaming)
-    record token usage
-    print text blocks and tool decisions to stderr
-    append assistant response to message history
-    if no tool calls: nudge agent to call submit_report; continue
-    execute each tool call, build tool_result blocks
-    append tool_results to message history as user message
-    if submit_report was called: break
+    done, turn_summary = _handle_turn_response(...)
+    if turn_summary: summary = turn_summary
+    if done: break
 return summary
 ```
 
@@ -202,32 +211,31 @@ budget; raising the cap would expose this. See **#51**.
 
 ### 4.2 Tool dispatch
 
-Tools are not class methods. They're plain functions in `ai.py:486–642`,
-registered into `_TOOL_DISPATCH` at `ai.py:645`. `_execute_tool()`
-(`ai.py:659`) is a 16-line function that looks up the handler by name,
-calls it, logs the turn to `investigation.log`, and returns the result
-string. **The two control-flow tools — `submit_report` and `think`/
-`checkpoint` for narration — are NOT in `_TOOL_DISPATCH`** because the
-loop body handles them specially:
-- `submit_report` is recognized in the tool-use scan at `ai.py:977`, sets
-  `done = True`, and doesn't go through dispatch
-- `think`, `checkpoint`, and `flag` *are* in dispatch, but they have side
-  effects that just print to stderr or append to `flags.jsonl` — the
-  return value is always `"ok"`
+Tools are plain functions in `ai.py`, registered into `_TOOL_DISPATCH` at
+`ai.py:650`. `_execute_tool()` (`ai.py:664`) is a small function that
+looks up the handler by name, calls it, logs the turn to
+`investigation.log`, and returns the result string. **The control-flow
+tool `submit_report` is NOT in `_TOOL_DISPATCH`** because
+`_handle_turn_response()` recognizes it specially: it sets `done = True`
+and extracts the summary directly from the tool input.
+
+`think`, `checkpoint`, and `flag` *are* in dispatch, but they have side
+effects that just print to stderr or append to `flags.jsonl` — the return
+value is always `"ok"`.
 
 When you add a tool: write the function, add it to `_TOOL_DISPATCH`, add
 its schema to `_DIR_TOOLS`. That's it.
 
 ### 4.3 Pre-loaded context
 
-Before the loop starts, two helpers prepare static context that goes into
-the system prompt:
+Before the loop starts, `_build_dir_loop_context()` (`ai.py:855`) calls
+two helpers that prepare static context for the system prompt:
 
-- `_build_dir_context()` (`ai.py:736`) — `ls`-style listing of the dir
+- `_build_dir_context()` (`ai.py:741`) — `ls`-style listing of the dir
   with sizes and MIME types via `python-magic`. The agent sees this
   *before* it makes any tool calls, so it doesn't waste a turn just
   listing the directory.
-- `_get_child_summaries()` (`ai.py:758`) — looks up each subdirectory in
+- `_get_child_summaries()` (`ai.py:763`) — looks up each subdirectory in
   the cache and pulls its `summary` field. This is how leaves-first
   ordering pays off: by the time the loop runs on `src/`, all of
   `src/auth/`, `src/db/`, `src/middleware/` already have cached summaries
@@ -253,9 +261,11 @@ the next API call.
 When the budget check trips, the loop:
 
 1. Prints a `Context budget reached` warning to stderr
-2. If no `dir` cache entry exists yet, builds a *partial* one from any
-   `file` cache entries the agent already wrote (`ai.py:889–937`), marks
-   it with `partial: True` and `partial_reason`, and writes it
+2. Calls `_flush_partial_dir_entry()` (`ai.py:896`), which writes a
+   partial dir cache entry from any `file` cache entries the agent
+   already produced, marked with `partial: True` and `partial_reason`.
+   The helper is idempotent — if a dir entry already exists, it returns
+   `""` without writing.
 3. Breaks out of the loop
 
 This means a budget breach doesn't lose work — anything the agent already
@@ -265,14 +275,15 @@ rather than nothing.
 ### 4.5 What the loop returns
 
 `_run_dir_loop()` returns the `summary` string from `submit_report` (or
-the partial summary if the budget tripped). `_run_investigation()` then
-writes a normal `dir` cache entry from this summary at `ai.py:1363–1375`
-— *unless* the dir loop already wrote one itself via the partial-flush
-path, in which case the `cache.has_entry("dir", dir_path)` check skips it.
+the partial summary returned by `_flush_partial_dir_entry()` if the
+budget tripped). `_run_investigation()` then writes a normal `dir` cache
+entry from this summary, *unless* the dir loop already wrote one itself
+via the partial-flush path, in which case the `cache.has_entry("dir",
+dir_path)` check skips it.
 
 ### 4.6 The streaming API caller
 
-`_call_api_streaming()` (`ai.py:681`) is a thin wrapper around
+`_call_api_streaming()` (`ai.py:686`) is a thin wrapper around
 `client.messages.stream()`. It currently doesn't print tokens as they
 arrive — it iterates the stream, drops everything, then pulls the final
 message via `stream.get_final_message()`. The streaming API is used for