wiki: Internals — reflect Phase 3 planning pass, (summary, completeness) return, cache layout

2026-04-18 20:08:30 -06:00 · d3315b530f
commit d3315b530f
parent 717cde8562
1 changed files with 234 additions and 171 deletions
--- a/Internals.md
+++ b/Internals.md
@@ -7,7 +7,8 @@ agent loop can finish this page and start making non-trivial changes.

 All file:line references are accurate as of the date this page was last
 edited — verify with `git log` or by opening the file before relying on a
-specific line number.
+specific line number. `ai.py` in particular grows with each phase, so
+line references into it drift quickly.

 ---

@@ -36,9 +37,9 @@ wait for a scan they can't use.

 ## 2. Base scan walkthrough

-Entry: `luminos.py:main()` parses args, then calls `scan(target, ...)` at
-`luminos.py:45`. `scan()` is a flat sequence — it builds a `report` dict
-by calling helpers from `luminos_lib/`, one per concern, in order:
+Entry: `luminos.py:main()` parses args, then calls `scan(target, ...)`.
+`scan()` is a flat sequence — it builds a `report` dict by calling helpers
+from `luminos_lib/`, one per concern, in order:

 ```
 scan(target)
@@ -60,7 +61,7 @@ event-driven, and there is no shared state object — everything passes
 through the local `report` dict.

 The progress lines you see on stderr (`[scan] Counting lines... foo.py`)
-come from `_progress()` in `luminos.py:23`, which returns an `on_file`
+come from `_progress()` in `luminos.py`, which returns an `on_file`
 callback that the helpers call as they work. If you add a new helper that
 walks files, plumb a progress callback through the same way for
 consistency.
@@ -77,46 +78,54 @@ the base scan because it needs `report["survey_signals"]` and

 The AI pipeline is what makes Luminos interesting and is also where
 almost all the complexity lives. Everything below happens inside
-`luminos_lib/ai.py` (1438 lines as of writing), called from
-`luminos.py:157` via `analyze_directory()`.
+`luminos_lib/ai.py` (~2060 lines as of writing), called from `luminos.py`
+via `analyze_directory()`.

 ### 3.1 The orchestrator

-`analyze_directory()` (`ai.py:1408`) is a thin wrapper that checks
-dependencies, gets the API key, builds the Anthropic client, and calls
-`_run_investigation()`. If anything fails it prints a warning and returns
-empty strings — the rest of luminos keeps working.
+`analyze_directory()` is a thin wrapper that checks dependencies, gets the
+API key, builds the Anthropic client, and calls `_run_investigation()`.
+If anything fails it prints a warning and returns empty strings — the
+rest of luminos keeps working.

-`_run_investigation()` (`ai.py:1286`) is the real entry point. Read this
-function first if you want to understand the pipeline shape. It does six
-things, in order:
+`_run_investigation()` is the real entry point. Read this function first
+if you want to understand the pipeline shape. It does **seven** things,
+in order:

-1. **Get/create an investigation ID and cache** (`ai.py:1289–1294`).
-   Investigation IDs let you resume a previous run; see §5 below.
+1. **Get/create an investigation ID and cache**. Investigation IDs let
+   you resume a previous run; see §5 below.
 2. **Discover all directories** under the target via
-   `_discover_directories()` (`ai.py:715`). Returns them sorted
-   *leaves-first* — the deepest paths come first. This matters because
-   each dir loop reads its child directories' summaries from cache, so
-   children must be investigated before parents.
-3. **Run the survey pass** (`ai.py:1300–1334`) unless the target is below
-   the size thresholds at `ai.py:780–781`, in which case
+   `_discover_directories()`. Returns them sorted *leaves-first* — the
+   deepest paths come first. This matters because each dir loop reads
+   its child directories' summaries from cache, so children must be
+   investigated before parents.
+3. **Run the survey pass** unless the target is below
+   `_SURVEY_MIN_FILES` and `_SURVEY_MIN_DIRS`, in which case
   `_default_survey()` returns a synthetic skip.
-4. **Filter out cached directories** (`ai.py:1336–1349`). If you're
-   resuming an investigation, dirs that already have a `dir` cache entry
-   are skipped — only new ones get a fresh dir loop.
-5. **Run a dir loop per remaining directory** (`ai.py:1351–1375`). This
-   is the heart of the system — see §4.
-6. **Run the synthesis pass** (`ai.py:1382`) reading only `dir` cache
-   entries to produce `(brief, detailed)`.
+4. **Filter out cached directories**. If you're resuming an
+   investigation, dirs that already have a `dir` cache entry are
+   skipped — only new ones get a fresh dir loop.
+5. **Run the planning pass** (Phase 3) unless the target is small, in
+   which case `_default_plan()` returns an empty plan. On resumed runs
+   the planner is skipped and `plan.json` is loaded from cache instead.
+   `_apply_plan()` then sorts dirs into priority/default/shallow bands
+   and builds a `{dir_path: max_turns}` map. Leaf-first ordering is
+   preserved *within* each band (see §4.7).
+6. **Run a dir loop per remaining directory**, iterating the
+   plan-ordered list with the per-directory `max_turns` from the plan.
+   The dir loop is the heart of the system; see §4. Once the loops
+   finish, `_write_plan_evaluation()` records turn-utilization metrics.
+7. **Run the synthesis pass** reading only `dir` cache entries to
+   produce `(brief, detailed)`.

-It also reads `flags.jsonl` from disk at the end (`ai.py:1387–1397`) and
-returns `(brief, detailed, flags)` to `analyze_directory()`.
+It also reads `flags.jsonl` from disk at the end and returns
+`(brief, detailed, flags)` to `analyze_directory()`.

 ### 3.2 The survey pass

-`_run_survey()` (`ai.py:1051`) is a short, single-purpose loop. It exists
-to give the dir loops some shared context about what they're looking at
-*as a whole* before any of them start.
+`_run_survey()` is a short, single-purpose loop. It exists to give the
+dir loops some shared context about what they're looking at *as a whole*
+before any of them start.

 Inputs go into the system prompt (`_SURVEY_SYSTEM_PROMPT` in
 `prompts.py`):
@@ -125,9 +134,9 @@ Inputs go into the system prompt (`_SURVEY_SYSTEM_PROMPT` in
 - A 2-level tree preview from `build_tree(target, max_depth=2)`
 - The list of tools the dir loop will have available

-The survey is allowed only `submit_survey` as a tool (`_SURVEY_TOOLS` at
-`ai.py:356`). It runs at most 3 turns. The agent must call `submit_survey`
-exactly once with six fields:
+The survey is allowed only `submit_survey` as a tool (`_SURVEY_TOOLS`).
+It runs at most 3 turns. The agent must call `submit_survey` exactly
+once with six fields:

 ```python
 {
@@ -148,54 +157,82 @@ loops still run but with `survey=None` — the system degrades gracefully.

 Two things happen with the survey output before each dir loop runs:

-**Survey block injection.** `_format_survey_block()` (`ai.py:803`) renders
-the survey dict as a labeled text block, which gets `.format()`-injected
-into the dir loop system prompt as `{survey_context}`. The dir agent sees
-the description, approach, domain notes, and which tools it should lean on
+**Survey block injection.** `_format_survey_block()` renders the survey
+dict as a labeled text block, which gets `.format()`-injected into the
+dir loop system prompt as `{survey_context}`. The dir agent sees the
+description, approach, domain notes, and which tools it should lean on
 or skip.

-**Tool filtering.** `_filter_dir_tools()` (`ai.py:824`) returns a copy of
-`_DIR_TOOLS` with anything in `skip_tools` removed — but only if the
-survey's confidence is at or above `_SURVEY_CONFIDENCE_THRESHOLD = 0.5`
-(`ai.py:775`). Below that threshold the agent gets the full toolbox. The
-control-flow tool `submit_report` is in `_PROTECTED_DIR_TOOLS` and can
-never be filtered out — removing it would break loop termination.
+**Tool filtering.** `_filter_dir_tools()` returns a copy of `_DIR_TOOLS`
+with anything in `skip_tools` removed — but only if the survey's
+confidence is at or above `_SURVEY_CONFIDENCE_THRESHOLD = 0.5`. Below
+that threshold the agent gets the full toolbox. The control-flow tool
+`submit_report` is in `_PROTECTED_DIR_TOOLS` and can never be filtered
+out — removing it would break loop termination.

-This is the only place in the codebase where the agent's available tools
-change at runtime. If you add a new tool, decide whether it should be
-protectable.
+This is the only place in the codebase where the agent's available
+tools change at runtime. If you add a new tool, decide whether it
+should be protectable.
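
The filtering rule above can be sketched in a few lines. This is a minimal illustration, not the code in `ai.py`: the un-prefixed names, the `confidence` key on the survey dict, and the tool names `read_file` and `run_command` are assumptions made for the example.

```python
# Hypothetical stand-ins for the constants described above.
SURVEY_CONFIDENCE_THRESHOLD = 0.5
PROTECTED_DIR_TOOLS = {"submit_report"}


def filter_dir_tools(dir_tools, survey):
    """Return a copy of dir_tools with the survey's skip_tools removed."""
    # No survey, or a low-confidence one: the agent keeps the full toolbox.
    if not survey or survey.get("confidence", 0.0) < SURVEY_CONFIDENCE_THRESHOLD:
        return list(dir_tools)
    # Protected tools are never skipped, whatever the survey asked for.
    skip = set(survey.get("skip_tools", [])) - PROTECTED_DIR_TOOLS
    return [tool for tool in dir_tools if tool["name"] not in skip]


tools = [{"name": "read_file"}, {"name": "run_command"}, {"name": "submit_report"}]
confident = {"confidence": 0.8, "skip_tools": ["run_command", "submit_report"]}
unsure = {"confidence": 0.3, "skip_tools": ["run_command"]}

confident_names = [t["name"] for t in filter_dir_tools(tools, confident)]
unsure_names = [t["name"] for t in filter_dir_tools(tools, unsure)]
```

In this sketch the confident survey drops `run_command` but cannot drop `submit_report`, while the low-confidence survey changes nothing.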
+
+### 3.4 The planning pass (Phase 3)
+
+`_run_planning()` is structured like `_run_survey()`: a single-purpose
+loop with one submit tool (`submit_plan`) and a low turn cap. Its job is
+to decide *where* the dir loops should spend turns, not to investigate.
+
+Inputs:
+- The survey dict (formatted via `_format_survey_block()`)
+- The full tree at depth 6 (deeper than the survey's 2-level preview)
+- The base scan's `survey_signals` (raw file signals)
+- The list of already-cached directories (so the planner doesn't plan
+  around dirs that will be skipped)
+
+The plan schema, per-tier turn allocations (priority 15–20, capped at
+25; default 10; shallow 5; skip 0), fallback behavior, and resume
+behavior are covered in full on the [Planning Pass](PlanningPass) page.
+
+`_apply_plan()` is a pure helper that translates the plan into an
+ordered list of directories plus a `{dir_path: max_turns}` map. It
+sorts dirs into priority/default/shallow bands but **preserves
+leaf-first ordering within each band**, so a child runs before its
+parent whenever both share a band, even in "priority-first" mode. See §4.7.
+
+`_write_plan_evaluation()` writes `plan_evaluation.json` at the end of
+every run with `turns_allocated`, `turns_used`, and `completeness` per
+directory. This is the planning pass's report card.
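
The banding step can be sketched as below. The real plan schema lives on the Planning Pass page, so the plan shape used here (`priority` as a path-to-turns dict, `shallow` and `skip` as lists), the constant names, and the priority, default, shallow run order are all assumptions for illustration.

```python
# Hypothetical constants mirroring the tier allocations quoted above.
DEFAULT_TURNS = 10
SHALLOW_TURNS = 5
PRIORITY_CAP = 25


def apply_plan(dirs_leaf_first, plan):
    """Return (ordered_dirs, max_turns) from a leaf-first directory list."""
    priority = plan.get("priority", {})  # assumed shape: {dir_path: turns}
    shallow = set(plan.get("shallow", []))
    skip = set(plan.get("skip", []))

    bands = {"priority": [], "default": [], "shallow": []}
    max_turns = {}
    # Walking the leaf-first list in order keeps each band leaf-first too.
    for path in dirs_leaf_first:
        if path in skip:
            continue
        if path in priority:
            bands["priority"].append(path)
            max_turns[path] = min(priority[path], PRIORITY_CAP)
        elif path in shallow:
            bands["shallow"].append(path)
            max_turns[path] = SHALLOW_TURNS
        else:
            bands["default"].append(path)
            max_turns[path] = DEFAULT_TURNS
    ordered = bands["priority"] + bands["default"] + bands["shallow"]
    return ordered, max_turns


dirs = ["src/auth", "src/db", "src", "docs", "vendor"]
plan = {"priority": {"src/auth": 20, "src": 30},
        "shallow": ["docs"], "skip": ["vendor"]}
ordered, turns = apply_plan(dirs, plan)
```

Under these assumptions `src/auth` still precedes `src` inside the priority band, while `src/db` lands in the default band and so runs after its parent `src`.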

 ---

 ## 4. The dir loop in depth

-`_run_dir_loop()` is at `ai.py:1017`. It is a hand-written agent loop, and
-you should expect to read it several times before it clicks. As of #57 the
-loop body itself is a thin coordinator (~25 lines): it calls three helpers
-that own the layers it used to inline.
+`_run_dir_loop()` is a hand-written agent loop, and you should expect
+to read it several times before it clicks. As of #57 the loop body
+itself is a thin coordinator (~25 lines): it calls three helpers that
+own the layers it used to inline.

-| Helper | Lines | Job |
-|---|---|---|
-| `_build_dir_loop_context()` | `ai.py:855` | Pure setup. Builds dir context, child summaries, survey block, filtered tool list, system prompt, and the seed user message. Returns a `_DirLoopContext` namedtuple. |
-| `_flush_partial_dir_entry()` | `ai.py:896` | Idempotent partial-cache writer for the budget-exceeded path. Synthesizes a summary from already-cached file entries when possible, or writes a "no files processed" stub. Returns the partial summary string. |
-| `_handle_turn_response()` | `ai.py:957` | Per-turn response processing. Prints text blocks and tool decisions to stderr, appends the assistant message, dispatches tools (or nudges the agent to call submit_report), appends tool_results. Returns `(done, summary)`. |
+| Helper | Job |
+|---|---|
+| `_build_dir_loop_context()` | Pure setup. Builds dir context, child summaries, survey block, filtered tool list, system prompt, and the seed user message. Returns a `_DirLoopContext` namedtuple. |
+| `_flush_partial_dir_entry()` | Idempotent partial-cache writer for the budget-exceeded path. Synthesizes a summary from already-cached file entries when possible, or writes a "no files processed" stub. Returns the partial summary string. |
+| `_handle_turn_response()` | Per-turn response processing. Prints text blocks and tool decisions to stderr, appends the assistant message, dispatches tools (or nudges the agent to call submit_report), appends tool_results. Returns `(done, summary, completeness)`. |

 The shape of the loop body is now:

 ```
 ctx = _build_dir_loop_context(...)
 reset per-loop token counter
-for turn in range(max_turns):                    # max_turns = 14
+for turn in range(max_turns):                   # max_turns from plan (5–25)
    if budget exceeded:
        print warning
        partial = _flush_partial_dir_entry(...)
        if partial: summary = partial
        break
    call API (streaming)
-    done, turn_summary = _handle_turn_response(...)
+    done, turn_summary, turn_completeness = _handle_turn_response(...)
    if turn_summary: summary = turn_summary
+    if turn_completeness: completeness = turn_completeness
    if done: break
-return summary
+return (summary, completeness)
 ```

 A few non-obvious mechanics:
@@ -207,95 +244,104 @@ message (the tool results). Nothing is ever evicted. This means
 `input_tokens` on each successive API call grows roughly linearly — the
 model is re-sent the full conversation every turn. On code targets we see
 ~1.5–2k tokens added per turn. At `max_turns=14` this stays under the
-budget; raising the cap would expose this. See **#51**.
+budget; raising the cap would expose this. At Phase 3's priority-tier
+cap of 25 turns, that rate adds roughly 38–50k tokens of history: still
+well under the 140k budget, but closer to the ceiling. See **#51**.

 ### 4.2 Tool dispatch

 Tools are plain functions in `ai.py`. They are wired up via a single
-`register_tool()` call (`ai.py:172`) that lands the schema in one or
-more scope lists (`_DIR_TOOLS`, `_SYNTHESIS_TOOLS`, `_SURVEY_TOOLS`)
+`register_tool()` call that lands the schema in one or more scope lists
+(`_DIR_TOOLS`, `_SYNTHESIS_TOOLS`, `_SURVEY_TOOLS`, `_PLANNING_TOOLS`)
 and the handler in `_TOOL_DISPATCH`. The registrations live below the
-tool implementations in `ai.py` and read top-to-bottom in dir-then-
-synthesis-then-survey order.
+tool implementations in `ai.py` and read top-to-bottom in
+dir-then-synthesis-then-survey-then-planning order.

 `_execute_tool()` looks up the handler by name in `_TOOL_DISPATCH`,
 calls it, logs the turn to `investigation.log`, and returns the result
-string. **Tools intercepted by the loop body — `submit_report` and
-`submit_survey` — register their schema only and have no handler entry.**
-`_handle_turn_response()` recognizes `submit_report` specially: it sets
-`done = True` and extracts the summary directly from the tool input.
+string. **Tools intercepted by the loop body — `submit_report`,
+`submit_survey`, `submit_plan` — register their schema only and have no
+handler entry.** `_handle_turn_response()` recognizes `submit_report`
+specially: it sets `done = True`, extracts the summary from the tool
+input, and also extracts the optional `completeness` field (Phase 3
+instrumentation).

 `think`, `checkpoint`, and `flag` *are* in dispatch, but they have side
-effects that just print to stderr or append to `flags.jsonl` — the return
-value is always `"ok"`.
+effects that just print to stderr or append to `flags.jsonl` — the
+return value is always `"ok"`.

 When you add a tool: write the function, then add one `register_tool()`
 call below it. That's it. There is no second place to forget.
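
A rough sketch of that registration pattern follows. The `scopes` parameter appears in §9.1, but the exact signature, the `handler` keyword, and the schema contents shown here are assumptions, and the names drop the leading underscore to mark them as stand-ins.

```python
# Hypothetical stand-ins for the scope lists and dispatch table.
DIR_TOOLS, SYNTHESIS_TOOLS, SURVEY_TOOLS, PLANNING_TOOLS = [], [], [], []
SCOPE_LISTS = {
    "dir": DIR_TOOLS,
    "synthesis": SYNTHESIS_TOOLS,
    "survey": SURVEY_TOOLS,
    "planning": PLANNING_TOOLS,
}
TOOL_DISPATCH = {}


def register_tool(schema, handler=None, scopes=("dir",)):
    """Add schema to each scope list; add handler to dispatch if given."""
    for scope in scopes:
        SCOPE_LISTS[scope].append(schema)
    # Loop-intercepted tools pass no handler, so they get no dispatch entry.
    if handler is not None:
        TOOL_DISPATCH[schema["name"]] = handler


def tool_think(args):
    """Side-effect-only tool: the return value is always 'ok'."""
    return "ok"


register_tool({"name": "think", "description": "Reason out loud."},
              handler=tool_think, scopes=["dir", "synthesis"])
register_tool({"name": "submit_plan", "description": "Submit the plan."},
              scopes=["planning"])
```

Here `think` lands in two scope lists and in dispatch, while `submit_plan` is schema-only, matching the split between dispatched and loop-intercepted tools described above.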

 ### 4.3 Pre-loaded context

-Before the loop starts, `_build_dir_loop_context()` (`ai.py:855`) calls
-two helpers that prepare static context for the system prompt:
+Before the loop starts, `_build_dir_loop_context()` calls two helpers
+that prepare static context for the system prompt:

- `_build_dir_context()` (`ai.py:741`) — `ls`-style listing of the dir
-  with sizes and MIME types via `python-magic`. The agent sees this
-  *before* it makes any tool calls, so it doesn't waste a turn just
-  listing the directory.
- `_get_child_summaries()` (`ai.py:763`) — looks up each subdirectory in
-  the cache and pulls its `summary` field. This is how leaves-first
-  ordering pays off: by the time the loop runs on `src/`, all of
-  `src/auth/`, `src/db/`, `src/middleware/` already have cached summaries
-  that get injected as `{child_summaries}`.
+- `_build_dir_context()` — `ls`-style listing of the dir with sizes and
+  MIME types via `python-magic`. The agent sees this *before* it makes
+  any tool calls, so it doesn't waste a turn just listing the directory.
+- `_get_child_summaries()` — looks up each subdirectory in the cache and
+  pulls its `summary` field. This is how leaves-first ordering pays off:
+  by the time the loop runs on `src/`, all of `src/auth/`, `src/db/`,
+  `src/middleware/` already have cached summaries that get injected as
+  `{child_summaries}`.

-If `_get_child_summaries()` returns nothing, the prompt says
-`(none — this is a leaf directory)`.
+If `_get_child_summaries()` returns nothing, the prompt distinguishes
+leaf directories (`"(none: this is a leaf directory)"`) from parents
+whose children haven't been investigated yet (`"(child directories
+exist but have not been investigated yet)"`). See §4.7.

 ### 4.4 The token tracker and the budget check

-`_TokenTracker` (`ai.py:94`) is a tiny accumulator with one important
-subtlety, captured in **#44**:
+`_TokenTracker` is a tiny accumulator with one important subtlety,
+captured in **#44**:

 > Cumulative input tokens are NOT a meaningful proxy for context size:
 > each turn's `input_tokens` already includes the full message history,
 > so summing across turns double-counts everything. Use `last_input` for
 > budget decisions, totals for billing.

-So `budget_exceeded()` (`ai.py:135`) compares `last_input` (the most
-recent call's input_tokens) to `CONTEXT_BUDGET` (`ai.py:40`), which is
-70% of 200k. This is checked at the *top* of each loop iteration, before
-the next API call.
+So `budget_exceeded()` compares `last_input` (the most recent call's
+input_tokens) to `CONTEXT_BUDGET`, which is 70% of 200k. This is
+checked at the *top* of each loop iteration, before the next API call.

 When the budget check trips, the loop:
 1. Prints a `Context budget reached` warning to stderr
-2. Calls `_flush_partial_dir_entry()` (`ai.py:896`), which writes a
-   partial dir cache entry from any `file` cache entries the agent
-   already produced, marked with `partial: True` and `partial_reason`.
-   The helper is idempotent — if a dir entry already exists, it returns
-   `""` without writing.
+2. Calls `_flush_partial_dir_entry()`, which writes a partial dir cache
+   entry from any `file` cache entries the agent already produced,
+   marked with `partial: True` and `partial_reason`. The helper is
+   idempotent — if a dir entry already exists, it returns `""` without
+   writing.
 3. Breaks out of the loop

-This means a budget breach doesn't lose work — anything the agent already
-cached survives, and the synthesis pass will see a partial dir summary
-rather than nothing.
+This means a budget breach doesn't lose work — anything the agent
+already cached survives, and the synthesis pass will see a partial dir
+summary rather than nothing.
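
To make the `last_input` versus cumulative distinction concrete, here is a small sketch of a tracker with the same shape. The class name, `record()`, and the `>=` comparison are assumptions; only the 70%-of-200k budget and the `last_input` / `total_input` / `total_output` roles come from the text above.

```python
CONTEXT_BUDGET = 200_000 * 70 // 100  # 70% of 200k = 140_000


class TokenTracker:
    """Hypothetical accumulator: totals for billing, last_input for budget."""

    def __init__(self):
        self.total_input = 0
        self.total_output = 0
        self.last_input = 0

    def record(self, input_tokens, output_tokens):
        # Each call's input_tokens already includes the whole history,
        # so the running sum overstates the current context size.
        self.total_input += input_tokens
        self.total_output += output_tokens
        self.last_input = input_tokens

    def budget_exceeded(self):
        # Assumed comparison; the real check may use a strict inequality.
        return self.last_input >= CONTEXT_BUDGET


tracker = TokenTracker()
tracker.record(50_000, 1_000)
tracker.record(90_000, 1_200)
summed_past_budget = tracker.total_input >= CONTEXT_BUDGET
exceeded_after_two = tracker.budget_exceeded()
tracker.record(141_000, 900)
exceeded_after_three = tracker.budget_exceeded()
```

After two calls the summed input already equals the budget even though the live context is only 90k tokens, which is the double-counting trap that #44 warns about.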

 ### 4.5 What the loop returns

-`_run_dir_loop()` returns the `summary` string from `submit_report` (or
-the partial summary returned by `_flush_partial_dir_entry()` if the
-budget tripped). `_run_investigation()` then writes a normal `dir` cache
-entry from this summary, *unless* the dir loop already wrote one itself
-via the partial-flush path, in which case the `cache.has_entry("dir",
-dir_path)` check skips it.
+`_run_dir_loop()` returns `(summary, completeness)`. The summary is the
+string from `submit_report` (or the partial summary returned by
+`_flush_partial_dir_entry()` if the budget tripped). The completeness
+is the agent's self-rated investigation thoroughness (0.0–1.0) — Phase
+3 instrumentation used in `plan_evaluation.json` — or `None` if the
+agent didn't report one.
+
+`_run_investigation()` writes a normal `dir` cache entry from this
+summary (with `completeness` included if non-None), *unless* the dir
+loop already wrote one itself via the partial-flush path, in which case
+the `cache.has_entry("dir", dir_path)` check skips it.

 ### 4.6 The streaming API caller

-`_call_api_streaming()` (`ai.py:686`) is a thin wrapper around
+`_call_api_streaming()` is a thin wrapper around
 `client.messages.stream()`. It currently doesn't print tokens as they
 arrive — it iterates the stream, drops everything, then pulls the final
 message via `stream.get_final_message()`. The streaming API is used for
-real-time tool decision printing, which today happens only after the full
-response arrives. There's room here to add live progress printing if you
-want it.
+real-time tool decision printing, which today happens only after the
+full response arrives. There's room here to add live progress printing
+if you want it.

 ### 4.7 The leaf-first contract (load-bearing for child summaries)

@@ -339,32 +385,34 @@ the full design.
 ## 5. The cache model

 Cache lives at `/tmp/luminos/{investigation_id}/`. Code is
-`luminos_lib/cache.py` (201 lines).
+`luminos_lib/cache.py`.

 ### 5.1 Investigation IDs

 `/tmp/luminos/investigations.json` maps absolute target paths to UUIDs.
-`_get_investigation_id()` (`cache.py:40`) looks up the target and either
-returns the existing UUID (resume) or creates a new one (fresh run).
-`--fresh` forces a new UUID even if one exists.
+`_get_investigation_id()` looks up the target and either returns the
+existing UUID (resume) or creates a new one (fresh run). `--fresh`
+forces a new UUID even if one exists.

 ### 5.2 What's stored

 Inside `/tmp/luminos/{uuid}/`:

 ```
-meta.json              investigation metadata (model, start time, dir count)
-files/<sha256>.json    one file per cached file entry
-dirs/<sha256>.json     one file per cached directory entry
-flags.jsonl            JSONL — appended on every flag tool call
-investigation.log      JSONL — appended on every tool call
+meta.json                 investigation metadata (model, start time, dir count)
+plan.json                 planning pass output — cached for resumed runs
+plan_evaluation.json      post-investigation quality report (Phase 3)
+files/<sha256>.json       one file per cached file entry
+dirs/<sha256>.json        one file per cached directory entry
+flags.jsonl               JSONL — appended on every flag tool call
+investigation.log         JSONL — appended on every tool call
 ```

 **File and dir cache entries are NOT in JSONL** — they are one
-sha256-keyed JSON file per entry. The sha256 is over the path string
-(`cache.py:13`). Only `flags.jsonl` and `investigation.log` use JSONL.
+sha256-keyed JSON file per entry. The sha256 is over the path string.
+Only `flags.jsonl` and `investigation.log` use JSONL.

-Required fields are validated in `write_entry()` (`cache.py:115`):
+Required fields are validated in `write_entry()`:

 ```python
 file: {path, relative_path, size_bytes, category, summary, cached_at}
@@ -376,31 +424,45 @@ The validator also rejects entries containing `content`, `contents`, or
 contents, summaries only. If you change the schema, update the required
 set in `write_entry()` and update the test in `tests/test_cache.py`.

-### 5.3 Confidence support already exists
+### 5.3 Confidence + completeness support

-`write_entry()` validates an optional `confidence` field
-(`cache.py:129–134`) and a `confidence_reason` string.
-`low_confidence_entries(threshold=0.7)` (`cache.py:191`) returns all
-entries below a threshold, sorted ascending. The agent doesn't currently
-*set* these fields in any prompt — that lights up when Phase 1 work
-actually wires the prompts.
+`write_entry()` validates optional `confidence` and `confidence_reason`
+fields (Phase 1) and an optional `completeness` field (Phase 3,
+0.0–1.0, the dir agent's self-rated thoroughness).
+`low_confidence_entries(threshold=0.7)` returns all entries below the
+threshold, sorted ascending, as input for a future refinement pass.

 ### 5.4 Why one-file-per-entry instead of JSONL

-Random access by path. The dir loop calls `cache.has_entry("dir", path)`
-once per directory during the `_get_child_summaries()` lookup; with
-sha256-keyed files this is an `os.path.exists()` call. With JSONL it
-would be a full file scan.
+Random access by path. The dir loop calls
+`cache.has_entry("dir", path)` once per directory during the
+`_get_child_summaries()` lookup; with sha256-keyed files this is an
+`os.path.exists()` call. With JSONL it would be a full file scan.
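
The lookup cost argument is easier to see in code. The sketch below uses free functions and a temporary directory rather than the real cache object in `cache.py`; the function names and the UTF-8 encoding of the path are assumptions, while the `files/` and `dirs/` layout and the sha256-of-path key come from §5.2.

```python
import hashlib
import json
import os
import tempfile


def entry_path(root, kind, path):
    """Map ('file' | 'dir', path) to <root>/<kind>s/<sha256(path)>.json."""
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return os.path.join(root, f"{kind}s", f"{digest}.json")


def has_entry(root, kind, path):
    # One stat call, no matter how many entries the cache holds.
    return os.path.exists(entry_path(root, kind, path))


def write_entry(root, kind, path, entry):
    target = entry_path(root, kind, path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as fh:
        json.dump(entry, fh)


root = tempfile.mkdtemp()
before = has_entry(root, "dir", "/repo/src")
write_entry(root, "dir", "/repo/src",
            {"path": "/repo/src", "summary": "source tree"})
after = has_entry(root, "dir", "/repo/src")
```

The file name depends only on the path string, so a lookup never has to open or parse another entry.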
+
+### 5.5 The planning files
+
+`plan.json` is written by `_run_investigation()` after a successful
+planning pass, so resumed runs can skip the planner. It is loaded
+before the dir loops run when `--fresh` is not set and the file
+exists.
+
+`plan_evaluation.json` is written by `_write_plan_evaluation()` after
+the dir loops finish. Schema: `plan_order`, `total_dirs_investigated`,
+`total_turns_allocated`, `total_turns_used`, `overall_utilization`,
+`per_directory` (list of `{dir, planned_tier, turns_allocated,
+turns_used, utilization, completeness, confidence}`), `evaluated_at`.
+See [Planning Pass](PlanningPass) for how to use it.
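
As an illustration of how those fields fit together, the sketch below assembles a report with the same keys. Computing utilization as `turns_used / turns_allocated` is an assumption, as are the function name, the rounding, and the sample numbers.

```python
from datetime import datetime, timezone


def build_plan_evaluation(plan_order, per_directory):
    """Assemble a dict shaped like plan_evaluation.json (hypothetical)."""
    rows = []
    for row in per_directory:
        allocated = row["turns_allocated"]
        # Assumed definition: share of the allocated turns actually used.
        utilization = round(row["turns_used"] / allocated, 2) if allocated else 0.0
        rows.append({**row, "utilization": utilization})
    total_allocated = sum(r["turns_allocated"] for r in rows)
    total_used = sum(r["turns_used"] for r in rows)
    overall = round(total_used / total_allocated, 2) if total_allocated else 0.0
    return {
        "plan_order": plan_order,
        "total_dirs_investigated": len(rows),
        "total_turns_allocated": total_allocated,
        "total_turns_used": total_used,
        "overall_utilization": overall,
        "per_directory": rows,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }


report = build_plan_evaluation(
    ["src/auth", "docs"],
    [
        {"dir": "src/auth", "planned_tier": "priority", "turns_allocated": 20,
         "turns_used": 12, "completeness": 0.9, "confidence": 0.8},
        {"dir": "docs", "planned_tier": "shallow", "turns_allocated": 5,
         "turns_used": 5, "completeness": 0.6, "confidence": None},
    ],
)
```

A directory that uses every allocated turn but reports low completeness, like `docs` in this sample, is the kind of row that suggests the plan under-allocated.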

 ---

 ## 6. Prompts

-All prompt templates live in `luminos_lib/prompts.py`. There are three:
+All prompt templates live in `luminos_lib/prompts.py`. There are four:

 | Constant | Used by | What it carries |
 |---|---|---|
 | `_SURVEY_SYSTEM_PROMPT` | `_run_survey` | survey_signals, tree_preview, available_tools |
+| `_PLANNING_SYSTEM_PROMPT` | `_run_planning` | survey, tree, file signals, cached_dirs |
 | `_DIR_SYSTEM_PROMPT` | `_run_dir_loop` | dir_path, dir_rel, max_turns, context, child_summaries, survey_context |
 | `_SYNTHESIS_SYSTEM_PROMPT` | `_run_synthesis` | target, summaries_text |

@@ -424,8 +486,8 @@ that reason.

 ## 7. Synthesis pass

-`_run_synthesis()` (`ai.py:1157`) is structurally similar to the dir loop
-but much simpler:
+`_run_synthesis()` is structurally similar to the dir loop but much
+simpler:

 - Reads all `dir` cache entries via `cache.read_all_entries("dir")`
 - Renders them into a `summaries_text` block (one section per dir)
@@ -434,31 +496,29 @@ but much simpler:
  `detailed` fields

 Tools available: `read_cache`, `list_cache`, `flag`, `submit_report`
-(`_SYNTHESIS_TOOLS` at `ai.py:401`). The synthesis agent can pull
-specific cache entries back if it needs to drill in, but it cannot read
-files directly — synthesis is meant to operate on summaries, not raw
-contents.
+(`_SYNTHESIS_TOOLS`). The synthesis agent can pull specific cache
+entries back if it needs to drill in, but it cannot read files directly
+— synthesis is meant to operate on summaries, not raw contents.

 There's a fallback: if synthesis runs out of turns without calling
-`submit_report`, `_synthesize_from_cache()` (`ai.py:1262`) builds a
-mechanical brief+detailed from the cached dir summaries with no AI call.
-This guarantees you always get *something* in the report.
+`submit_report`, `_synthesize_from_cache()` builds a mechanical
+brief+detailed from the cached dir summaries with no AI call. This
+guarantees you always get *something* in the report.

 ---

 ## 8. Flags

 The `flag` tool is the agent's pressure valve for "I noticed something
-that should not be lost in the summary." `_tool_flag()` (`ai.py:629`)
-prints to stderr *and* appends a JSONL line to
-`{cache.root}/flags.jsonl`. At the end of `_run_investigation()`
-(`ai.py:1387–1397`), the orchestrator reads that file back and includes
-the flags in its return tuple. `format_report()` then renders them in a
-dedicated section.
+that should not be lost in the summary." `_tool_flag()` prints to stderr
+*and* appends a JSONL line to `{cache.root}/flags.jsonl`. At the end of
+`_run_investigation()`, the orchestrator reads that file back and
+includes the flags in its return tuple. `format_report()` then renders
+them in a dedicated section.

 Severity is `info | concern | critical`. The agent is told to flag
-*immediately* on discovery, not save findings for the report — this is in
-the tool description at `ai.py:312`.
+*immediately* on discovery, not save findings for the report — this is
+in the tool description.

 ---

@@ -484,10 +544,11 @@ A cookbook for the kinds of changes that come up most often.
   contains your handler and `_DIR_TOOLS` contains your schema after
   importing `luminos_lib.ai`.

-To make a tool available in synthesis or survey instead of (or in
-addition to) dir, pass `scopes=["synthesis"]`, `scopes=["survey"]`, or
-`scopes=["dir", "synthesis"]`. Tools whose schema differs by scope (like
-`submit_report`) get a separate `register_tool()` call per scope.
+To make a tool available in synthesis, survey, or planning instead of
+(or in addition to) dir, pass `scopes=["synthesis"]`, `scopes=["survey"]`,
+`scopes=["planning"]`, or any combination. Tools whose schema differs by
+scope (like `submit_report`) get a separate `register_tool()` call per
+scope.

 ### 9.2 Add a whole new pass

@@ -522,8 +583,7 @@ unless you `--fresh`.
 ### 9.4 Change cache schema

 1. Update the required-fields set in `cache.py:write_entry()`
-   (`cache.py:119–123`)
-2. Update `_DIR_TOOLS`'s `write_cache` description in `ai.py:228` so the
+2. Update `_DIR_TOOLS`'s `write_cache` description in `ai.py` so the
   agent knows what to write
 3. Update `_DIR_SYSTEM_PROMPT` in `prompts.py` if the agent needs to know
   *how* to populate the new field
@@ -532,26 +592,25 @@ unless you `--fresh`.

 ### 9.5 Add a CLI flag

-Edit `luminos.py:88` (`main()`'s argparse setup) to define the flag, then
+Edit `luminos.py:main()`'s argparse setup to define the flag, then
 plumb it through whatever functions need it. New AI-related flags
-typically need to be added to `analyze_directory()`'s signature
-(`ai.py:1408`) and then forwarded to `_run_investigation()`.
+typically need to be added to `analyze_directory()`'s signature and
+then forwarded to `_run_investigation()`.

 ---

 ## 10. Token budget and cost

-Budget logic is in `_TokenTracker.budget_exceeded()` and is checked at the
-top of every dir loop iteration (`ai.py:882`). The budget is **per call**,
-not cumulative — see §4.4. The breach handler flushes a partial dir cache
+Budget logic is in `_TokenTracker.budget_exceeded()` and is checked at
+the top of every dir loop iteration. The budget is **per call**, not
+cumulative — see §4.4. The breach handler flushes a partial dir cache
 entry so work isn't lost.

-Cost reporting happens once at the end of `_run_investigation()`
-(`ai.py:1399`), using the cumulative `total_input` and `total_output`
-counters multiplied by the constants at `ai.py:43–44`. There is no
-running cost display during the investigation today. If you want one,
-`_TokenTracker.summary()` already returns the formatted string — just
-call it after each dir loop.
+Cost reporting happens once at the end of `_run_investigation()`, using
+the cumulative `total_input` and `total_output` counters multiplied by
+the constants near the top of `ai.py`. There is no running cost display
+during the investigation today. If you want one, `_TokenTracker.summary()`
+already returns the formatted string — just call it after each dir loop.

 ---

@@ -560,16 +619,20 @@ call it after each dir loop.
 | Term | Meaning |
 |---|---|
 | **base scan** | The non-AI phase: tree, classification, languages, recency, disk usage. Stdlib + coreutils only. |
-| **dir loop** | Per-directory agent loop in `_run_dir_loop`. Up to 14 turns. Produces a `dir` cache entry. |
+| **dir loop** | Per-directory agent loop in `_run_dir_loop`. Turns allocated by the planning pass (5 shallow / 10 default / 15–20 priority, capped at 25). Produces a `dir` cache entry. |
 | **survey pass** | Single short loop before any dir loops, producing a shared description and tool guidance. |
+| **planning pass** | Phase 3 pass after the survey, before dir loops. Produces a plan: priority, shallow, and skip dirs, per-directory turn allocations, and run order. |
 | **synthesis pass** | Final loop that reads `dir` cache entries and produces `(brief, detailed)`. |
-| **leaves-first** | Discovery order in `_discover_directories`: deepest paths first, so child summaries exist when parents are investigated. |
+| **leaves-first** | Discovery order in `_discover_directories`: deepest paths first, so child summaries exist when parents are investigated. Preserved within planning bands by `_apply_plan`. |
 | **investigation** | One end-to-end run, identified by a UUID, persisted under `/tmp/luminos/{uuid}/`. |
 | **investigation_id** | The UUID. Stored in `/tmp/luminos/investigations.json` keyed by absolute target path. |
 | **cache entry** | A JSON file under `files/` or `dirs/` named by sha256(path). |
 | **flag** | An agent finding written to `flags.jsonl` and reported separately. info / concern / critical. |
 | **partial entry** | A `dir` cache entry written when the budget tripped before `submit_report`. Marked with `partial: True`. |
+| **completeness** | Phase 3 agent self-rated thoroughness (0.0–1.0) from `submit_report`. Feeds `plan_evaluation.json`. |
 | **survey signals** | The histogram + samples computed by `filetypes.survey_signals()` during the base scan, fed to the survey prompt. |
 | **last_input** | The `input_tokens` count from the most recent API call. The basis for budget checks. NOT the cumulative sum. |
 | **CONTEXT_BUDGET** | 70% of 200k = 140k. Trigger threshold for early exit. |
 | **`_PROTECTED_DIR_TOOLS`** | Tools the survey is forbidden from filtering out of the dir loop's toolbox. Currently `{submit_report}`. |
+| **plan.json** | Serialized planning output, cached so resumed runs skip the planner. |
+| **plan_evaluation.json** | Post-investigation quality report comparing plan predictions to outcomes. |