docs: add Planning Pass design sketch, update Architecture and Internals for Phase 3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 20:32:05 -06:00 · 3fcf8c221d
commit 3fcf8c221d
parent 31a052eca0
4 changed files with 322 additions and 41 deletions
--- a/Architecture.md
+++ b/Architecture.md
@@ -8,8 +8,9 @@

 Luminos is an agentic Claude investigation tool. Every invocation runs the
 full pipeline: a base scan first to feed the agent its initial picture, then
-a survey pass, then per-directory dir loops, then a final synthesis pass.
-The base scan is not a standalone product, it is the agent's input.
+a survey pass, a planning pass, per-directory dir loops with dynamic turn
+allocation, then a final synthesis pass. The base scan is not a standalone
+product; it is the agent's input.

 **Entry point:** `luminos.py` — argument parsing, scan orchestration, AI
 pipeline kickoff, output routing.
@@ -69,7 +70,17 @@ analyze_directory(report, target)
            │
            ├── _filter_dir_tools(survey)  remove skip_tools (if confidence ≥ 0.5)
            │
-            ├── per-directory loop (each uncached dir, up to max_turns=14)
+            ├── _run_planning()            single loop, max 3 turns
+            │       inputs: survey output + full tree + file signals
+            │       Tools: submit_plan
+            │       output: plan dict (priority/shallow/skip dirs,
+            │               turn allocations, investigation order)
+            │       (skipped on tiny targets or loaded from plan.json
+            │        on resumed runs)
+            │
+            ├── _apply_plan()              sort dirs into bands, build turn map
+            │
+            ├── per-directory loop (ordered by plan, dynamic max_turns)
            │       _build_dir_context()    list files + sizes + MIME
            │       _get_child_summaries()  read cached child summaries
            │       _format_survey_block()  inject survey context into prompt
@@ -78,7 +89,9 @@ analyze_directory(report, target)
            │                               cache entry on budget breach
            │       Tools: read_file, list_directory, run_command,
            │              parse_structure, write_cache, think, checkpoint,
-            │              flag, submit_report
+            │              flag, submit_report (with completeness)
+            │
+            ├── _write_plan_evaluation()   plan_evaluation.json quality metrics
            │
            ├── _run_synthesis()            single loop, max 5 turns
            │       reads all "dir" cache entries
@@ -104,6 +117,8 @@ Layout:

 ```
 meta.json              investigation metadata
+plan.json              planning pass output (cached for resumed runs)
+plan_evaluation.json   quality metrics: plan predictions vs outcomes
 files/<sha256>.json    one JSON file per cached file entry
 dirs/<sha256>.json     one JSON file per cached directory entry
 flags.jsonl            JSONL — appended on every flag tool call
@@ -170,19 +185,18 @@ the *latest* per-call `input_tokens` reading (the actual size of the
 context window in use), not the cumulative sum across turns. Early
 exit flushes partial cache on budget breach. See #44.

-**Per-loop turn cap.** Each dir loop runs for at most `max_turns = 14`
-turns. This is a sanity bound separate from the context budget — even
-on small targets the agent should produce a `submit_report` long
-before exhausting 14 turns. The cap exists to prevent runaway loops
-when the agent gets stuck (e.g. repeatedly retrying a failing tool
-call). If we observe legitimate investigations consistently hitting
-14, raise the cap; do not raise it speculatively.
+**Per-loop turn cap.** The planning pass assigns each directory a turn
+budget: priority dirs get 15-20 (capped at 25), shallow dirs get 5,
+default dirs get 10. This replaced the old fixed `max_turns=14`. The
+cap exists to prevent runaway loops when the agent gets stuck. The
+`plan_evaluation.json` quality report tracks turns used vs allocated
+per directory. See [Planning Pass](PlanningPass) for the full design.

 **Per-loop message history growth.** Tool results are appended to the
 message history and never evicted, so per-turn `input_tokens` grows
-roughly linearly across a loop (~1.5–2k per turn observed on
-codebase targets). At the current `max_turns=14` cap this stays well
-under 200k. Raising `max_turns` significantly (e.g. via Phase 3
-dynamic turn allocation) would expose this — see #51.
+roughly linearly across a loop (~1.5-2k per turn observed on
+codebase targets). At the current caps (max 25 turns for priority
+dirs) that is roughly 40-50k of added history, which stays under
+200k. Raising the caps significantly would erode that margin. See #51.

 Pricing tracked and reported at end of each run.
--- a/Home.md
+++ b/Home.md
@@ -10,9 +10,9 @@ runs first to feed the agent its initial picture of the target.

 ## Current State

-- **Phase:** Active development — core pipeline stable, scaling and domain intelligence planned
-- **Last worked on:** 2026-04-06
-- **Last commit:** merge: add -x/--exclude flag for directory exclusion
+- **Phase:** Active development — Phases 1-3 complete. Phase 3 added planning pass with dynamic turn allocation and quality instrumentation.
+- **Last worked on:** 2026-04-12
+- **Last commit:** feat(ai): Phase 3 investigation planning (#75)
 - **Blocking:** None

 ---
@@ -23,6 +23,7 @@ runs first to feed the agent its initial picture of the target.
 |---|---|
 | [Architecture](Architecture) | Module breakdown, data flow, AI pipeline |
 | [Internals](Internals) | Code-level tour: dir loop, cache, prompts, where to make changes |
+| [Planning Pass](PlanningPass) | Phase 3 design sketch: dynamic turn allocation, quality metrics |
 | [Development Guide](DevelopmentGuide) | Setup, git workflow, testing, commands |
 | [Roadmap](Roadmap) | Phase status — pointer to PLAN.md and open issues |
 | [Session Retrospectives](SessionRetrospectives) | Full session history |
--- a/Internals.md
+++ b/Internals.md
@@ -313,32 +313,26 @@ is the entire payoff of leaves-first ordering.

 The trick: those subdirectory summaries only exist if the children
 were investigated *first*. If `src/` runs before `src/auth/`, the
-cache lookup at `ai.py:825` returns nothing. The function falls
-through to its default at `ai.py:832` and returns the string
-`(none — this is a leaf directory)`. The parent's system prompt
-silently loses all of its child context, and the agent has no way to
-know — the placeholder claims the dir is a leaf, which is a lie when
-the children just haven't been investigated yet. The dir summary
-degrades and the synthesis pass inherits the degradation.
+cache lookup returns nothing.

-**If you change the investigation order**, you have to do one of:
+**Phase 3 addressed this contract in two ways:**

-1. **Preserve the leaf-first invariant within whatever new order you
-   introduce.** A "priority-first" order can still process directories
-   leaves-first within each priority band, so children always run
-   before parents.
-2. **Explicitly handle the missing-child-summaries case in the
-   prompt.** Replace the lie ("leaf directory") with the truth
-   ("children not yet investigated") so the agent at least knows what
-   it doesn't have, and accept that some dirs will run with degraded
-   context.
+1. **Band-sorted ordering preserves leaf-first within priority bands.**
+   `_apply_plan()` groups directories into priority/default/shallow
+   bands but keeps the leaf-first sort within each band, so children
+   still run before parents in the same band, even in "priority-first" mode.

-Phase 3's planning pass introduces the temptation to investigate
-priority dirs first. Both alternatives above are open. Whichever is
-chosen, this contract has to be addressed *explicitly* — the test
-class `TestDiscoverDirectories` (in `tests/test_ai_pure.py`) pins the
-current ordering, so any change will be loud, but the *reason* the
-ordering matters lives here.
+2. **The placeholder was fixed.** `_get_child_summaries()` now
+   distinguishes actual leaf directories ("this is a leaf directory")
+   from parents whose children haven't been investigated yet ("child
+   directories exist but have not been investigated yet"). The old
+   placeholder claimed every empty-cache case was a leaf, which was a
+   lie when children simply hadn't been processed yet.
+
+The test class `TestDiscoverDirectories` (in `tests/test_ai_pure.py`)
+pins the base leaf-first ordering. `TestGetChildSummaries` pins the
+updated placeholder behavior. See [Planning Pass](PlanningPass) for
+the full design.

 ---

--- a/PlanningPass.md
+++ b/PlanningPass.md
@@ -0,0 +1,272 @@
+# Planning Pass Design Sketch
+
+The planning pass is Phase 3 of the Luminos investigation pipeline. It
+runs after the survey and before the per-directory dir loops, deciding
+where to invest investigative depth across the directory tree.
+
+---
+
+## Problem
+
+Before Phase 3, every directory received the same fixed allocation:
+`max_turns=14`. A two-file docs directory got the same budget as a
+fifty-file core source directory. This wasted turns on trivial dirs and
+under-invested in complex ones.
+
+---
+
+## Solution: Plan Before You Investigate
+
+A short Claude loop of at most 3 turns (the "planning pass") examines
+cheap signals (survey output, full directory tree, file statistics) and
+produces a structured plan that the orchestrator uses to allocate resources.
+
+```
+survey pass
+    |  survey dict
+    v
+planning pass    <-- NEW
+    |  plan dict (priority/shallow/skip dirs, turn allocations)
+    v
+dir loop (per directory, ordered by plan)
+    |  cached dir entries
+    v
+synthesis pass
+```
+
+The planning pass does not read files or explore the filesystem. It is
+a "strategy from the map" pass: it looks at structure and makes
+judgment calls about where depth will pay off.
+
+---
+
+## Plan Schema
+
+The planning agent produces a plan via the `submit_plan` tool:
+
+```python
+{
+    "priority_dirs": [
+        {"path": str, "reason": str, "suggested_turns": int}
+    ],
+    "shallow_dirs": [
+        {"path": str, "reason": str}
+    ],
+    "skip_dirs": [
+        {"path": str, "reason": str}
+    ],
+    "investigation_order": "leaf-first" | "priority-first",
+    "notes": str,
+}
+```
+
+Directories not mentioned in any tier receive a default allocation
+(currently 10 turns). The planner does not need to list every
+directory; it focuses on cases where the default would clearly be
+wrong.
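
A minimal sketch of how the schema above could be typed in Python. The class and function names (`PriorityDir`, `TieredDir`, `Plan`, `empty_plan`, `tier_of`) are hypothetical and are not taken from `ai.py`; only the field names come from the schema.

```python
from typing import Literal, TypedDict


class PriorityDir(TypedDict):
    path: str
    reason: str
    suggested_turns: int


class TieredDir(TypedDict):
    path: str
    reason: str


class Plan(TypedDict):
    priority_dirs: list[PriorityDir]
    shallow_dirs: list[TieredDir]
    skip_dirs: list[TieredDir]
    investigation_order: Literal["leaf-first", "priority-first"]
    notes: str


def empty_plan() -> Plan:
    """Hypothetical empty plan: no tiers listed, leaf-first order."""
    return {
        "priority_dirs": [],
        "shallow_dirs": [],
        "skip_dirs": [],
        "investigation_order": "leaf-first",
        "notes": "",
    }


def tier_of(plan: Plan, path: str) -> str:
    """Return the tier a path falls into; unlisted paths are 'default'."""
    tiers = (
        ("priority", plan["priority_dirs"]),
        ("shallow", plan["shallow_dirs"]),
        ("skip", plan["skip_dirs"]),
    )
    for tier, entries in tiers:
        if any(entry["path"] == path for entry in entries):
            return tier
    return "default"
```

With this shape, a directory the planner never mentions resolves to the `default` tier without any extra bookkeeping.
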
+
+---
+
+## Turn Allocation
+
+| Tier | Turns | When to use |
+|---|---|---|
+| **priority** | 15-20 (capped at 25) | Complex, central, or important dirs: many source files, core logic, schemas, migrations |
+| **default** | 10 | Unlisted dirs; reasonable for most directories |
+| **shallow** | 5 | Simple, peripheral, or predictable: few files, test fixtures, static assets, docs-only |
+| **skip** | 0 (excluded) | Build output, dependency caches, vendored code, generated artifacts |
+
+The global turn budget is `base_turns_per_dir * dir_count` (10 per
+dir). The planner's allocations should roughly respect this budget.
+Allocations above the ceiling (25 turns) are capped by the
+orchestrator.
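
The allocation rules in the table can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in `_apply_plan()`: the names `build_turn_map` and `global_budget` are hypothetical, and counting every discovered directory toward the global budget is one possible reading of the text.

```python
DEFAULT_TURNS = 10  # default tier
SHALLOW_TURNS = 5   # shallow tier
TURN_CEILING = 25   # cap applied to priority allocations


def build_turn_map(plan: dict, dirs: list[str]) -> dict[str, int]:
    """Map each non-skipped directory to its turn budget."""
    priority = {e["path"]: e["suggested_turns"] for e in plan.get("priority_dirs", [])}
    shallow = {e["path"] for e in plan.get("shallow_dirs", [])}
    skipped = {e["path"] for e in plan.get("skip_dirs", [])}

    turn_map: dict[str, int] = {}
    for path in dirs:
        if path in skipped:
            continue  # skip tier: excluded from investigation
        if path in priority:
            # Allocations above the ceiling are capped.
            turn_map[path] = min(priority[path], TURN_CEILING)
        elif path in shallow:
            turn_map[path] = SHALLOW_TURNS
        else:
            turn_map[path] = DEFAULT_TURNS
    return turn_map


def global_budget(dirs: list[str]) -> int:
    """Assumed reading of the text: 10 turns times the directory count."""
    return DEFAULT_TURNS * len(dirs)
```

Comparing `sum(build_turn_map(plan, dirs).values())` against `global_budget(dirs)` is one way to check whether a plan roughly respects the budget.
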
+
+### Why no mid-loop borrowing (yet)
+
+PLAN.md envisions a global budget with mid-loop turn borrowing (an
+agent that needs more turns can "borrow" from the remaining budget).
+This requires inter-loop communication that does not exist today. The
+v1 implementation uses simple per-directory allocation with no
+borrowing. If the quality instrumentation shows that priority dirs
+consistently exhaust their allocation while shallow dirs finish early,
+borrowing becomes worth building.
+
+---
+
+## Investigation Order
+
+Two strategies are available:
+
+**leaf-first** (default): the existing order from `_discover_directories()`.
+Deepest directories first, parents last. Ensures child summaries are
+always cached before parent investigation begins.
+
+**priority-first**: priority directories before shallow/default, but
+leaf-first *within each band*. This preserves the child-summaries
+invariant for parents and children in the same band while letting
+high-value subtrees inform the rest of the investigation.
+
+Both strategies preserve the leaf-first contract documented in
+[Internals](Internals) section 4.7. The `_apply_plan()` function sorts
+directories into bands without breaking the within-band leaf ordering.
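
Band sorting can be sketched with a stable sort. This assumes relative, `/`-separated paths and approximates the order from `_discover_directories()` by depth; the band ranking (priority, then default, then shallow) and the names `leaf_first` and `order_dirs` are assumptions made for illustration.

```python
BAND_RANK = {"priority": 0, "default": 1, "shallow": 2}


def leaf_first(dirs: list[str]) -> list[str]:
    """Deepest directories first; ties broken by path for determinism."""
    return sorted(dirs, key=lambda p: (-p.count("/"), p))


def order_dirs(dirs: list[str], tiers: dict[str, str],
               order: str = "leaf-first") -> list[str]:
    """Return the investigation order for the given strategy."""
    ordered = leaf_first(dirs)
    if order == "priority-first":
        # sorted() is stable, so the leaf-first order survives within
        # each band after grouping by band rank.
        ordered = sorted(ordered, key=lambda p: BAND_RANK[tiers.get(p, "default")])
    return ordered
```

In this sketch a priority parent whose child sits in a lower band runs before that child, which is the case the distinct "not yet investigated" placeholder covers.
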
+
+---
+
+## Inputs to the Planner
+
+The planning agent receives four signals:
+
+1. **Survey output**: the full survey dict (description, approach,
+   domain notes, tool recommendations), formatted as a text block.
+2. **Full directory tree**: `render_tree()` output at depth 6 (deeper
+   than the survey's 2-level preview).
+3. **File signals**: extension histogram, `file --brief` descriptions,
+   filename samples (the same raw signals the survey sees).
+4. **Cached directories**: which dirs are already cached from a prior
+   run (so the planner knows what will be skipped).
+
+---
+
+## Fallback Behavior
+
+The planning pass degrades gracefully:
+
+- **Small targets** (below `_SURVEY_MIN_FILES` and `_SURVEY_MIN_DIRS`):
+  planning is skipped entirely, same threshold as the survey. All dirs
+  get the default allocation in leaf-first order.
+- **Planning fails** (API error, agent doesn't call `submit_plan`):
+  `_default_plan()` returns an empty plan. All dirs get 10 turns,
+  leaf-first order. The investigation proceeds as if Phase 3 didn't
+  exist.
+- **Resumed runs**: the plan is cached as `plan.json` in the
+  investigation cache. On resume (without `--fresh`), the cached plan
+  is loaded and `_run_planning()` is skipped.
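
The three fallback paths above can be sketched as one function. The names `fallback_plan` and `load_or_create_plan`, the injected `run_planning` callable, and the choice not to cache a fallback plan are all assumptions made for illustration; the actual control flow in `ai.py` may differ.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def fallback_plan() -> dict:
    """Hypothetical empty plan: every dir gets the default, leaf-first."""
    return {
        "priority_dirs": [],
        "shallow_dirs": [],
        "skip_dirs": [],
        "investigation_order": "leaf-first",
        "notes": "",
    }


def load_or_create_plan(cache_root: str,
                        run_planning: Callable[[], Optional[dict]],
                        fresh: bool = False,
                        too_small: bool = False) -> dict:
    """Pick a plan: skip on tiny targets, reuse plan.json, or run planning."""
    plan_path = Path(cache_root) / "plan.json"
    if too_small:
        return fallback_plan()  # small target: planning skipped entirely
    if not fresh and plan_path.exists():
        return json.loads(plan_path.read_text())  # resumed run
    try:
        plan = run_planning()  # returns None if no plan was submitted
    except Exception:
        plan = None  # API error: degrade to the fallback plan
    if plan is None:
        return fallback_plan()
    plan_path.write_text(json.dumps(plan, indent=2))
    return plan
```

Passing the planner in as a callable keeps the sketch testable without an API call.
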
+
+---
+
+## Quality Instrumentation
+
+Phase 3 ships with built-in measurement so we can tell whether planning
+actually improves investigation quality. It has three parts:
+
+### Turn utilization
+
+Tracked per directory: turns allocated vs turns used. An agent that
+finishes in 3 turns on an 18-turn budget suggests over-allocation. An
+agent that hits the cap on a 5-turn budget suggests under-allocation.
+
+### Completeness self-rating
+
+The `submit_report` tool (dir scope) now includes a `completeness`
+field (0.0-1.0). The agent rates how thoroughly it investigated the
+directory. This is not perfectly reliable (it is a self-assessment),
+but it provides signal: a priority dir with completeness 0.3 probably
+needed more turns; a shallow dir with completeness 0.95 probably
+did not need more than its 5 turns.
+
+### plan_evaluation.json
+
+Written at the end of every investigation, this file is the planning
+pass's report card. It compares plan predictions to outcomes:
+
+```json
+{
+    "plan_order": "leaf-first",
+    "total_dirs_investigated": 12,
+    "total_turns_allocated": 120,
+    "total_turns_used": 87,
+    "overall_utilization": 0.73,
+    "per_directory": [
+        {
+            "dir": "src/core",
+            "planned_tier": "priority",
+            "turns_allocated": 18,
+            "turns_used": 14,
+            "utilization": 0.78,
+            "completeness": 0.9,
+            "confidence": 0.85
+        }
+    ],
+    "evaluated_at": "2026-04-12T..."
+}
+```
+
+Run luminos on the same target before and after a change to compare
+these metrics. The golden set for baseline comparison is luminos itself.
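
A sketch of how a report with this shape could be assembled from per-directory records. `build_plan_evaluation` and its record format are hypothetical, and rounding utilization to two decimals is an assumption based on the example values above.

```python
from datetime import datetime, timezone


def build_plan_evaluation(plan_order: str, records: list[dict]) -> dict:
    """Aggregate per-directory turn records into an evaluation dict."""
    per_directory = []
    for rec in records:
        allocated = rec["turns_allocated"]
        # Guard against a zero allocation to avoid dividing by zero.
        utilization = round(rec["turns_used"] / allocated, 2) if allocated else 0.0
        per_directory.append({**rec, "utilization": utilization})

    total_allocated = sum(r["turns_allocated"] for r in records)
    total_used = sum(r["turns_used"] for r in records)
    overall = round(total_used / total_allocated, 2) if total_allocated else 0.0
    return {
        "plan_order": plan_order,
        "total_dirs_investigated": len(records),
        "total_turns_allocated": total_allocated,
        "total_turns_used": total_used,
        "overall_utilization": overall,
        "per_directory": per_directory,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }
```

Two such dicts from runs on the same target can then be compared field by field.
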
+
+---
+
+## Implementation Map
+
+| Component | Location | Purpose |
+|---|---|---|
+| `_PLANNING_SYSTEM_PROMPT` | `prompts.py` | System prompt for the planning agent |
+| `submit_plan` tool | `ai.py` (planning scope) | Tool schema for plan submission |
+| `_run_planning()` | `ai.py` | Runs the planning pass (follows `_run_survey` pattern) |
+| `_apply_plan()` | `ai.py` | Pure function: maps (plan, dir list) to (ordered dir list, turn map) |
+| `_default_plan()` | `ai.py` | Fallback empty plan |
+| `_write_plan_evaluation()` | `ai.py` | Writes `plan_evaluation.json` after dir loops |
+| `_TokenTracker._loop_turns` | `ai.py` | Counts API calls per dir loop for utilization tracking |
+| `plan.json` | cache root | Persisted plan for resumed runs |
+| `plan_evaluation.json` | cache root | Post-investigation quality report |
+
+---
+
+## Design Decisions
+
+### Why band-sorted order instead of arbitrary reordering
+
+The leaf-first contract (`_get_child_summaries()`) is load-bearing.
+Breaking it silently degrades parent summaries because child cache
+entries don't exist yet. Band-sorting preserves leaf-first within each
+priority band, giving us "priority-first" without losing child context.
+
+### Why per-directory allocation instead of a shared global pool
+
+A shared pool with mid-loop borrowing requires the orchestrator to
+communicate with running agents, which doesn't exist in the current
+architecture (each `_run_dir_loop` call is independent). Per-directory
+allocation is a strict improvement over fixed-14-for-everyone with zero
+new machinery. The quality instrumentation will tell us if borrowing is
+worth building.
+
+### Why the child-summaries placeholder was fixed
+
+`_get_child_summaries()` previously returned "this is a leaf directory"
+for any directory with no cached children, whether it was actually a
+leaf or just hadn't been investigated yet. With priority-first ordering,
+the false leaf claim would be emitted more often. The fix distinguishes the two
+cases: actual leaves get "this is a leaf directory", uninvestigated
+parents get "child directories exist but have not been investigated
+yet".
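
The two-case distinction can be sketched as follows. The function name `child_summaries_block`, its arguments, the exact placeholder formatting, and the handling of partially cached children are assumptions; only the two quoted phrases come from the text above.

```python
LEAF_PLACEHOLDER = "(none: this is a leaf directory)"
PENDING_PLACEHOLDER = (
    "(child directories exist but have not been investigated yet)"
)


def child_summaries_block(child_dirs: list[str],
                          cached_summaries: dict[str, str]) -> str:
    """Render child summaries, or a placeholder that says why none exist."""
    if not child_dirs:
        return LEAF_PLACEHOLDER  # an actual leaf: no subdirectories
    lines = [
        f"- {child}: {cached_summaries[child]}"
        for child in child_dirs
        if child in cached_summaries
    ]
    if not lines:
        return PENDING_PLACEHOLDER  # children exist, none cached yet
    return "\n".join(lines)
```

The key point is that an empty cache alone no longer implies a leaf; the presence of child directories is checked first.
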
+
+### Why completeness is a self-rating
+
+An external completeness metric would require knowing "how many files
+should have been examined", which depends on the directory contents and
+is exactly the kind of judgment the agent makes. Self-rating is
+imperfect but cheap, and the correlation between self-rated
+completeness and turn utilization gives us a useful signal even if the
+absolute values aren't perfectly calibrated.
+
+---
+
+## Future Work
+
+- **Mid-loop turn borrowing**: if utilization data shows priority dirs
+  consistently hit their cap while others finish early, implement a
+  shared budget pool.
+- **Plan refinement**: after the first dir loop run, re-evaluate the
+  plan based on early findings (some "shallow" dirs might turn out to
+  be important).
+- **Cross-run learning**: use `plan_evaluation.json` from prior runs to
+  improve planning on similar targets.
+
+---
+
+## References
+
+- Issues: #8, #9, #10, #11, #74
+- PR: #75
+- PLAN.md Part 4: Investigation Planning
+- [Internals](Internals) section 4.7: leaf-first contract