Table of Contents
- Planning Pass Design Sketch
- Problem
- Solution: Plan Before You Investigate
- Plan Schema
- Turn Allocation
- Investigation Order
- Inputs to the Planner
- Fallback Behavior
- Quality Instrumentation
- Implementation Map
- Design Decisions
- Why band-sorted order instead of arbitrary reordering
- Why per-directory allocation instead of a shared global pool
- Why the child-summaries placeholder was fixed
- Why completeness is a self-rating
- Future Work
- References
Planning Pass Design Sketch
The planning pass is Phase 3 of the Luminos investigation pipeline. It runs after the survey and before the per-directory investigation loops, and it decides where to invest investigative depth across the directory tree.
Problem
Before Phase 3, every directory received the same fixed allocation:
max_turns=14. A two-file docs directory got the same budget as a
fifty-file core source directory. This wasted turns on trivial dirs and
under-invested in complex ones.
Solution: Plan Before You Investigate
A single-turn Claude call (the "planning pass") examines cheap signals (survey output, full directory tree, file statistics) and produces a structured plan that the orchestrator uses to allocate resources.
```
survey pass
    |  survey dict
    v
planning pass          <-- NEW
    |  plan dict (priority/shallow/skip dirs, turn allocations)
    v
dir loop (per directory, ordered by plan)
    |  cached dir entries
    v
synthesis pass
```
The planning pass does not read files or explore the filesystem. It is a "strategy from the map" pass: it looks at structure and makes judgment calls about where depth will pay off.
Plan Schema
The planning agent produces a plan via the submit_plan tool:
```
{
    "priority_dirs": [
        {"path": str, "reason": str, "suggested_turns": int}
    ],
    "shallow_dirs": [
        {"path": str, "reason": str}
    ],
    "skip_dirs": [
        {"path": str, "reason": str}
    ],
    "investigation_order": "leaf-first" | "priority-first",
    "notes": str,
}
```
Directories not mentioned in any tier receive a default allocation (currently 10 turns). The planner does not need to list every directory; it focuses on cases where the default would clearly be wrong.
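The schema above can be written as Python `TypedDict`s. The sketch below does that and adds a hypothetical `validate_plan()` helper that drops malformed or unknown entries and falls back to leaf-first order when the agent returns an unrecognized strategy. The type names and validation rules are assumptions for illustration, not code from ai.py.

```python
from typing import Literal, TypedDict


class PriorityDir(TypedDict):
    path: str
    reason: str
    suggested_turns: int


class TieredDir(TypedDict):
    path: str
    reason: str


class Plan(TypedDict):
    priority_dirs: list[PriorityDir]
    shallow_dirs: list[TieredDir]
    skip_dirs: list[TieredDir]
    investigation_order: Literal["leaf-first", "priority-first"]
    notes: str


def validate_plan(raw: dict, known_dirs: set[str]) -> Plan:
    """Hypothetical sanitizer: keep only well-formed entries for real directories."""

    def keep(entries, need_turns: bool = False) -> list:
        out = []
        for e in entries or []:
            if not isinstance(e, dict) or not isinstance(e.get("path"), str):
                continue  # malformed entry: ignore rather than fail the run
            if e["path"] not in known_dirs:
                continue  # planner named a directory that does not exist
            if need_turns and not isinstance(e.get("suggested_turns"), int):
                continue
            out.append(e)
        return out

    order = raw.get("investigation_order")
    return {
        "priority_dirs": keep(raw.get("priority_dirs"), need_turns=True),
        "shallow_dirs": keep(raw.get("shallow_dirs")),
        "skip_dirs": keep(raw.get("skip_dirs")),
        # Unknown strategies fall back to the default, leaf-first.
        "investigation_order": order if order in ("leaf-first", "priority-first") else "leaf-first",
        "notes": str(raw.get("notes", "")),
    }
```

Dropping bad entries (rather than rejecting the whole plan) means those directories simply fall through to the default allocation.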
Turn Allocation
| Tier | Turns | When to use |
|---|---|---|
| priority | 15-20 (capped at 25) | Complex, central, or important dirs: many source files, core logic, schemas, migrations |
| default | 10 | Unlisted dirs; reasonable for most directories |
| shallow | 5 | Simple, peripheral, or predictable: few files, test fixtures, static assets, docs-only |
| skip | 0 (excluded) | Build output, dependency caches, vendored code, generated artifacts |
The global turn budget is base_turns_per_dir * dir_count, with
base_turns_per_dir set to 10. The planner's allocations should roughly
respect this budget. Allocations above the 25-turn ceiling are capped by
the orchestrator.
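A minimal sketch of how a plan could be turned into per-directory turn counts using the tiers in the table above. The function names `allocate_turns()` and `over_budget()` are hypothetical, and two details are assumptions: a directory listed in several tiers is resolved as skip > priority > shallow, and `dir_count` includes skipped directories.

```python
DEFAULT_TURNS = 10  # unlisted directories
SHALLOW_TURNS = 5   # shallow tier
MAX_TURNS = 25      # orchestrator-side ceiling for priority suggestions


def allocate_turns(plan: dict, dirs: list[str]) -> dict[str, int]:
    """Map each directory to a turn budget; skipped dirs get 0 (excluded)."""
    skip = {d["path"] for d in plan.get("skip_dirs", [])}
    shallow = {d["path"] for d in plan.get("shallow_dirs", [])}
    priority = {d["path"]: d["suggested_turns"] for d in plan.get("priority_dirs", [])}

    turns: dict[str, int] = {}
    for d in dirs:
        if d in skip:
            turns[d] = 0
        elif d in priority:
            turns[d] = min(priority[d], MAX_TURNS)  # cap over-generous suggestions
        elif d in shallow:
            turns[d] = SHALLOW_TURNS
        else:
            turns[d] = DEFAULT_TURNS
    return turns


def over_budget(turns: dict[str, int], base_turns_per_dir: int = DEFAULT_TURNS) -> int:
    """How far the allocation exceeds base_turns_per_dir * dir_count (0 if within)."""
    budget = base_turns_per_dir * len(turns)
    return max(0, sum(turns.values()) - budget)
```

Because the budget is only advisory, a sketch like this would report the overrun (for logging) rather than rescale allocations.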
Why no mid-loop borrowing (yet)
PLAN.md envisions a global budget with mid-loop turn borrowing (an agent that needs more turns can "borrow" from the remaining budget). This requires inter-loop communication that does not exist today. The v1 implementation uses simple per-directory allocation with no borrowing. If the quality instrumentation shows that priority dirs consistently exhaust their allocation while shallow dirs finish early, borrowing becomes worth building.
Investigation Order
Two strategies are available:
- leaf-first (default): the existing order from _discover_directories(), deepest directories first and parents last. This ensures child summaries are always cached before parent investigation begins.
- priority-first: priority directories run before shallow and default ones, but in leaf-first order within each band. This keeps the leaf-first ordering intact inside each band while letting high-value subtrees inform the rest of the investigation.
Both strategies preserve the leaf-first contract documented in
Internals section 4.7. The _apply_plan() function sorts
directories into bands without breaking the within-band leaf ordering.
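The band sort can be sketched with Python's stable `sorted()`: sorting on the band alone leaves the deepest-first discovery order untouched inside each band. This is a hypothetical `order_dirs()` helper, not the actual `_apply_plan()`; it assumes two bands (priority versus everything else, following the priority-first description) and that the input list is already leaf-first.

```python
def order_dirs(leaf_first: list[str], plan: dict) -> list[str]:
    """Return directories in the order the dir loops should run them.

    `leaf_first` is assumed to be deepest-first already, as produced by
    _discover_directories(). Skipped dirs are dropped.
    """
    skip = {d["path"] for d in plan.get("skip_dirs", [])}
    kept = [d for d in leaf_first if d not in skip]
    if plan.get("investigation_order") != "priority-first":
        return kept  # leaf-first: discovery order unchanged

    priority = {d["path"] for d in plan.get("priority_dirs", [])}
    # sorted() is stable, so sorting on the band key alone preserves the
    # deepest-first order within each band.
    return sorted(kept, key=lambda d: 0 if d in priority else 1)
```

For example, with discovery order `a/b/c, x/y, a/b, x, a`, priority dirs `a/b` and `x`, and `x/y` skipped, priority-first yields `a/b, x, a/b/c, a`. Note that `a/b` now runs before its child `a/b/c`, which is exactly the case the child-summaries placeholder has to handle.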
Inputs to the Planner
The planning agent receives four signals:
- Survey output: the full survey dict (description, approach, domain notes, tool recommendations), formatted as a text block.
- Full directory tree: `render_tree()` output at depth 6 (deeper than the survey's 2-level preview).
- File signals: extension histogram, `file --brief` descriptions, filename samples (the same raw signals the survey sees).
- Cached directories: which dirs are already cached from a prior run (so the planner knows what will be skipped).
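A minimal sketch of assembling these signals into one prompt block. The section headings and the `build_planner_input()` name are hypothetical; only the extension histogram is computed here, and `file --brief` descriptions or filename samples would be appended the same way.

```python
import os
from collections import Counter


def extension_histogram(paths: list[str]) -> Counter:
    """Count files by lowercased extension; files without one go under '(none)'."""
    return Counter(os.path.splitext(p)[1].lower() or "(none)" for p in paths)


def build_planner_input(survey: dict, tree_text: str, paths: list[str],
                        cached_dirs: set[str]) -> str:
    """Join survey, tree, file signals, and cache state into one text block."""
    survey_block = "\n".join(f"{k}: {v}" for k, v in survey.items())
    hist = extension_histogram(paths)
    hist_block = ", ".join(f"{ext}={n}" for ext, n in hist.most_common())
    cached_block = "\n".join(sorted(cached_dirs)) or "(none)"
    return (
        f"## Survey\n{survey_block}\n\n"
        f"## Directory tree (depth 6)\n{tree_text}\n\n"
        f"## File signals\nextensions: {hist_block}\n\n"
        f"## Already cached (will be skipped)\n{cached_block}\n"
    )
```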
Fallback Behavior
The planning pass degrades gracefully:
- Small targets (below `_SURVEY_MIN_FILES` and `_SURVEY_MIN_DIRS`): planning is skipped entirely, using the same threshold as the survey. All dirs get the default allocation in leaf-first order.
- Planning fails (API error, agent doesn't call `submit_plan`): `_default_plan()` returns an empty plan. All dirs get 10 turns in leaf-first order. The investigation proceeds as if Phase 3 didn't exist.
- Resumed runs: the plan is cached as `plan.json` in the investigation cache. On resume (without `--fresh`), the cached plan is loaded and `_run_planning()` is skipped.
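The three fallback paths can be sketched as a single hypothetical `get_plan()` function. The `run_planning` callable stands in for `_run_planning()` and returns the plan dict, or something falsy if `submit_plan` was never called. Checking the small-target threshold before the cache, and caching only successful plans, are assumptions.

```python
import json
from pathlib import Path


def default_plan() -> dict:
    """Empty plan: every dir falls through to the default 10 turns, leaf-first."""
    return {"priority_dirs": [], "shallow_dirs": [], "skip_dirs": [],
            "investigation_order": "leaf-first", "notes": ""}


def get_plan(cache_root: Path, fresh: bool, small_target: bool, run_planning) -> dict:
    plan_path = cache_root / "plan.json"
    if small_target:
        return default_plan()  # same threshold as the survey: no planning at all
    if not fresh and plan_path.exists():
        return json.loads(plan_path.read_text())  # resumed run: reuse cached plan
    try:
        plan = run_planning()
    except Exception:
        plan = None  # e.g. API error
    if not plan:
        return default_plan()  # planning failed or submit_plan was never called
    plan_path.write_text(json.dumps(plan, indent=2))
    return plan
```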
Quality Instrumentation
Phase 3 ships with built-in measurement so we can tell whether planning actually improves investigation quality. Three metrics:
Turn utilization
Tracked per directory: turns allocated vs turns used. An agent that finishes in 3 turns on an 18-turn budget suggests over-allocation. An agent that hits the cap on a 5-turn budget suggests under-allocation.
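A sketch of the utilization metric and the over/under-allocation reading described above. The `allocation_hint()` helper and its 0.3 threshold are hypothetical; the document only gives the two qualitative cases.

```python
def utilization(allocated: int, used: int) -> float:
    """turns_used / turns_allocated, the per-directory ratio in plan_evaluation.json."""
    return used / allocated if allocated else 0.0


def allocation_hint(allocated: int, used: int, low: float = 0.3) -> str:
    """Hypothetical heuristic flagging dirs that hit their cap or barely used it."""
    if allocated == 0:
        return "skipped"  # excluded dirs have no budget to evaluate
    if used >= allocated:
        return "possibly under-allocated"  # agent ran out of turns
    if utilization(allocated, used) < low:
        return "possibly over-allocated"  # finished with most turns unused
    return "ok"
```

For instance, 3 turns used of 18 gives a utilization of about 0.17 and is flagged as possibly over-allocated, while 5 of 5 is flagged as possibly under-allocated.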
Completeness self-rating
The submit_report tool (dir scope) now includes a completeness
field (0.0-1.0). The agent rates how thoroughly it investigated the
directory. This is not perfectly reliable (it is a self-assessment),
but it provides signal: a priority dir with completeness 0.3 probably
needed more turns; a shallow dir with completeness 0.95 probably
didn't need its 5 turns.
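A sketch of how the self-rating might be declared and consumed. Only the `completeness` field and its 0.0-1.0 range come from this document; the JSON-Schema-style property, the clamping in `read_completeness()`, and the 0.5/0.9 thresholds in `completeness_signal()` are assumptions.

```python
from typing import Optional

# Hypothetical fragment of the dir-scope submit_report input schema.
COMPLETENESS_PROPERTY = {
    "completeness": {
        "type": "number",
        "minimum": 0.0,
        "maximum": 1.0,
        "description": "How thoroughly you investigated this directory (0.0-1.0).",
    }
}


def read_completeness(report: dict) -> Optional[float]:
    """Clamp the self-rating into [0, 1]; None if missing or not a number."""
    value = report.get("completeness")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return min(1.0, max(0.0, float(value)))


def completeness_signal(tier: str, completeness: float) -> str:
    """Read the rating against the planned tier (thresholds are illustrative)."""
    if tier == "priority" and completeness < 0.5:
        return "likely needed more turns"
    if tier == "shallow" and completeness > 0.9:
        return "likely needed fewer turns"
    return "no signal"
```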
plan_evaluation.json
Written at the end of every investigation, this file is the planning pass's report card. It compares plan predictions to outcomes:
```json
{
  "plan_order": "leaf-first",
  "total_dirs_investigated": 12,
  "total_turns_allocated": 120,
  "total_turns_used": 87,
  "overall_utilization": 0.73,
  "per_directory": [
    {
      "dir": "src/core",
      "planned_tier": "priority",
      "turns_allocated": 18,
      "turns_used": 14,
      "utilization": 0.78,
      "completeness": 0.9,
      "confidence": 0.85
    }
  ],
  "evaluated_at": "2026-04-12T..."
}
```
Run luminos on the same target before and after a change to compare these metrics. The golden set for baseline comparison is the luminos repository itself.
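A sketch of producing the report shape above from per-directory records, plus a before/after diff of the headline numbers. `build_evaluation()` and `compare()` are hypothetical names; the field names follow the example JSON, and rounding to two decimals is an assumption.

```python
from datetime import datetime, timezone


def build_evaluation(order: str, records: list[dict]) -> dict:
    """records: one dict per dir with dir, planned_tier, turns_allocated,
    turns_used, completeness, and confidence."""
    allocated = sum(r["turns_allocated"] for r in records)
    used = sum(r["turns_used"] for r in records)
    per_dir = [
        {**r, "utilization": round(r["turns_used"] / r["turns_allocated"], 2)
         if r["turns_allocated"] else 0.0}
        for r in records
    ]
    return {
        "plan_order": order,
        "total_dirs_investigated": len(records),
        "total_turns_allocated": allocated,
        "total_turns_used": used,
        "overall_utilization": round(used / allocated, 2) if allocated else 0.0,
        "per_directory": per_dir,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }


def compare(before: dict, after: dict) -> dict:
    """Difference (after - before) of the headline metrics between two runs."""
    keys = ("total_turns_allocated", "total_turns_used", "overall_utilization")
    return {k: round(after[k] - before[k], 2) for k in keys}
```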
Implementation Map
| Component | Location | Purpose |
|---|---|---|
| `_PLANNING_SYSTEM_PROMPT` | `prompts.py` | System prompt for the planning agent |
| `submit_plan` tool | `ai.py` (planning scope) | Tool schema for plan submission |
| `_run_planning()` | `ai.py` | Runs the planning pass (follows the `_run_survey` pattern) |
| `_apply_plan()` | `ai.py` | Pure function: plan + dir list to ordered list + turn map |
| `_default_plan()` | `ai.py` | Fallback empty plan |
| `_write_plan_evaluation()` | `ai.py` | Writes `plan_evaluation.json` after the dir loops |
| `_TokenTracker._loop_turns` | `ai.py` | Counts API calls per dir loop for utilization tracking |
| `plan.json` | cache root | Persisted plan for resumed runs |
| `plan_evaluation.json` | cache root | Post-investigation quality report |
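The per-loop turn counter listed in the table can be sketched as follows. `TurnTracker` and `run_dir_loop()` are hypothetical stand-ins, not the real `_TokenTracker` or `_run_dir_loop`; the sketch assumes one API call equals one turn and that the agent signals completion when it submits its report.

```python
class TurnTracker:
    """Hypothetical stand-in for the per-loop counter on _TokenTracker."""

    def __init__(self) -> None:
        self.loop_turns = 0  # API calls made in the current dir loop

    def reset_loop(self) -> None:
        self.loop_turns = 0

    def record_call(self) -> None:
        self.loop_turns += 1


def run_dir_loop(tracker: TurnTracker, max_turns: int, agent_step) -> int:
    """Call agent_step until it reports done or the allocation runs out."""
    tracker.reset_loop()
    for _ in range(max_turns):
        tracker.record_call()  # count the call before checking for completion
        if agent_step():  # True once the agent has submitted its report
            break
    return tracker.loop_turns  # becomes turns_used in plan_evaluation.json
```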
Design Decisions
Why band-sorted order instead of arbitrary reordering
The leaf-first contract (_get_child_summaries()) is load-bearing.
Breaking it silently degrades parent summaries because child cache
entries don't exist yet. Band-sorting preserves leaf-first within each
priority band, giving us "priority-first" without losing child context.
Why per-directory allocation instead of a shared global pool
A shared pool with mid-loop borrowing requires the orchestrator to
communicate with running agents, which doesn't exist in the current
architecture (each _run_dir_loop call is independent). Per-directory
allocation is a strict improvement over fixed-14-for-everyone with zero
new machinery. The quality instrumentation will tell us if borrowing is
worth building.
Why the child-summaries placeholder was fixed
_get_child_summaries() previously returned "this is a leaf directory"
for any directory with no cached children, whether it was actually a
leaf or just hadn't been investigated yet. With priority-first ordering,
this lie becomes more likely to trigger. The fix distinguishes the two
cases: actual leaves get "this is a leaf directory", uninvestigated
parents get "child directories exist but have not been investigated
yet".
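A minimal sketch of the corrected placeholder logic. The `child_summaries()` signature, with child paths and a cache mapping passed in explicitly, is an assumption; the real `_get_child_summaries()` presumably reads the investigation cache directly. How a partially investigated parent is reported is also not specified here, so this sketch simply lists whichever children are cached.

```python
def child_summaries(child_dirs: list[str], cache: dict[str, str]) -> str:
    """Distinguish true leaves from parents whose children are not cached yet."""
    if not child_dirs:
        return "this is a leaf directory"
    cached = [f"{d}: {cache[d]}" for d in child_dirs if d in cache]
    if not cached:
        # Parent reached before its children (possible under priority-first).
        return "child directories exist but have not been investigated yet"
    return "\n".join(cached)
```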
Why completeness is a self-rating
An external completeness metric would require knowing "how many files should have been examined", which depends on the directory contents and is exactly the kind of judgment the agent makes. Self-rating is imperfect but cheap, and the correlation between self-rated completeness and turn utilization gives us a useful signal even if the absolute values aren't perfectly calibrated.
Future Work
- Mid-loop turn borrowing: if utilization data shows priority dirs consistently hit their cap while others finish early, implement a shared budget pool.
- Plan refinement: after the first dir loop runs, re-evaluate the plan based on early findings (some "shallow" dirs might turn out to be important).
- Cross-run learning: use `plan_evaluation.json` from prior runs to improve planning on similar targets.