PlanningPass
Jeff Smith edited this page 2026-04-12 20:32:05 -06:00

Planning Pass Design Sketch

The planning pass is Phase 3 of the Luminos investigation pipeline. It runs after the survey and before the per-directory loops, and decides where to invest investigative depth across the directory tree.


Problem

Before Phase 3, every directory received the same fixed allocation: max_turns=14. A two-file docs directory got the same budget as a fifty-file core source directory. This wasted turns on trivial dirs and under-invested in complex ones.


Solution: Plan Before You Investigate

A single-turn Claude call (the "planning pass") examines cheap signals (survey output, full directory tree, file statistics) and produces a structured plan that the orchestrator uses to allocate resources.

survey pass
    |  survey dict
    v
planning pass    <-- NEW
    |  plan dict (priority/shallow/skip dirs, turn allocations)
    v
dir loop (per directory, ordered by plan)
    |  cached dir entries
    v
synthesis pass

The planning pass does not read files or explore the filesystem. It is a "strategy from the map" pass: it looks at structure and makes judgment calls about where depth will pay off.


Plan Schema

The planning agent produces a plan via the submit_plan tool:

{
    "priority_dirs": [
        {"path": str, "reason": str, "suggested_turns": int}
    ],
    "shallow_dirs": [
        {"path": str, "reason": str}
    ],
    "skip_dirs": [
        {"path": str, "reason": str}
    ],
    "investigation_order": "leaf-first" | "priority-first",
    "notes": str,
}

Directories not mentioned in any tier receive a default allocation (currently 10 turns). The planner does not need to list every directory; it focuses on cases where the default would clearly be wrong.
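
As a rough illustration of the schema above, the plan can be written down as typed dictionaries, with a lookup that treats any unlisted directory as "default". This is a sketch only: the class names and the tier_of() helper are hypothetical and are not taken from ai.py.

```python
from typing import Literal, TypedDict


class PriorityDir(TypedDict):
    path: str
    reason: str
    suggested_turns: int


class TieredDir(TypedDict):
    path: str
    reason: str


class Plan(TypedDict):
    priority_dirs: list[PriorityDir]
    shallow_dirs: list[TieredDir]
    skip_dirs: list[TieredDir]
    investigation_order: Literal["leaf-first", "priority-first"]
    notes: str


def tier_of(plan: Plan, path: str) -> str:
    """Return the tier a directory falls into (hypothetical helper).

    Directories the planner did not mention are reported as "default".
    """
    tiers = (
        ("priority", plan["priority_dirs"]),
        ("shallow", plan["shallow_dirs"]),
        ("skip", plan["skip_dirs"]),
    )
    for tier, entries in tiers:
        if any(entry["path"] == path for entry in entries):
            return tier
    return "default"
```

For example, with a plan that lists only src/core as priority and docs as shallow, tier_of(plan, "tests") returns "default".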


Turn Allocation

| Tier | Turns | When to use |
| --- | --- | --- |
| priority | 15-20 (capped at 25) | Complex, central, or important dirs: many source files, core logic, schemas, migrations |
| default | 10 | Unlisted dirs; reasonable for most directories |
| shallow | 5 | Simple, peripheral, or predictable: few files, test fixtures, static assets, docs-only |
| skip | 0 (excluded) | Build output, dependency caches, vendored code, generated artifacts |

The global turn budget is base_turns_per_dir * dir_count, with base_turns_per_dir currently 10. The planner's allocations should roughly respect this budget, but it is a guideline rather than a hard limit. Any single allocation above the 25-turn ceiling is capped by the orchestrator.
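
The tier rules above can be sketched as a small resolver. This is an illustration under assumptions, not the code in ai.py: the function and constant names are hypothetical, and the precedence when a directory appears in more than one tier (skip wins, then priority, then shallow) is a guess the source does not specify.

```python
DEFAULT_TURNS = 10      # unlisted directories
SHALLOW_TURNS = 5       # shallow tier
PRIORITY_CEILING = 25   # orchestrator cap on priority allocations


def allocate_turns(plan: dict, dirs: list[str]) -> dict[str, int]:
    """Map each non-skipped directory to its turn allocation."""
    priority = {e["path"]: e["suggested_turns"] for e in plan.get("priority_dirs", [])}
    shallow = {e["path"] for e in plan.get("shallow_dirs", [])}
    skipped = {e["path"] for e in plan.get("skip_dirs", [])}

    turns: dict[str, int] = {}
    for d in dirs:
        if d in skipped:
            continue  # excluded from investigation entirely
        if d in priority:
            turns[d] = min(priority[d], PRIORITY_CEILING)
        elif d in shallow:
            turns[d] = SHALLOW_TURNS
        else:
            turns[d] = DEFAULT_TURNS
    return turns


def budget_report(turns: dict[str, int], dir_count: int, base: int = DEFAULT_TURNS) -> dict:
    """Compare the summed allocation with base * dir_count (advisory only)."""
    budget = base * dir_count
    allocated = sum(turns.values())
    return {"budget": budget, "allocated": allocated, "over_by": max(0, allocated - budget)}
```

Whether dir_count includes skipped directories is left to the caller here, since the source does not say.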

Why no mid-loop borrowing (yet)

PLAN.md envisions a global budget with mid-loop turn borrowing (an agent that needs more turns can "borrow" from the remaining budget). This requires inter-loop communication that does not exist today. The v1 implementation uses simple per-directory allocation with no borrowing. If the quality instrumentation shows that priority dirs consistently exhaust their allocation while shallow dirs finish early, borrowing becomes worth building.


Investigation Order

Two strategies are available:

leaf-first (default): the existing order from _discover_directories(). Deepest directories first, parents last. Ensures child summaries are always cached before parent investigation begins.

priority-first: priority directories before shallow/default, but leaf-first within each band. Within a band, child summaries are still cached before the parent is investigated, so high-value subtrees can inform the rest of the investigation. Across bands, a priority parent can run before a lower-tier child; the child-summaries placeholder described under Design Decisions covers that case.

Both strategies keep the leaf-first contract documented in Internals section 4.7 within each band. The _apply_plan() function sorts directories into bands without breaking the within-band leaf ordering.
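
One way to picture the band sort is a stable sort keyed only on the band, applied to the existing leaf-first list. The sketch below assumes two bands (priority, then everything else) and uses a hypothetical order_dirs() helper; the real _apply_plan() may differ.

```python
def order_dirs(leaf_first: list[str], plan: dict) -> list[str]:
    """Reorder a leaf-first directory list according to the plan.

    Python's sorted() is stable, so directories that share a band keep
    their original (leaf-first) relative order.
    """
    if plan.get("investigation_order") != "priority-first":
        return list(leaf_first)

    priority = {e["path"] for e in plan.get("priority_dirs", [])}

    def band(path: str) -> int:
        return 0 if path in priority else 1  # band 0 runs first

    return sorted(leaf_first, key=band)


leaf_first = ["docs/img", "docs", "src/core/db", "src/core", "src"]
plan = {
    "investigation_order": "priority-first",
    "priority_dirs": [
        {"path": "src/core/db", "reason": "schemas", "suggested_turns": 18},
        {"path": "src/core", "reason": "core logic", "suggested_turns": 20},
    ],
}
ordered = order_dirs(leaf_first, plan)
# ordered == ["src/core/db", "src/core", "docs/img", "docs", "src"]
```

The priority subtree moves to the front, while src/core/db still precedes src/core and docs/img still precedes docs.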


Inputs to the Planner

The planning agent receives four signals:

  1. Survey output: the full survey dict (description, approach, domain notes, tool recommendations), formatted as a text block.
  2. Full directory tree: render_tree() output at depth 6 (deeper than the survey's 2-level preview).
  3. File signals: extension histogram, file --brief descriptions, filename samples (the same raw signals the survey sees).
  4. Cached directories: which dirs are already cached from a prior run (so the planner knows what will be skipped).

Fallback Behavior

The planning pass degrades gracefully:

  • Small targets (below _SURVEY_MIN_FILES and _SURVEY_MIN_DIRS): planning is skipped entirely, same threshold as the survey. All dirs get the default allocation in leaf-first order.
  • Planning fails (API error, agent doesn't call submit_plan): _default_plan() returns an empty plan. All dirs get 10 turns, leaf-first order. The investigation proceeds as if Phase 3 didn't exist.
  • Resumed runs: the plan is cached as plan.json in the investigation cache. On resume (without --fresh), the cached plan is loaded and _run_planning() is skipped.
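
The three fallback cases can be sketched as one loader. This is an assumption-laden outline rather than the actual control flow: load_or_create_plan(), its parameters, and the choice not to cache a fallback plan are all hypothetical; only plan.json and the empty-plan behaviour come from the description above.

```python
import json
from pathlib import Path


def default_plan() -> dict:
    """Empty plan: every directory gets the default turns, leaf-first."""
    return {
        "priority_dirs": [],
        "shallow_dirs": [],
        "skip_dirs": [],
        "investigation_order": "leaf-first",
        "notes": "",
    }


def load_or_create_plan(cache_root: Path, run_planning, *, fresh: bool = False,
                        is_small_target: bool = False) -> dict:
    """Return a plan, degrading to the empty plan when planning is unavailable.

    run_planning is a zero-argument callable that returns a plan dict, or
    None if the agent never called submit_plan.
    """
    if is_small_target:
        return default_plan()  # planning skipped, same threshold as the survey

    plan_path = cache_root / "plan.json"
    if not fresh and plan_path.exists():
        return json.loads(plan_path.read_text(encoding="utf-8"))  # resumed run

    try:
        plan = run_planning()
    except Exception:
        plan = None  # e.g. API error
    if plan is None:
        return default_plan()

    plan_path.write_text(json.dumps(plan, indent=2), encoding="utf-8")
    return plan
```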

Quality Instrumentation

Phase 3 ships with built-in measurement so we can tell whether planning actually improves investigation quality. It has three parts:

Turn utilization

Tracked per directory: turns allocated vs turns used. An agent that finishes in 3 turns on an 18-turn budget suggests over-allocation. An agent that hits the cap on a 5-turn budget suggests under-allocation.
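
A small classifier makes the two failure modes concrete. The 0.4 threshold and the label names below are hypothetical, chosen only for illustration; the source does not define cut-offs.

```python
def classify_utilization(allocated: int, used: int, low: float = 0.4) -> str:
    """Label one directory's turn usage (thresholds are illustrative)."""
    if allocated <= 0:
        return "unallocated"
    if used >= allocated:
        return "hit-cap"          # suggests under-allocation
    if used / allocated < low:
        return "over-allocated"   # finished with most of the budget unused
    return "ok"
```

With these assumptions, 3 turns used of 18 is labelled "over-allocated" and 5 of 5 is labelled "hit-cap".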

Completeness self-rating

The submit_report tool (dir scope) now includes a completeness field (0.0-1.0) in which the agent rates how thoroughly it investigated the directory. As a self-assessment it is not fully reliable, but it provides signal: a priority dir with completeness 0.3 probably needed more turns, while a shallow dir with completeness 0.95 probably did not need more than its 5 turns.

plan_evaluation.json

Written at the end of every investigation, this file is the planning pass's report card. It compares plan predictions to outcomes:

{
    "plan_order": "leaf-first",
    "total_dirs_investigated": 12,
    "total_turns_allocated": 120,
    "total_turns_used": 87,
    "overall_utilization": 0.73,
    "per_directory": [
        {
            "dir": "src/core",
            "planned_tier": "priority",
            "turns_allocated": 18,
            "turns_used": 14,
            "utilization": 0.78,
            "completeness": 0.9,
            "confidence": 0.85
        }
    ],
    "evaluated_at": "2026-04-12T..."
}

Run luminos on the same target before and after changes to compare these metrics. The golden set for baseline comparison: luminos itself.
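
The report above could be assembled from per-directory records roughly as follows. This is a sketch, not _write_plan_evaluation() itself: the function name, the input row shape, and rounding to two decimals are assumptions inferred from the sample.

```python
from datetime import datetime, timezone


def build_plan_evaluation(plan_order: str, rows: list[dict]) -> dict:
    """Build a plan_evaluation-style dict from per-directory records.

    Each row is assumed to carry: dir, planned_tier, turns_allocated,
    turns_used, completeness, confidence.
    """
    def ratio(used: int, allocated: int) -> float:
        return round(used / allocated, 2) if allocated else 0.0

    per_directory = [
        {
            "dir": r["dir"],
            "planned_tier": r["planned_tier"],
            "turns_allocated": r["turns_allocated"],
            "turns_used": r["turns_used"],
            "utilization": ratio(r["turns_used"], r["turns_allocated"]),
            "completeness": r["completeness"],
            "confidence": r["confidence"],
        }
        for r in rows
    ]
    total_allocated = sum(r["turns_allocated"] for r in rows)
    total_used = sum(r["turns_used"] for r in rows)
    return {
        "plan_order": plan_order,
        "total_dirs_investigated": len(rows),
        "total_turns_allocated": total_allocated,
        "total_turns_used": total_used,
        "overall_utilization": ratio(total_used, total_allocated),
        "per_directory": per_directory,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }
```

Serializing the result with json.dumps() and writing it to the cache root would then produce a file shaped like the sample.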


Implementation Map

| Component | Location | Purpose |
| --- | --- | --- |
| _PLANNING_SYSTEM_PROMPT | prompts.py | System prompt for the planning agent |
| submit_plan tool | ai.py (planning scope) | Tool schema for plan submission |
| _run_planning() | ai.py | Runs the planning pass (follows the _run_survey pattern) |
| _apply_plan() | ai.py | Pure function: maps (plan, dir list) to (ordered dir list, turn map) |
| _default_plan() | ai.py | Fallback empty plan |
| _write_plan_evaluation() | ai.py | Writes plan_evaluation.json after the dir loops |
| _TokenTracker._loop_turns | ai.py | Counts API calls per dir loop for utilization tracking |
| plan.json | cache root | Persisted plan for resumed runs |
| plan_evaluation.json | cache root | Post-investigation quality report |

Design Decisions

Why band-sorted order instead of arbitrary reordering

The leaf-first contract (_get_child_summaries()) is load-bearing. Breaking it silently degrades parent summaries because child cache entries don't exist yet. Band-sorting preserves leaf-first within each priority band, giving us "priority-first" without losing child context.

Why per-directory allocation instead of a shared global pool

A shared pool with mid-loop borrowing requires the orchestrator to communicate with running agents, which doesn't exist in the current architecture (each _run_dir_loop call is independent). Per-directory allocation is a strict improvement over fixed-14-for-everyone with zero new machinery. The quality instrumentation will tell us if borrowing is worth building.

Why the child-summaries placeholder was fixed

_get_child_summaries() previously returned "this is a leaf directory" for any directory with no cached children, whether it was actually a leaf or just hadn't been investigated yet. With priority-first ordering, this lie becomes more likely to trigger. The fix distinguishes the two cases: actual leaves get "this is a leaf directory", uninvestigated parents get "child directories exist but have not been investigated yet".
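
The distinction can be sketched as below. The signature is hypothetical (the real _get_child_summaries() presumably reads from the investigation cache rather than a plain dict); only the two placeholder strings are taken from the text above.

```python
def child_summaries(child_dirs: list[str], cache: dict[str, str]) -> str:
    """Return the child-summary text for a parent directory.

    child_dirs: immediate subdirectories that exist on disk.
    cache: maps directory path to its cached summary, if investigated.
    """
    if not child_dirs:
        return "this is a leaf directory"

    cached = [f"{child}: {cache[child]}" for child in child_dirs if child in cache]
    if not cached:
        # Children exist but none has run yet, e.g. a priority parent
        # scheduled ahead of its lower-tier children.
        return "child directories exist but have not been investigated yet"
    return "\n".join(cached)
```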

Why completeness is a self-rating

An external completeness metric would require knowing "how many files should have been examined", which depends on the directory contents and is exactly the kind of judgment the agent makes. Self-rating is imperfect but cheap, and the correlation between self-rated completeness and turn utilization gives us a useful signal even if the absolute values aren't perfectly calibrated.


Future Work

  • Mid-loop turn borrowing: if utilization data shows priority dirs consistently hit their cap while others finish early, implement a shared budget pool.
  • Plan refinement: after the first dir loop run, re-evaluate the plan based on early findings (some "shallow" dirs might turn out to be important).
  • Cross-run learning: use plan_evaluation.json from prior runs to improve planning on similar targets.
