docs: add Planning Pass design sketch, update Architecture and Internals for Phase 3
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 31a052eca0
commit 3fcf8c221d
4 changed files with 322 additions and 41 deletions
@ -8,8 +8,9 @@
|
|||
|
||||
Luminos is an agentic Claude investigation tool. Every invocation runs the
|
||||
full pipeline: a base scan first to feed the agent its initial picture, then
|
||||
a survey pass, then per-directory dir loops, then a final synthesis pass.
|
||||
The base scan is not a standalone product, it is the agent's input.
|
||||
a survey pass, a planning pass, per-directory dir loops with dynamic turn
|
||||
allocation, then a final synthesis pass. The base scan is not a standalone
|
||||
product, it is the agent's input.
|
||||
|
||||
**Entry point:** `luminos.py` — argument parsing, scan orchestration, AI
|
||||
pipeline kickoff, output routing.
|
||||
|
|
@ -69,7 +70,17 @@ analyze_directory(report, target)
|
|||
│
|
||||
├── _filter_dir_tools(survey) remove skip_tools (if confidence ≥ 0.5)
|
||||
│
|
||||
├── per-directory loop (each uncached dir, up to max_turns=14)
|
||||
├── _run_planning() single loop, max 3 turns
|
||||
│ inputs: survey output + full tree + file signals
|
||||
│ Tools: submit_plan
|
||||
│ output: plan dict (priority/shallow/skip dirs,
|
||||
│ turn allocations, investigation order)
|
||||
│ (skipped on tiny targets or loaded from plan.json
|
||||
│ on resumed runs)
|
||||
│
|
||||
├── _apply_plan() sort dirs into bands, build turn map
|
||||
│
|
||||
├── per-directory loop (ordered by plan, dynamic max_turns)
|
||||
│ _build_dir_context() list files + sizes + MIME
|
||||
│ _get_child_summaries() read cached child summaries
|
||||
│ _format_survey_block() inject survey context into prompt
|
||||
|
|
@ -78,7 +89,9 @@ analyze_directory(report, target)
|
|||
│ cache entry on budget breach
|
||||
│ Tools: read_file, list_directory, run_command,
|
||||
│ parse_structure, write_cache, think, checkpoint,
|
||||
│ flag, submit_report
|
||||
│ flag, submit_report (with completeness)
|
||||
│
|
||||
├── _write_plan_evaluation() plan_evaluation.json quality metrics
|
||||
│
|
||||
├── _run_synthesis() single loop, max 5 turns
|
||||
│ reads all "dir" cache entries
|
||||
|
|
@ -104,6 +117,8 @@ Layout:
|
|||
|
||||
```
|
||||
meta.json investigation metadata
|
||||
plan.json planning pass output (cached for resumed runs)
|
||||
plan_evaluation.json quality metrics: plan predictions vs outcomes
|
||||
files/<sha256>.json one JSON file per cached file entry
|
||||
dirs/<sha256>.json one JSON file per cached directory entry
|
||||
flags.jsonl JSONL — appended on every flag tool call
|
||||
|
|
@ -170,19 +185,18 @@ the *latest* per-call `input_tokens` reading (the actual size of the
|
|||
context window in use), not the cumulative sum across turns. Early
|
||||
exit flushes partial cache on budget breach. See #44.
|
||||
|
||||
**Per-loop turn cap.** Each dir loop runs for at most `max_turns = 14`
|
||||
turns. This is a sanity bound separate from the context budget — even
|
||||
on small targets the agent should produce a `submit_report` long
|
||||
before exhausting 14 turns. The cap exists to prevent runaway loops
|
||||
when the agent gets stuck (e.g. repeatedly retrying a failing tool
|
||||
call). If we observe legitimate investigations consistently hitting
|
||||
14, raise the cap; do not raise it speculatively.
|
||||
**Per-loop turn cap.** The planning pass assigns each directory a turn
|
||||
budget: priority dirs get 15-20 (capped at 25), shallow dirs get 5,
|
||||
default dirs get 10. This replaced the old fixed `max_turns=14`. The
|
||||
cap exists to prevent runaway loops when the agent gets stuck. The
|
||||
`plan_evaluation.json` quality report tracks turns used vs allocated
|
||||
per directory. See [Planning Pass](PlanningPass) for the full design.
|
||||
|
||||
**Per-loop message history growth.** Tool results are appended to the
|
||||
message history and never evicted, so per-turn `input_tokens` grows
|
||||
roughly linearly across a loop (~1.5–2k per turn observed on
|
||||
codebase targets). At the current `max_turns=14` cap this stays well
|
||||
under 200k. Raising `max_turns` significantly (e.g. via Phase 3
|
||||
dynamic turn allocation) would expose this — see #51.
|
||||
roughly linearly across a loop (~1.5-2k per turn observed on
|
||||
codebase targets). At the current caps (max 25 turns for priority
|
||||
dirs) this stays under 200k. Raising caps significantly would
|
||||
expose this further. See #51.
|
||||
|
||||
Pricing tracked and reported at end of each run.
|
||||
|
|
|
|||
7
Home.md
|
|
@ -10,9 +10,9 @@ runs first to feed the agent its initial picture of the target.
|
|||
|
||||
## Current State
|
||||
|
||||
- **Phase:** Active development — core pipeline stable, scaling and domain intelligence planned
|
||||
- **Last worked on:** 2026-04-06
|
||||
- **Last commit:** merge: add -x/--exclude flag for directory exclusion
|
||||
- **Phase:** Active development — Phases 1-3 complete. Phase 3 added planning pass with dynamic turn allocation and quality instrumentation.
|
||||
- **Last worked on:** 2026-04-12
|
||||
- **Last commit:** feat(ai): Phase 3 investigation planning (#75)
|
||||
- **Blocking:** None
|
||||
|
||||
---
|
||||
|
|
@ -23,6 +23,7 @@ runs first to feed the agent its initial picture of the target.
|
|||
|---|---|
|
||||
| [Architecture](Architecture) | Module breakdown, data flow, AI pipeline |
|
||||
| [Internals](Internals) | Code-level tour: dir loop, cache, prompts, where to make changes |
|
||||
| [Planning Pass](PlanningPass) | Phase 3 design sketch: dynamic turn allocation, quality metrics |
|
||||
| [Development Guide](DevelopmentGuide) | Setup, git workflow, testing, commands |
|
||||
| [Roadmap](Roadmap) | Phase status — pointer to PLAN.md and open issues |
|
||||
| [Session Retrospectives](SessionRetrospectives) | Full session history |
|
||||
|
|
|
|||
40
Internals.md
|
|
@ -313,32 +313,26 @@ is the entire payoff of leaves-first ordering.
|
|||
|
||||
The trick: those subdirectory summaries only exist if the children
|
||||
were investigated *first*. If `src/` runs before `src/auth/`, the
|
||||
cache lookup at `ai.py:825` returns nothing. The function falls
|
||||
through to its default at `ai.py:832` and returns the string
|
||||
`(none — this is a leaf directory)`. The parent's system prompt
|
||||
silently loses all of its child context, and the agent has no way to
|
||||
know — the placeholder claims the dir is a leaf, which is a lie when
|
||||
the children just haven't been investigated yet. The dir summary
|
||||
degrades and the synthesis pass inherits the degradation.
|
||||
cache lookup returns nothing.
|
||||
|
||||
**If you change the investigation order**, you have to do one of:
|
||||
**Phase 3 addressed this contract in two ways:**
|
||||
|
||||
1. **Preserve the leaf-first invariant within whatever new order you
|
||||
introduce.** A "priority-first" order can still process directories
|
||||
leaves-first within each priority band, so children always run
|
||||
before parents.
|
||||
2. **Explicitly handle the missing-child-summaries case in the
|
||||
prompt.** Replace the lie ("leaf directory") with the truth
|
||||
("children not yet investigated") so the agent at least knows what
|
||||
it doesn't have, and accept that some dirs will run with degraded
|
||||
context.
|
||||
1. **Band-sorted ordering preserves leaf-first within priority bands.**
|
||||
`_apply_plan()` groups directories into priority/default/shallow
|
||||
bands but keeps the leaf-first sort within each band, so within a band
children always run before their parents, even in "priority-first" mode.
|
||||
|
||||
Phase 3's planning pass introduces the temptation to investigate
|
||||
priority dirs first. Both alternatives above are open. Whichever is
|
||||
chosen, this contract has to be addressed *explicitly* — the test
|
||||
class `TestDiscoverDirectories` (in `tests/test_ai_pure.py`) pins the
|
||||
current ordering, so any change will be loud, but the *reason* the
|
||||
ordering matters lives here.
|
||||
2. **The placeholder was fixed.** `_get_child_summaries()` now
|
||||
distinguishes actual leaf directories ("this is a leaf directory")
|
||||
from parents whose children haven't been investigated yet ("child
|
||||
directories exist but have not been investigated yet"). The old
|
||||
placeholder claimed every empty-cache case was a leaf, which was a
|
||||
lie when children simply hadn't been processed yet.
|
||||
|
||||
The test class `TestDiscoverDirectories` (in `tests/test_ai_pure.py`)
|
||||
pins the base leaf-first ordering. `TestGetChildSummaries` pins the
|
||||
updated placeholder behavior. See [Planning Pass](PlanningPass) for
|
||||
the full design.
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
272
PlanningPass.md
Normal file
|
|
@ -0,0 +1,272 @@
|
|||
# Planning Pass Design Sketch
|
||||
|
||||
The planning pass is Phase 3 of the Luminos investigation pipeline. It
|
||||
runs after the survey and before the per-directory dir loops, deciding
|
||||
where to invest investigative depth across the directory tree.
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
Before Phase 3, every directory received the same fixed allocation:
|
||||
`max_turns=14`. A two-file docs directory got the same budget as a
|
||||
fifty-file core source directory. This wasted turns on trivial dirs and
|
||||
under-invested in complex ones.
|
||||
|
||||
---
|
||||
|
||||
## Solution: Plan Before You Investigate
|
||||
|
||||
A short Claude loop of at most 3 turns (the "planning pass") examines cheap signals
|
||||
(survey output, full directory tree, file statistics) and produces a
|
||||
structured plan that the orchestrator uses to allocate resources.
|
||||
|
||||
```
survey pass
    |  survey dict
    v
planning pass              <-- NEW
    |  plan dict (priority/shallow/skip dirs, turn allocations)
    v
dir loop (per directory, ordered by plan)
    |  cached dir entries
    v
synthesis pass
```
|
||||
|
||||
The planning pass does not read files or explore the filesystem. It is
|
||||
a "strategy from the map" pass: it looks at structure and makes
|
||||
judgment calls about where depth will pay off.
|
||||
|
||||
---
|
||||
|
||||
## Plan Schema
|
||||
|
||||
The planning agent produces a plan via the `submit_plan` tool:
|
||||
|
||||
```python
{
    "priority_dirs": [
        {"path": str, "reason": str, "suggested_turns": int}
    ],
    "shallow_dirs": [
        {"path": str, "reason": str}
    ],
    "skip_dirs": [
        {"path": str, "reason": str}
    ],
    "investigation_order": "leaf-first" | "priority-first",
    "notes": str,
}
```
|
||||
|
||||
Directories not mentioned in any tier receive a default allocation
|
||||
(currently 10 turns). The planner does not need to list every
|
||||
directory; it focuses on cases where the default would clearly be
|
||||
wrong.
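
As a rough illustration of the schema above, the plan could be typed and queried as follows. This is a sketch only: the class names and the `tier_of` helper are hypothetical and are not taken from `ai.py`; only the field names come from the schema.

```python
from typing import Literal, TypedDict


class PriorityDir(TypedDict):
    path: str
    reason: str
    suggested_turns: int


class TieredDir(TypedDict):
    path: str
    reason: str


class Plan(TypedDict):
    priority_dirs: list[PriorityDir]
    shallow_dirs: list[TieredDir]
    skip_dirs: list[TieredDir]
    investigation_order: Literal["leaf-first", "priority-first"]
    notes: str


def tier_of(plan: Plan, path: str) -> str:
    """Return the tier a directory falls into (hypothetical helper).

    Directories the planner did not mention fall back to "default".
    """
    for tier, key in (
        ("priority", "priority_dirs"),
        ("shallow", "shallow_dirs"),
        ("skip", "skip_dirs"),
    ):
        if any(entry["path"] == path for entry in plan[key]):
            return tier
    return "default"
```

The fallback to `"default"` mirrors the rule that the planner only lists directories where the default would clearly be wrong.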
|
||||
|
||||
---
|
||||
|
||||
## Turn Allocation
|
||||
|
||||
| Tier | Turns | When to use |
|---|---|---|
| **priority** | 15-20 (capped at 25) | Complex, central, or important dirs: many source files, core logic, schemas, migrations |
| **default** | 10 | Unlisted dirs; reasonable for most directories |
| **shallow** | 5 | Simple, peripheral, or predictable: few files, test fixtures, static assets, docs-only |
| **skip** | 0 (excluded) | Build output, dependency caches, vendored code, generated artifacts |
|
||||
|
||||
The global turn budget is `base_turns_per_dir * dir_count` (10 per
|
||||
dir). The planner's allocations should roughly respect this budget.
|
||||
Allocations above the ceiling (25 turns) are capped by the
|
||||
orchestrator.
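
The allocation rules in the table can be sketched as a small pure function. This is an illustration under assumptions, not the code in `ai.py`: the constant names and `build_turn_map` are hypothetical, and only the numbers (10, 5, ceiling of 25) come from the table above.

```python
DEFAULT_TURNS = 10       # unlisted directories
SHALLOW_TURNS = 5        # shallow tier
MAX_TURNS_CEILING = 25   # orchestrator cap on priority allocations


def build_turn_map(plan: dict, dirs: list[str]) -> dict[str, int]:
    """Map each investigated directory to its turn budget (hypothetical helper).

    Skipped directories are left out of the map entirely.
    """
    priority = {d["path"]: d["suggested_turns"] for d in plan.get("priority_dirs", [])}
    shallow = {d["path"] for d in plan.get("shallow_dirs", [])}
    skipped = {d["path"] for d in plan.get("skip_dirs", [])}

    turn_map: dict[str, int] = {}
    for path in dirs:
        if path in skipped:
            continue
        if path in priority:
            # The planner's suggestion is honored up to the ceiling.
            turn_map[path] = min(priority[path], MAX_TURNS_CEILING)
        elif path in shallow:
            turn_map[path] = SHALLOW_TURNS
        else:
            turn_map[path] = DEFAULT_TURNS
    return turn_map
```

Comparing `sum(turn_map.values())` against `DEFAULT_TURNS * len(dirs)` would be one way to check how closely a plan respects the global budget.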
|
||||
|
||||
### Why no mid-loop borrowing (yet)
|
||||
|
||||
PLAN.md envisions a global budget with mid-loop turn borrowing (an
|
||||
agent that needs more turns can "borrow" from the remaining budget).
|
||||
This requires inter-loop communication that does not exist today. The
|
||||
v1 implementation uses simple per-directory allocation with no
|
||||
borrowing. If the quality instrumentation shows that priority dirs
|
||||
consistently exhaust their allocation while shallow dirs finish early,
|
||||
borrowing becomes worth building.
|
||||
|
||||
---
|
||||
|
||||
## Investigation Order
|
||||
|
||||
Two strategies are available:
|
||||
|
||||
**leaf-first** (default): the existing order from `_discover_directories()`.
|
||||
Deepest directories first, parents last. Ensures child summaries are
|
||||
always cached before parent investigation begins.
|
||||
|
||||
**priority-first**: priority directories before shallow/default, but
|
||||
leaf-first *within each band*. This preserves the child-summaries
|
||||
invariant while letting high-value subtrees inform the rest of the
|
||||
investigation.
|
||||
|
||||
Both strategies preserve the leaf-first contract documented in
|
||||
[Internals](Internals) section 4.7. The `_apply_plan()` function sorts
|
||||
directories into bands without breaking the within-band leaf ordering.
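
A minimal sketch of band sorting, assuming the input list is already in leaf-first order and that bands run priority, then default, then shallow. The function name, the `tiers` mapping, and the relative order of the default and shallow bands are assumptions for illustration; the actual `_apply_plan()` may differ.

```python
BAND_RANK = {"priority": 0, "default": 1, "shallow": 2}


def order_dirs(
    leaf_first_dirs: list[str],
    tiers: dict[str, str],
    investigation_order: str,
) -> list[str]:
    """Return the investigation order for the given strategy (hypothetical helper)."""
    if investigation_order != "priority-first":
        # leaf-first: keep the existing order untouched.
        return list(leaf_first_dirs)
    # sorted() is stable, so directories that share a band keep their
    # relative leaf-first order.
    return sorted(
        leaf_first_dirs,
        key=lambda d: BAND_RANK[tiers.get(d, "default")],
    )
```

The stable sort is what keeps the within-band ordering intact. The guarantee is per band: a parent placed in an earlier band than one of its children would still run before that child, which is the case the updated child-summaries placeholder covers.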
|
||||
|
||||
---
|
||||
|
||||
## Inputs to the Planner
|
||||
|
||||
The planning agent receives four signals:
|
||||
|
||||
1. **Survey output**: the full survey dict (description, approach,
|
||||
domain notes, tool recommendations), formatted as a text block.
|
||||
2. **Full directory tree**: `render_tree()` output at depth 6 (deeper
|
||||
than the survey's 2-level preview).
|
||||
3. **File signals**: extension histogram, `file --brief` descriptions,
|
||||
filename samples (the same raw signals the survey sees).
|
||||
4. **Cached directories**: which dirs are already cached from a prior
|
||||
run (so the planner knows what will be skipped).
|
||||
|
||||
---
|
||||
|
||||
## Fallback Behavior
|
||||
|
||||
The planning pass degrades gracefully:
|
||||
|
||||
- **Small targets** (below `_SURVEY_MIN_FILES` and `_SURVEY_MIN_DIRS`):
|
||||
planning is skipped entirely, same threshold as the survey. All dirs
|
||||
get the default allocation in leaf-first order.
|
||||
- **Planning fails** (API error, agent doesn't call `submit_plan`):
|
||||
`_default_plan()` returns an empty plan. All dirs get 10 turns,
|
||||
leaf-first order. The investigation proceeds as if Phase 3 didn't
|
||||
exist.
|
||||
- **Resumed runs**: the plan is cached as `plan.json` in the
|
||||
investigation cache. On resume (without `--fresh`), the cached plan
|
||||
is loaded and `_run_planning()` is skipped.
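
The three fallback paths can be sketched as one resolution function. Everything here is illustrative: `resolve_plan`, its parameters, and the exact shape of the empty plan are assumptions, and the sketch chooses not to cache a fallback plan, which the real code may or may not do.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def default_plan() -> dict:
    """Empty plan: every directory gets the default tier, leaf-first order."""
    return {
        "priority_dirs": [],
        "shallow_dirs": [],
        "skip_dirs": [],
        "investigation_order": "leaf-first",
        "notes": "",
    }


def resolve_plan(
    cache_root: Path,
    fresh: bool,
    is_small_target: bool,
    run_planning: Callable[[], Optional[dict]],
) -> dict:
    """Pick a plan using the fallback rules above (hypothetical helper)."""
    if is_small_target:
        return default_plan()              # planning skipped entirely

    plan_path = cache_root / "plan.json"
    if not fresh and plan_path.exists():
        # Resumed run: reuse the cached plan instead of planning again.
        return json.loads(plan_path.read_text(encoding="utf-8"))

    try:
        plan = run_planning()              # None if submit_plan was never called
    except Exception:
        plan = None                        # e.g. an API error
    if plan is None:
        return default_plan()

    plan_path.write_text(json.dumps(plan, indent=2), encoding="utf-8")
    return plan
```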
|
||||
|
||||
---
|
||||
|
||||
## Quality Instrumentation
|
||||
|
||||
Phase 3 ships with built-in measurement so we can tell whether planning
|
||||
actually improves investigation quality. It has three parts:
|
||||
|
||||
### Turn utilization
|
||||
|
||||
Tracked per directory: turns allocated vs turns used. An agent that
|
||||
finishes in 3 turns on an 18-turn budget suggests over-allocation. An
|
||||
agent that hits the cap on a 5-turn budget suggests under-allocation.
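
One way to turn these observations into a per-directory hint is sketched below. The function names and the 0.25 "low utilization" threshold are hypothetical; the source does not define numeric thresholds.

```python
def utilization(turns_used: int, turns_allocated: int) -> float:
    """Fraction of the allocated turns that were actually used."""
    if turns_allocated <= 0:
        return 0.0
    return turns_used / turns_allocated


def allocation_hint(turns_used: int, turns_allocated: int, low: float = 0.25) -> str:
    """Classify a directory's allocation (threshold is an assumption)."""
    if turns_allocated <= 0:
        return "ok"                 # nothing was allocated, nothing to judge
    if turns_used >= turns_allocated:
        return "under-allocated"    # the agent hit its cap
    if utilization(turns_used, turns_allocated) <= low:
        return "over-allocated"     # finished far below its budget
    return "ok"
```

With these assumptions, 3 turns used on an 18-turn budget is flagged as over-allocated, and 5 of 5 as under-allocated.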
|
||||
|
||||
### Completeness self-rating
|
||||
|
||||
The `submit_report` tool (dir scope) now includes a `completeness`
|
||||
field (0.0-1.0). The agent rates how thoroughly it investigated the
|
||||
directory. This is not perfectly reliable (it is a self-assessment),
|
||||
but it provides signal: a priority dir with completeness 0.3 probably
|
||||
needed more turns; a shallow dir with completeness 0.95 probably
|
||||
didn't need its 5 turns.
|
||||
|
||||
### plan_evaluation.json
|
||||
|
||||
Written at the end of every investigation, this file is the planning
|
||||
pass's report card. It compares plan predictions to outcomes:
|
||||
|
||||
```json
{
  "plan_order": "leaf-first",
  "total_dirs_investigated": 12,
  "total_turns_allocated": 120,
  "total_turns_used": 87,
  "overall_utilization": 0.73,
  "per_directory": [
    {
      "dir": "src/core",
      "planned_tier": "priority",
      "turns_allocated": 18,
      "turns_used": 14,
      "utilization": 0.78,
      "completeness": 0.9,
      "confidence": 0.85
    }
  ],
  "evaluated_at": "2026-04-12T..."
}
```
|
||||
|
||||
Run luminos on the same target before and after changes to compare
|
||||
these metrics. The golden set for baseline comparison: luminos itself.
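
A before/after comparison could look roughly like the sketch below. It assumes two dicts already loaded from `plan_evaluation.json` files with the fields shown above, and that every entry carries numeric `turns_used` and `completeness` values; `compare_evaluations` is a hypothetical helper, not part of luminos.

```python
def compare_evaluations(before: dict, after: dict) -> list[dict]:
    """Per-directory deltas between two evaluation reports (hypothetical helper).

    Directories that appear in only one of the two reports are ignored.
    """
    before_by_dir = {entry["dir"]: entry for entry in before["per_directory"]}
    rows = []
    for entry in after["per_directory"]:
        prev = before_by_dir.get(entry["dir"])
        if prev is None:
            continue
        rows.append({
            "dir": entry["dir"],
            "turns_used_delta": entry["turns_used"] - prev["turns_used"],
            "completeness_delta": round(
                entry["completeness"] - prev["completeness"], 2
            ),
        })
    return rows
```

A positive `completeness_delta` alongside a flat or negative `turns_used_delta` would suggest the change improved the plan rather than just spending more turns.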
|
||||
|
||||
---
|
||||
|
||||
## Implementation Map
|
||||
|
||||
| Component | Location | Purpose |
|---|---|---|
| `_PLANNING_SYSTEM_PROMPT` | `prompts.py` | System prompt for the planning agent |
| `submit_plan` tool | `ai.py` (planning scope) | Tool schema for plan submission |
| `_run_planning()` | `ai.py` | Runs the planning pass (follows `_run_survey` pattern) |
| `_apply_plan()` | `ai.py` | Pure function: plan + dir list to ordered list + turn map |
| `_default_plan()` | `ai.py` | Fallback empty plan |
| `_write_plan_evaluation()` | `ai.py` | Writes `plan_evaluation.json` after dir loops |
| `_TokenTracker._loop_turns` | `ai.py` | Counts API calls per dir loop for utilization tracking |
| `plan.json` | cache root | Persisted plan for resumed runs |
| `plan_evaluation.json` | cache root | Post-investigation quality report |
|
||||
|
||||
---
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### Why band-sorted order instead of arbitrary reordering
|
||||
|
||||
The leaf-first contract (`_get_child_summaries()`) is load-bearing.
|
||||
Breaking it silently degrades parent summaries because child cache
|
||||
entries don't exist yet. Band-sorting preserves leaf-first within each
|
||||
priority band, giving us "priority-first" without losing child context.
|
||||
|
||||
### Why per-directory allocation instead of a shared global pool
|
||||
|
||||
A shared pool with mid-loop borrowing requires the orchestrator to
|
||||
communicate with running agents, which doesn't exist in the current
|
||||
architecture (each `_run_dir_loop` call is independent). Per-directory
|
||||
allocation is a strict improvement over fixed-14-for-everyone with zero
|
||||
new machinery. The quality instrumentation will tell us if borrowing is
|
||||
worth building.
|
||||
|
||||
### Why the child-summaries placeholder was fixed
|
||||
|
||||
`_get_child_summaries()` previously returned "this is a leaf directory"
|
||||
for any directory with no cached children, whether it was actually a
|
||||
leaf or just hadn't been investigated yet. With priority-first ordering,
|
||||
this lie becomes more likely to trigger. The fix distinguishes the two
|
||||
cases: actual leaves get "this is a leaf directory", uninvestigated
|
||||
parents get "child directories exist but have not been investigated
|
||||
yet".
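
The distinction can be sketched as follows. This is not the real `_get_child_summaries()`: the function name, its parameters, the bullet formatting, and the handling of partially cached children are assumptions; only the two placeholder phrases come from the text above.

```python
def child_summaries_block(child_dirs: list[str], cached: dict[str, str]) -> str:
    """Render the child-summaries section of a dir prompt (hypothetical helper).

    child_dirs: subdirectories that exist on disk under this directory.
    cached:     child path -> cached summary, for children already investigated.
    """
    if not child_dirs:
        # A true leaf: there is nothing to summarize.
        return "(this is a leaf directory)"
    lines = [f"- {d}: {cached[d]}" for d in child_dirs if d in cached]
    if not lines:
        # Children exist, but none have run yet: say so instead of
        # claiming the directory is a leaf.
        return "(child directories exist but have not been investigated yet)"
    return "\n".join(lines)
```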
|
||||
|
||||
### Why completeness is a self-rating
|
||||
|
||||
An external completeness metric would require knowing "how many files
|
||||
should have been examined", which depends on the directory contents and
|
||||
is exactly the kind of judgment the agent makes. Self-rating is
|
||||
imperfect but cheap, and the correlation between self-rated
|
||||
completeness and turn utilization gives us a useful signal even if the
|
||||
absolute values aren't perfectly calibrated.
|
||||
|
||||
---
|
||||
|
||||
## Future Work
|
||||
|
||||
- **Mid-loop turn borrowing**: if utilization data shows priority dirs
|
||||
consistently hit their cap while others finish early, implement a
|
||||
shared budget pool.
|
||||
- **Plan refinement**: after the first dir loop run, re-evaluate the
|
||||
plan based on early findings (some "shallow" dirs might turn out to
|
||||
be important).
|
||||
- **Cross-run learning**: use `plan_evaluation.json` from prior runs to
|
||||
improve planning on similar targets.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- Issues: #8, #9, #10, #11, #74
|
||||
- PR: #75
|
||||
- PLAN.md Part 4: Investigation Planning
|
||||
- [Internals](Internals) section 4.7: leaf-first contract
|
||||