feat(ai): Phase 3 investigation planning #75

Merged

claude-code merged 1 commit from feat/phase-3-investigation-planning into main

2026-04-12 20:26:22 -06:00

claude-code commented

2026-04-12 20:22:26 -06:00

Collaborator

## Summary

Implements Phase 3: Investigation Planning (#8, #9, #10, #11) plus quality instrumentation (#74).

- **Planning pass** runs after the survey and before the dir loops. A single-turn Claude call classifies directories into priority/shallow/skip tiers and allocates turns per directory, replacing the fixed `max_turns=14` with dynamic allocation from a global budget.
- **`_apply_plan()`** sorts directories into priority bands while preserving the leaf-first invariant within each band, so child summaries are always available when parents are investigated.
- **Plan caching** persists the plan as `plan.json` so resumed runs follow the same strategy.
- **Quality instrumentation**: turn utilization tracking per directory, a `completeness` self-rating (0.0-1.0) on dir-scope `submit_report`, and `plan_evaluation.json`, which compares plan predictions to actual outcomes.
- **Fix**: `_get_child_summaries()` now distinguishes actual leaf directories from parents whose children haven't been investigated yet. Previously both cases received the same misleading "this is a leaf directory" placeholder.
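
The band ordering described above can be sketched as a stable sort. This is a minimal illustration under stated assumptions, not the code in `luminos_lib/ai.py`: the name `order_by_band`, the shape of the `tiers` mapping, the "default" middle band, and the premise that the input list is already leaf-first are all hypothetical.

```python
from typing import Dict, List

# Hypothetical band ranks: lower sorts earlier. Untiered dirs land in "default".
_BAND_RANK = {"priority": 0, "default": 1, "shallow": 2}


def order_by_band(leaf_first_dirs: List[str], tiers: Dict[str, str]) -> List[str]:
    """Group dirs by band while keeping leaf-first order inside each band.

    Assumes `leaf_first_dirs` is already ordered children-before-parents.
    Dirs tiered "skip" are dropped; paths that appear only in `tiers`
    (unknown paths) are ignored because they are never iterated.
    """
    kept = [d for d in leaf_first_dirs if tiers.get(d, "default") != "skip"]
    # sorted() is stable, so dirs in the same band keep their relative order.
    return sorted(kept, key=lambda d: _BAND_RANK.get(tiers.get(d, "default"), 1))


dirs = ["src/a/b", "src/a", "docs", "vendor", "src"]
tiers = {
    "src/a": "priority",
    "src/a/b": "priority",
    "docs": "shallow",
    "vendor": "skip",
    "not/present": "priority",  # unknown path, ignored
}
ordered = order_by_band(dirs, tiers)
print(ordered)
```

In this sketch the stable sort only guarantees leaf-first order within a band; how the real `_apply_plan()` treats a parent and child that land in different bands is not shown here.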

## Changes

| File | What |
|---|---|
| `luminos_lib/prompts.py` | New `_PLANNING_SYSTEM_PROMPT` |
| `luminos_lib/ai.py` | `_run_planning()`, `submit_plan` tool, `_apply_plan()`, `_default_plan()`, `_write_plan_evaluation()`, dynamic turn allocation in orchestrator, `_TokenTracker._loop_turns`, `completeness` field, fixed `_get_child_summaries()` |
| `tests/test_ai_pure.py` | 26 new tests (260 total) |

## Closes

Closes #8, closes #9, closes #10, closes #11, closes #74

## Test plan

- [x] All 260 tests pass (`python3 -m unittest discover -s tests/`)
- [x] `_apply_plan()` tested: `None` plan, default plan, skip removal, priority turns, ceiling cap, shallow turns, priority-first ordering, leaf-first preservation within bands, unknown paths ignored, subset filtering
- [x] `_get_child_summaries()` tested: leaf dirs, uninvestigated children, cached children, hidden dirs
- [x] `_write_plan_evaluation()` tested: normal case, `None` plan, empty utilization, zero-allocation edge case
- [x] `_TokenTracker._loop_turns` tested: increment, reset, independence from totals
- [x] Planning tool registry tested: registration, required fields, order enum
- [ ] Live run on a real target (requires API key, manual verification)
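
For readers unfamiliar with the test style, the three `_loop_turns` cases listed above (increment, reset, independence from totals) might look roughly like this. `TokenTrackerSketch` and its method names are hypothetical stand-ins written for this example; the real `_TokenTracker` interface may differ.

```python
import unittest


class TokenTrackerSketch:
    """Hypothetical stand-in for _TokenTracker; only the counters are modelled."""

    def __init__(self) -> None:
        self.total_calls = 0  # assumed run-wide counter
        self.loop_turns = 0   # API calls made in the current dir loop

    def record_call(self) -> None:
        self.total_calls += 1
        self.loop_turns += 1

    def reset_loop(self) -> None:
        # Start a new dir loop without touching the run-wide total.
        self.loop_turns = 0


class LoopTurnsTest(unittest.TestCase):
    def setUp(self) -> None:
        self.tracker = TokenTrackerSketch()

    def test_increment(self) -> None:
        self.tracker.record_call()
        self.tracker.record_call()
        self.assertEqual(self.tracker.loop_turns, 2)

    def test_reset(self) -> None:
        self.tracker.record_call()
        self.tracker.reset_loop()
        self.assertEqual(self.tracker.loop_turns, 0)

    def test_independence_from_totals(self) -> None:
        self.tracker.record_call()
        self.tracker.reset_loop()
        self.tracker.record_call()
        self.assertEqual(self.tracker.loop_turns, 1)
        self.assertEqual(self.tracker.total_calls, 2)


# Run the suite programmatically so the snippet does not call sys.exit().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoopTurnsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```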


claude-code added 1 commit 2026-04-12 20:22:26 -06:00

feat(ai): implement Phase 3 investigation planning (#8, #9, #10, #11, #74) 2adbed9d28

Add a planning pass that runs after survey and before dir loops. The
planner classifies directories into priority/shallow/skip tiers and
allocates turns accordingly, replacing the fixed max_turns=14 per
directory with dynamic allocation from a global budget.

Planning pass:
- _PLANNING_SYSTEM_PROMPT in prompts.py with submit_plan tool
- _run_planning() follows the same single-turn pattern as _run_survey()
- submit_plan tool registered in new "planning" scope
- _apply_plan() pure function: band-sorted ordering (leaf-first within
  bands), turn map, skip-dir removal
- _default_plan() fallback when planning is skipped or fails
- Plan cached as plan.json for resumed runs
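
The cache-then-fallback behaviour in the list above could be sketched as follows. This is an illustration only: `load_or_create_plan`, the `planner` callback, the plan's dictionary shape, and the choice not to cache the fallback plan are assumptions made for the example, not details taken from the commit.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional

Plan = Dict[str, List[str]]  # hypothetical shape: tier name -> directory paths


def default_plan() -> Plan:
    """Fallback when planning is skipped or fails: no directory is singled out."""
    return {"priority": [], "shallow": [], "skip": []}


def load_or_create_plan(cache_dir: Path, planner: Callable[[], Optional[Plan]]) -> Plan:
    plan_path = cache_dir / "plan.json"
    if plan_path.exists():
        # Resumed run: reuse the earlier strategy instead of re-planning.
        return json.loads(plan_path.read_text(encoding="utf-8"))
    try:
        plan = planner()
    except Exception:
        plan = None
    if plan is None:
        # Assumption: the fallback is not cached, so a later run can retry planning.
        return default_plan()
    plan_path.write_text(json.dumps(plan, indent=2), encoding="utf-8")
    return plan


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = load_or_create_plan(
        cache, lambda: {"priority": ["src"], "shallow": ["docs"], "skip": ["vendor"]}
    )
    # The second call finds plan.json and never invokes its planner.
    second = load_or_create_plan(cache, lambda: None)
    cached_file_exists = (cache / "plan.json").exists()
```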

Dynamic turn allocation:
- Priority dirs: 15-20 turns (capped at 25)
- Shallow dirs: 5 turns
- Default: 10 turns
- Skip dirs: excluded entirely
- Orchestrator passes per-dir max_turns to _run_dir_loop()
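
The tier rules above can be expressed as a small pure function. This is a sketch under assumptions: the names `turns_for` and `build_turn_map`, the idea that the planner proposes a turn count for priority dirs, and the fallback of 15 when it proposes none are hypothetical. The global budget is not modelled.

```python
from typing import Dict, Optional, Tuple

PRIORITY_CEILING = 25   # hard cap on priority dirs, from the list above
SHALLOW_TURNS = 5
DEFAULT_TURNS = 10
PRIORITY_FALLBACK = 15  # assumption: used when the planner proposes no number


def turns_for(tier: str, requested: Optional[int] = None) -> Optional[int]:
    """Return max_turns for one directory, or None if it should be skipped."""
    if tier == "skip":
        return None
    if tier == "shallow":
        return SHALLOW_TURNS
    if tier == "priority":
        wanted = requested if requested is not None else PRIORITY_FALLBACK
        return min(wanted, PRIORITY_CEILING)
    return DEFAULT_TURNS


def build_turn_map(plan: Dict[str, Tuple[str, Optional[int]]]) -> Dict[str, int]:
    """Map each non-skipped directory to its per-dir max_turns."""
    turn_map: Dict[str, int] = {}
    for directory, (tier, requested) in plan.items():
        turns = turns_for(tier, requested)
        if turns is not None:
            turn_map[directory] = turns
    return turn_map


turn_map = build_turn_map({
    "src": ("priority", 18),
    "core": ("priority", 40),   # over the ceiling, clamped to 25
    "docs": ("shallow", None),
    "vendor": ("skip", None),   # excluded entirely
    "misc": ("default", None),
})
print(turn_map)
```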

Quality instrumentation:
- _TokenTracker._loop_turns counts API calls per dir loop
- completeness field (0.0-1.0) added to dir-scope submit_report
- plan_evaluation.json emitted after dir loops comparing plan predictions
  to actual turn utilization, completeness, and confidence
- Turn utilization logged per directory during investigation
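
To make the comparison concrete, here is a rough sketch of the kind of record such an evaluation might contain. The function name `evaluate_plan`, the field names, and the use of `None` for the zero-allocation case are assumptions for illustration; the actual `plan_evaluation.json` schema is not shown in this commit message.

```python
import json
from typing import Dict, Optional


def evaluate_plan(
    allocated: Dict[str, int],
    used: Dict[str, int],
    completeness: Dict[str, float],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Compare planned turns to actual turns and self-rated completeness."""
    report: Dict[str, Dict[str, Optional[float]]] = {}
    for directory, alloc in allocated.items():
        turns_used = used.get(directory, 0)
        report[directory] = {
            "allocated_turns": alloc,
            "used_turns": turns_used,
            # Guard the zero-allocation edge case instead of dividing by zero.
            "utilization": round(turns_used / alloc, 2) if alloc > 0 else None,
            "completeness": completeness.get(directory),
        }
    return report


report = evaluate_plan(
    allocated={"src": 20, "docs": 5, "empty": 0},
    used={"src": 20, "docs": 2},
    completeness={"src": 0.6, "docs": 1.0},
)
print(json.dumps(report, indent=2))
```

Read this way, a directory that used its whole allocation but reported low completeness (like `src` in the sample data) may have been under-allocated, while one with low utilization and high completeness may have been over-allocated.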

Also fixes _get_child_summaries() to distinguish actual leaf directories
from parents whose children have not been investigated yet, replacing
the misleading "this is a leaf directory" placeholder.
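
The distinction this fix draws can be illustrated with a simplified sketch. The signature, the cache shape, and the message wording below are hypothetical and do not reproduce the real `_get_child_summaries()`.

```python
from typing import Dict, List


def child_summaries(parent: str, children: List[str], cache: Dict[str, str]) -> str:
    """Describe a directory's children for the parent's investigation prompt."""
    if not children:
        # Only a directory with no subdirectories is reported as a leaf.
        return f"{parent} is a leaf directory (no subdirectories)."
    lines = []
    for child in children:
        if child in cache:
            lines.append(f"{child}: {cache[child]}")
        else:
            # Children exist but have no summary yet: say so explicitly.
            lines.append(f"{child}: not investigated yet")
    return "\n".join(lines)


leaf_text = child_summaries("src/a/b", [], {})
pending_text = child_summaries("src", ["src/a"], {})
cached_text = child_summaries("src", ["src/a"], {"src/a": "parser helpers"})
print(leaf_text, pending_text, cached_text, sep="\n")
```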

26 new tests (260 total, all passing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>