wiki: Internals — reflect Phase 3 planning pass, (summary, completeness) return, cache layout

claude-code 2026-04-18 20:08:30 -06:00
parent 717cde8562
commit d3315b530f

@ -7,7 +7,8 @@ agent loop can finish this page and start making non-trivial changes.
All file:line references are accurate as of the date this page was last All file:line references are accurate as of the date this page was last
edited — verify with `git log` or by opening the file before relying on a edited — verify with `git log` or by opening the file before relying on a
specific line number. specific line number. `ai.py` in particular grows each phase and
references drift.
--- ---
@ -36,9 +37,9 @@ wait for a scan they can't use.
## 2. Base scan walkthrough ## 2. Base scan walkthrough
Entry: `luminos.py:main()` parses args, then calls `scan(target, ...)` at Entry: `luminos.py:main()` parses args, then calls `scan(target, ...)`.
`luminos.py:45`. `scan()` is a flat sequence — it builds a `report` dict `scan()` is a flat sequence — it builds a `report` dict by calling helpers
by calling helpers from `luminos_lib/`, one per concern, in order: from `luminos_lib/`, one per concern, in order:
``` ```
scan(target) scan(target)
@ -60,7 +61,7 @@ event-driven, and there is no shared state object — everything passes
through the local `report` dict. through the local `report` dict.
The progress lines you see on stderr (`[scan] Counting lines... foo.py`) The progress lines you see on stderr (`[scan] Counting lines... foo.py`)
come from `_progress()` in `luminos.py:23`, which returns an `on_file` come from `_progress()` in `luminos.py`, which returns an `on_file`
callback that the helpers call as they work. If you add a new helper that callback that the helpers call as they work. If you add a new helper that
walks files, plumb a progress callback through the same way for walks files, plumb a progress callback through the same way for
consistency. consistency.
@ -77,46 +78,54 @@ the base scan because it needs `report["survey_signals"]` and
The AI pipeline is what makes Luminos interesting and is also where The AI pipeline is what makes Luminos interesting and is also where
almost all the complexity lives. Everything below happens inside almost all the complexity lives. Everything below happens inside
`luminos_lib/ai.py` (1438 lines as of writing), called from `luminos_lib/ai.py` (~2060 lines as of writing), called from `luminos.py`
`luminos.py:157` via `analyze_directory()`. via `analyze_directory()`.
### 3.1 The orchestrator ### 3.1 The orchestrator
`analyze_directory()` (`ai.py:1408`) is a thin wrapper that checks `analyze_directory()` is a thin wrapper that checks dependencies, gets the
dependencies, gets the API key, builds the Anthropic client, and calls API key, builds the Anthropic client, and calls `_run_investigation()`.
`_run_investigation()`. If anything fails it prints a warning and returns If anything fails it prints a warning and returns empty strings — the
empty strings — the rest of luminos keeps working. rest of luminos keeps working.
`_run_investigation()` (`ai.py:1286`) is the real entry point. Read this `_run_investigation()` is the real entry point. Read this function first
function first if you want to understand the pipeline shape. It does six if you want to understand the pipeline shape. It does **seven** things,
things, in order: in order:
1. **Get/create an investigation ID and cache** (`ai.py:1289-1294`). 1. **Get/create an investigation ID and cache**. Investigation IDs let
Investigation IDs let you resume a previous run; see §5 below. you resume a previous run; see §5 below.
2. **Discover all directories** under the target via 2. **Discover all directories** under the target via
`_discover_directories()` (`ai.py:715`). Returns them sorted `_discover_directories()`. Returns them sorted *leaves-first* — the
*leaves-first* — the deepest paths come first. This matters because deepest paths come first. This matters because each dir loop reads
each dir loop reads its child directories' summaries from cache, so its child directories' summaries from cache, so children must be
children must be investigated before parents. investigated before parents.
3. **Run the survey pass** (`ai.py:1300-1334`) unless the target is below 3. **Run the survey pass** unless the target is below
the size thresholds at `ai.py:780-781`, in which case `_SURVEY_MIN_FILES` and `_SURVEY_MIN_DIRS`, in which case
`_default_survey()` returns a synthetic skip. `_default_survey()` returns a synthetic skip.
4. **Filter out cached directories** (`ai.py:1336-1349`). If you're 4. **Filter out cached directories**. If you're resuming an
resuming an investigation, dirs that already have a `dir` cache entry investigation, dirs that already have a `dir` cache entry are
are skipped — only new ones get a fresh dir loop. skipped — only new ones get a fresh dir loop.
5. **Run a dir loop per remaining directory** (`ai.py:1351-1375`). This 5. **Run the planning pass** (Phase 3) unless the target is small, in
is the heart of the system — see §4. which case `_default_plan()` returns an empty plan. On resumed runs
6. **Run the synthesis pass** (`ai.py:1382`) reading only `dir` cache the planner is skipped and `plan.json` is loaded from cache instead.
entries to produce `(brief, detailed)`. `_apply_plan()` then sorts dirs into priority/default/shallow bands
and builds a `{dir_path: max_turns}` map. Leaf-first ordering is
preserved *within* each band (see §4.7).
6. **Run a dir loop per remaining directory**, iterating the
plan-ordered list with the per-directory `max_turns` from the plan.
`_write_plan_evaluation()` records turn-utilization metrics at the
end. This is the heart of the system — see §4.
7. **Run the synthesis pass** reading only `dir` cache entries to
produce `(brief, detailed)`.
It also reads `flags.jsonl` from disk at the end (`ai.py:1387-1397`) and It also reads `flags.jsonl` from disk at the end and returns
returns `(brief, detailed, flags)` to `analyze_directory()`. `(brief, detailed, flags)` to `analyze_directory()`.
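The leaves-first ordering from step 2 can be illustrated with a small pure function. This is only a sketch: `order_leaves_first()` is a hypothetical stand-in for the sorting part of `_discover_directories()`, and it assumes `/`-separated relative paths and an alphabetical tie-break.

```python
def order_leaves_first(dir_paths):
    """Sort directory paths so the deepest come first.

    Depth is approximated by counting "/" separators; ties are broken
    alphabetically so the order is deterministic (an assumption).
    """
    def depth(path):
        return path.strip("/").count("/")

    return sorted(dir_paths, key=lambda p: (-depth(p), p))


dirs = ["src", "src/auth", "src/db", "src/auth/tokens"]
ordered = order_leaves_first(dirs)
# Every child appears before its parent, so the parent's dir loop can
# read the child's summary from cache.
```

Because a child is always strictly deeper than its parent, sorting by descending depth is enough to guarantee children are investigated first.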
### 3.2 The survey pass ### 3.2 The survey pass
`_run_survey()` (`ai.py:1051`) is a short, single-purpose loop. It exists `_run_survey()` is a short, single-purpose loop. It exists to give the
to give the dir loops some shared context about what they're looking at dir loops some shared context about what they're looking at *as a whole*
*as a whole* before any of them start. before any of them start.
Inputs go into the system prompt (`_SURVEY_SYSTEM_PROMPT` in Inputs go into the system prompt (`_SURVEY_SYSTEM_PROMPT` in
`prompts.py`): `prompts.py`):
@ -125,9 +134,9 @@ Inputs go into the system prompt (`_SURVEY_SYSTEM_PROMPT` in
- A 2-level tree preview from `build_tree(target, max_depth=2)` - A 2-level tree preview from `build_tree(target, max_depth=2)`
- The list of tools the dir loop will have available - The list of tools the dir loop will have available
The survey is allowed only `submit_survey` as a tool (`_SURVEY_TOOLS` at The survey is allowed only `submit_survey` as a tool (`_SURVEY_TOOLS`).
`ai.py:356`). It runs at most 3 turns. The agent must call `submit_survey` It runs at most 3 turns. The agent must call `submit_survey` exactly
exactly once with six fields: once with six fields:
```python ```python
{ {
@ -148,54 +157,82 @@ loops still run but with `survey=None` — the system degrades gracefully.
Two things happen with the survey output before each dir loop runs: Two things happen with the survey output before each dir loop runs:
**Survey block injection.** `_format_survey_block()` (`ai.py:803`) renders **Survey block injection.** `_format_survey_block()` renders the survey
the survey dict as a labeled text block, which gets `.format()`-injected dict as a labeled text block, which gets `.format()`-injected into the
into the dir loop system prompt as `{survey_context}`. The dir agent sees dir loop system prompt as `{survey_context}`. The dir agent sees the
the description, approach, domain notes, and which tools it should lean on description, approach, domain notes, and which tools it should lean on
or skip. or skip.
**Tool filtering.** `_filter_dir_tools()` (`ai.py:824`) returns a copy of **Tool filtering.** `_filter_dir_tools()` returns a copy of `_DIR_TOOLS`
`_DIR_TOOLS` with anything in `skip_tools` removed — but only if the with anything in `skip_tools` removed — but only if the survey's
survey's confidence is at or above `_SURVEY_CONFIDENCE_THRESHOLD = 0.5` confidence is at or above `_SURVEY_CONFIDENCE_THRESHOLD = 0.5`. Below
(`ai.py:775`). Below that threshold the agent gets the full toolbox. The that threshold the agent gets the full toolbox. The control-flow tool
control-flow tool `submit_report` is in `_PROTECTED_DIR_TOOLS` and can `submit_report` is in `_PROTECTED_DIR_TOOLS` and can never be filtered
never be filtered out — removing it would break loop termination. out — removing it would break loop termination.
This is the only place in the codebase where the agent's available tools This is the only place in the codebase where the agent's available
change at runtime. If you add a new tool, decide whether it should be tools change at runtime. If you add a new tool, decide whether it
protectable. should be protectable.
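A rough sketch of the filtering rule described above. The constant values come from the text; the function body, the lower-case names, and the assumption that each tool schema is a dict with a `"name"` key are illustrative, not the code in `ai.py`.

```python
SURVEY_CONFIDENCE_THRESHOLD = 0.5
PROTECTED_DIR_TOOLS = {"submit_report"}


def filter_dir_tools(dir_tools, survey):
    """Return a copy of dir_tools with the survey's skip_tools removed.

    Filtering only applies when the survey exists and its confidence is
    at or above the threshold; protected tools are never removed.
    """
    if not survey:
        return list(dir_tools)
    if survey.get("confidence", 0.0) < SURVEY_CONFIDENCE_THRESHOLD:
        return list(dir_tools)
    skip = set(survey.get("skip_tools", [])) - PROTECTED_DIR_TOOLS
    return [tool for tool in dir_tools if tool["name"] not in skip]


tools = [{"name": "read_file"}, {"name": "run_command"}, {"name": "submit_report"}]
confident = {"confidence": 0.8, "skip_tools": ["run_command", "submit_report"]}
unsure = {"confidence": 0.3, "skip_tools": ["run_command"]}
```

Subtracting the protected set before filtering keeps loop termination safe even if the survey agent lists `submit_report` in `skip_tools`.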
### 3.4 The planning pass (Phase 3)
`_run_planning()` is structured like `_run_survey()`: a single-purpose
loop with one submit tool (`submit_plan`), low max turns. Its job is to
decide *where* the dir loops should spend turns, not to investigate.
Inputs:
- The survey dict (formatted via `_format_survey_block()`)
- The full tree at depth 6 (deeper than the survey's 2-level preview)
- The base scan's `survey_signals` (raw file signals)
- The list of already-cached directories (so the planner doesn't plan
around dirs that will be skipped)
The plan schema, tier allocations (priority 15-20 cap 25, default 10,
shallow 5, skip 0), fallback behavior, and resume behavior are covered
in full on the [Planning Pass](PlanningPass) page.
`_apply_plan()` is a pure helper that translates the plan into an
ordered list of directories plus a `{dir_path: max_turns}` map. It
sorts dirs into priority/default/shallow bands but **preserves
leaf-first ordering within each band** — so children always run before
their parents, even in "priority-first" mode. See §4.7.
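The banding behavior can be sketched as below. The plan shape used here (`{"priority": {dir: turns}, "shallow": [...], "skip": [...]}`) is a hypothetical simplification; the real schema lives on the Planning Pass page. The sketch also treats "skip 0" as dropping the directory, and it only shows within-band ordering; how parent/child pairs in different bands are handled is covered in §4.7 and not modeled here.

```python
DEFAULT_TURNS = 10
SHALLOW_TURNS = 5
PRIORITY_CAP = 25


def apply_plan(leaf_first_dirs, plan):
    """Translate a plan into (ordered_dirs, {dir: max_turns}).

    Input must already be leaf-first; appending in iteration order
    preserves that ordering inside each band.
    """
    priority = plan.get("priority", {})
    shallow = set(plan.get("shallow", []))
    skip = set(plan.get("skip", []))

    bands = {"priority": [], "default": [], "shallow": []}
    max_turns = {}
    for d in leaf_first_dirs:
        if d in skip:
            continue  # assumed: skipped dirs get no dir loop at all
        if d in priority:
            bands["priority"].append(d)
            max_turns[d] = min(priority[d], PRIORITY_CAP)
        elif d in shallow:
            bands["shallow"].append(d)
            max_turns[d] = SHALLOW_TURNS
        else:
            bands["default"].append(d)
            max_turns[d] = DEFAULT_TURNS

    ordered = bands["priority"] + bands["default"] + bands["shallow"]
    return ordered, max_turns


order, turns = apply_plan(
    ["src/auth", "src/db", "docs", "src"],
    {"priority": {"src": 30, "src/auth": 18}, "shallow": ["docs"], "skip": []},
)
```

Keeping the helper pure (no cache or API access) is what makes it easy to unit-test in isolation.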
`_write_plan_evaluation()` writes `plan_evaluation.json` at the end of
every run with `turns_allocated`, `turns_used`, and `completeness` per
directory. This is the planning pass's report card.
--- ---
## 4. The dir loop in depth ## 4. The dir loop in depth
`_run_dir_loop()` is at `ai.py:1017`. It is a hand-written agent loop, and `_run_dir_loop()` is a hand-written agent loop, and you should expect
you should expect to read it several times before it clicks. As of #57 the to read it several times before it clicks. As of #57 the loop body
loop body itself is a thin coordinator (~25 lines): it calls three helpers itself is a thin coordinator (~25 lines): it calls three helpers that
that own the layers it used to inline. own the layers it used to inline.
| Helper | Lines | Job | | Helper | Job |
|---|---|---| |---|---|
| `_build_dir_loop_context()` | `ai.py:855` | Pure setup. Builds dir context, child summaries, survey block, filtered tool list, system prompt, and the seed user message. Returns a `_DirLoopContext` namedtuple. | | `_build_dir_loop_context()` | Pure setup. Builds dir context, child summaries, survey block, filtered tool list, system prompt, and the seed user message. Returns a `_DirLoopContext` namedtuple. |
| `_flush_partial_dir_entry()` | `ai.py:896` | Idempotent partial-cache writer for the budget-exceeded path. Synthesizes a summary from already-cached file entries when possible, or writes a "no files processed" stub. Returns the partial summary string. | | `_flush_partial_dir_entry()` | Idempotent partial-cache writer for the budget-exceeded path. Synthesizes a summary from already-cached file entries when possible, or writes a "no files processed" stub. Returns the partial summary string. |
| `_handle_turn_response()` | `ai.py:957` | Per-turn response processing. Prints text blocks and tool decisions to stderr, appends the assistant message, dispatches tools (or nudges the agent to call submit_report), appends tool_results. Returns `(done, summary)`. | | `_handle_turn_response()` | Per-turn response processing. Prints text blocks and tool decisions to stderr, appends the assistant message, dispatches tools (or nudges the agent to call submit_report), appends tool_results. Returns `(done, summary, completeness)`. |
The shape of the loop body is now: The shape of the loop body is now:
``` ```
ctx = _build_dir_loop_context(...) ctx = _build_dir_loop_context(...)
reset per-loop token counter reset per-loop token counter
for turn in range(max_turns): # max_turns = 14 for turn in range(max_turns): # max_turns from plan (5-25)
if budget exceeded: if budget exceeded:
print warning print warning
partial = _flush_partial_dir_entry(...) partial = _flush_partial_dir_entry(...)
if partial: summary = partial if partial: summary = partial
break break
call API (streaming) call API (streaming)
done, turn_summary = _handle_turn_response(...) done, turn_summary, turn_completeness = _handle_turn_response(...)
if turn_summary: summary = turn_summary if turn_summary: summary = turn_summary
if turn_completeness: completeness = turn_completeness
if done: break if done: break
return summary return (summary, completeness)
``` ```
A few non-obvious mechanics: A few non-obvious mechanics:
@ -207,95 +244,104 @@ message (the tool results). Nothing is ever evicted. This means
`input_tokens` on each successive API call grows roughly linearly — the `input_tokens` on each successive API call grows roughly linearly — the
model is re-sent the full conversation every turn. On code targets we see model is re-sent the full conversation every turn. On code targets we see
~1.5-2k tokens added per turn. At `max_turns=14` this stays under the ~1.5-2k tokens added per turn. At `max_turns=14` this stays under the
budget; raising the cap would expose this. See **#51**. budget; raising the cap would expose this. With Phase 3's priority-tier
cap of 25, we're still well under budget in practice but closer to the
ceiling. See **#51**.
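As a back-of-the-envelope check on the growth claim, the arithmetic can be written out. This assumes the upper end of the per-turn growth is about 2,000 tokens and uses the 70%-of-200k budget from §4.4; the size of the system prompt and seed message is not given in the text, so it is left out.

```python
CONTEXT_BUDGET = 200_000 * 70 // 100  # 140,000 tokens, per §4.4
TOKENS_PER_TURN_HIGH = 2_000          # assumed upper end of per-turn growth


def projected_growth(max_turns, per_turn=TOKENS_PER_TURN_HIGH):
    """Tokens added to the conversation after max_turns turns.

    Excludes the initial prompt size, which the text does not state.
    """
    return max_turns * per_turn


growth_at_14 = projected_growth(14)  # old fixed cap
growth_at_25 = projected_growth(25)  # Phase 3 priority-tier cap
headroom_at_25 = CONTEXT_BUDGET - growth_at_25
```

Under these assumptions the conversation growth alone stays well inside the budget at 25 turns; large tool results (for example, whole-file reads) are what would change the picture.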
### 4.2 Tool dispatch ### 4.2 Tool dispatch
Tools are plain functions in `ai.py`. They are wired up via a single Tools are plain functions in `ai.py`. They are wired up via a single
`register_tool()` call (`ai.py:172`) that lands the schema in one or `register_tool()` call that lands the schema in one or more scope lists
more scope lists (`_DIR_TOOLS`, `_SYNTHESIS_TOOLS`, `_SURVEY_TOOLS`) (`_DIR_TOOLS`, `_SYNTHESIS_TOOLS`, `_SURVEY_TOOLS`, `_PLANNING_TOOLS`)
and the handler in `_TOOL_DISPATCH`. The registrations live below the and the handler in `_TOOL_DISPATCH`. The registrations live below the
tool implementations in `ai.py` and read top-to-bottom in dir-then- tool implementations in `ai.py` and read top-to-bottom in
synthesis-then-survey order. dir-then-synthesis-then-survey-then-planning order.
`_execute_tool()` looks up the handler by name in `_TOOL_DISPATCH`, `_execute_tool()` looks up the handler by name in `_TOOL_DISPATCH`,
calls it, logs the turn to `investigation.log`, and returns the result calls it, logs the turn to `investigation.log`, and returns the result
string. **Tools intercepted by the loop body — `submit_report` and string. **Tools intercepted by the loop body — `submit_report`,
`submit_survey` — register their schema only and have no handler entry.** `submit_survey`, `submit_plan` — register their schema only and have no
`_handle_turn_response()` recognizes `submit_report` specially: it sets handler entry.** `_handle_turn_response()` recognizes `submit_report`
`done = True` and extracts the summary directly from the tool input. specially: it sets `done = True`, extracts the summary from the tool
input, and also extracts the optional `completeness` field (Phase 3
instrumentation).
`think`, `checkpoint`, and `flag` *are* in dispatch, but they have side `think`, `checkpoint`, and `flag` *are* in dispatch, but they have side
effects that just print to stderr or append to `flags.jsonl` — the return effects that just print to stderr or append to `flags.jsonl` — the
value is always `"ok"`. return value is always `"ok"`.
When you add a tool: write the function, then add one `register_tool()` When you add a tool: write the function, then add one `register_tool()`
call below it. That's it. There is no second place to forget. call below it. That's it. There is no second place to forget.
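The registration mechanism might look something like the sketch below. The real `register_tool()` signature is not shown in this page, so the `schema`, `handler`, and `scopes` parameters and the `TOOL_SCOPES` mapping are assumptions; only the scope names and the "schema-only tools have no dispatch entry" rule come from the text.

```python
TOOL_SCOPES = {"dir": [], "synthesis": [], "survey": [], "planning": []}
TOOL_DISPATCH = {}


def register_tool(schema, handler=None, scopes=("dir",)):
    """Add a tool schema to each scope list and, optionally, a handler.

    Tools intercepted by the loop body (submit_report, submit_survey,
    submit_plan) would pass handler=None and get no dispatch entry.
    """
    for scope in scopes:
        if scope not in TOOL_SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        TOOL_SCOPES[scope].append(schema)
    if handler is not None:
        TOOL_DISPATCH[schema["name"]] = handler


def _tool_flag(**kwargs):
    """Placeholder handler; the real one also appends to flags.jsonl."""
    return "ok"


register_tool({"name": "flag"}, handler=_tool_flag, scopes=["dir", "synthesis"])
register_tool({"name": "submit_plan"}, scopes=["planning"])
```

With one call doing both jobs, a schema can never be registered without its handler (or the reverse) by accident.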
### 4.3 Pre-loaded context ### 4.3 Pre-loaded context
Before the loop starts, `_build_dir_loop_context()` (`ai.py:855`) calls Before the loop starts, `_build_dir_loop_context()` calls two helpers
two helpers that prepare static context for the system prompt: that prepare static context for the system prompt:
- `_build_dir_context()` (`ai.py:741`) — `ls`-style listing of the dir - `_build_dir_context()` — `ls`-style listing of the dir with sizes and
with sizes and MIME types via `python-magic`. The agent sees this MIME types via `python-magic`. The agent sees this *before* it makes
*before* it makes any tool calls, so it doesn't waste a turn just any tool calls, so it doesn't waste a turn just listing the directory.
listing the directory. - `_get_child_summaries()` — looks up each subdirectory in the cache and
- `_get_child_summaries()` (`ai.py:763`) — looks up each subdirectory in pulls its `summary` field. This is how leaves-first ordering pays off:
the cache and pulls its `summary` field. This is how leaves-first by the time the loop runs on `src/`, all of `src/auth/`, `src/db/`,
ordering pays off: by the time the loop runs on `src/`, all of `src/middleware/` already have cached summaries that get injected as
`src/auth/`, `src/db/`, `src/middleware/` already have cached summaries `{child_summaries}`.
that get injected as `{child_summaries}`.
If `_get_child_summaries()` returns nothing, the prompt says If `_get_child_summaries()` returns nothing, the prompt distinguishes
`(none — this is a leaf directory)`. leaf directories (`"(none: this is a leaf directory)"`) from parents
whose children haven't been investigated yet (`"(child directories
exist but have not been investigated yet)"`). See §4.7.
### 4.4 The token tracker and the budget check ### 4.4 The token tracker and the budget check
`_TokenTracker` (`ai.py:94`) is a tiny accumulator with one important `_TokenTracker` is a tiny accumulator with one important subtlety,
subtlety, captured in **#44**: captured in **#44**:
> Cumulative input tokens are NOT a meaningful proxy for context size: > Cumulative input tokens are NOT a meaningful proxy for context size:
> each turn's `input_tokens` already includes the full message history, > each turn's `input_tokens` already includes the full message history,
> so summing across turns double-counts everything. Use `last_input` for > so summing across turns double-counts everything. Use `last_input` for
> budget decisions, totals for billing. > budget decisions, totals for billing.
So `budget_exceeded()` (`ai.py:135`) compares `last_input` (the most So `budget_exceeded()` compares `last_input` (the most recent call's
recent call's input_tokens) to `CONTEXT_BUDGET` (`ai.py:40`), which is input_tokens) to `CONTEXT_BUDGET`, which is 70% of 200k. This is
70% of 200k. This is checked at the *top* of each loop iteration, before checked at the *top* of each loop iteration, before the next API call.
the next API call.
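The distinction between `last_input` and the cumulative totals can be sketched as follows. The attribute names `last_input`, `total_input`, and `total_output` appear in this page; the `record()` method name and the strict `>` comparison are assumptions.

```python
CONTEXT_BUDGET = 200_000 * 70 // 100  # 70% of 200k = 140,000


class TokenTracker:
    """Tiny accumulator: totals for billing, last_input for budgeting."""

    def __init__(self):
        self.total_input = 0
        self.total_output = 0
        self.last_input = 0

    def record(self, input_tokens, output_tokens):
        """Record one API call's usage (hypothetical method name)."""
        self.total_input += input_tokens
        self.total_output += output_tokens
        # Each call's input already contains the full history, so the
        # latest value alone reflects the current context size.
        self.last_input = input_tokens

    def budget_exceeded(self):
        return self.last_input > CONTEXT_BUDGET


tracker = TokenTracker()
tracker.record(100_000, 500)
tracker.record(120_000, 500)
```

Here `total_input` is already 220,000, above the budget, yet the context actually sent on the last call was only 120,000 tokens, which is why summing would trip the check too early.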
When the budget check trips, the loop: When the budget check trips, the loop:
1. Prints a `Context budget reached` warning to stderr 1. Prints a `Context budget reached` warning to stderr
2. Calls `_flush_partial_dir_entry()` (`ai.py:896`), which writes a 2. Calls `_flush_partial_dir_entry()`, which writes a partial dir cache
partial dir cache entry from any `file` cache entries the agent entry from any `file` cache entries the agent already produced,
already produced, marked with `partial: True` and `partial_reason`. marked with `partial: True` and `partial_reason`. The helper is
The helper is idempotent — if a dir entry already exists, it returns idempotent — if a dir entry already exists, it returns `""` without
`""` without writing. writing.
3. Breaks out of the loop 3. Breaks out of the loop
This means a budget breach doesn't lose work — anything the agent already This means a budget breach doesn't lose work — anything the agent
cached survives, and the synthesis pass will see a partial dir summary already cached survives, and the synthesis pass will see a partial dir
rather than nothing. summary rather than nothing.
### 4.5 What the loop returns ### 4.5 What the loop returns
`_run_dir_loop()` returns the `summary` string from `submit_report` (or `_run_dir_loop()` returns `(summary, completeness)`. The summary is the
the partial summary returned by `_flush_partial_dir_entry()` if the string from `submit_report` (or the partial summary returned by
budget tripped). `_run_investigation()` then writes a normal `dir` cache `_flush_partial_dir_entry()` if the budget tripped). The completeness
entry from this summary, *unless* the dir loop already wrote one itself is the agent's self-rated investigation thoroughness (0.0-1.0) — Phase
via the partial-flush path, in which case the `cache.has_entry("dir", 3 instrumentation used in `plan_evaluation.json` — or `None` if the
dir_path)` check skips it. agent didn't report one.
`_run_investigation()` writes a normal `dir` cache entry from this
summary (with `completeness` included if non-None), *unless* the dir
loop already wrote one itself via the partial-flush path, in which case
the `cache.has_entry("dir", dir_path)` check skips it.
### 4.6 The streaming API caller ### 4.6 The streaming API caller
`_call_api_streaming()` (`ai.py:686`) is a thin wrapper around `_call_api_streaming()` is a thin wrapper around
`client.messages.stream()`. It currently doesn't print tokens as they `client.messages.stream()`. It currently doesn't print tokens as they
arrive — it iterates the stream, drops everything, then pulls the final arrive — it iterates the stream, drops everything, then pulls the final
message via `stream.get_final_message()`. The streaming API is used for message via `stream.get_final_message()`. The streaming API is used for
real-time tool decision printing, which today happens only after the full real-time tool decision printing, which today happens only after the
response arrives. There's room here to add live progress printing if you full response arrives. There's room here to add live progress printing
want it. if you want it.
### 4.7 The leaf-first contract (load-bearing for child summaries) ### 4.7 The leaf-first contract (load-bearing for child summaries)
@ -339,32 +385,34 @@ the full design.
## 5. The cache model ## 5. The cache model
Cache lives at `/tmp/luminos/{investigation_id}/`. Code is Cache lives at `/tmp/luminos/{investigation_id}/`. Code is
`luminos_lib/cache.py` (201 lines). `luminos_lib/cache.py`.
### 5.1 Investigation IDs ### 5.1 Investigation IDs
`/tmp/luminos/investigations.json` maps absolute target paths to UUIDs. `/tmp/luminos/investigations.json` maps absolute target paths to UUIDs.
`_get_investigation_id()` (`cache.py:40`) looks up the target and either `_get_investigation_id()` looks up the target and either returns the
returns the existing UUID (resume) or creates a new one (fresh run). existing UUID (resume) or creates a new one (fresh run). `--fresh`
`--fresh` forces a new UUID even if one exists. forces a new UUID even if one exists.
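The resume-or-create decision can be sketched on an in-memory index. This is illustrative only: `resolve_investigation_id()` is a hypothetical name, and reading or writing `investigations.json` is left out so the logic stands alone.

```python
import uuid


def resolve_investigation_id(index, target, fresh=False):
    """Return (investigation_id, resumed) for an absolute target path.

    `index` is the dict loaded from investigations.json (path -> UUID).
    A new UUID is stored when there is no entry or fresh=True.
    """
    if not fresh and target in index:
        return index[target], True
    new_id = str(uuid.uuid4())
    index[target] = new_id
    return new_id, False


index = {}
first_id, first_resumed = resolve_investigation_id(index, "/home/me/project")
second_id, second_resumed = resolve_investigation_id(index, "/home/me/project")
fresh_id, fresh_resumed = resolve_investigation_id(index, "/home/me/project", fresh=True)
```

The caller would persist `index` back to disk after any call that returns `resumed == False`.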
### 5.2 What's stored ### 5.2 What's stored
Inside `/tmp/luminos/{uuid}/`: Inside `/tmp/luminos/{uuid}/`:
``` ```
meta.json investigation metadata (model, start time, dir count) meta.json investigation metadata (model, start time, dir count)
files/<sha256>.json one file per cached file entry plan.json planning pass output — cached for resumed runs
dirs/<sha256>.json one file per cached directory entry plan_evaluation.json post-investigation quality report (Phase 3)
flags.jsonl JSONL — appended on every flag tool call files/<sha256>.json one file per cached file entry
investigation.log JSONL — appended on every tool call dirs/<sha256>.json one file per cached directory entry
flags.jsonl JSONL — appended on every flag tool call
investigation.log JSONL — appended on every tool call
``` ```
**File and dir cache entries are NOT in JSONL** — they are one **File and dir cache entries are NOT in JSONL** — they are one
sha256-keyed JSON file per entry. The sha256 is over the path string sha256-keyed JSON file per entry. The sha256 is over the path string.
(`cache.py:13`). Only `flags.jsonl` and `investigation.log` use JSONL. Only `flags.jsonl` and `investigation.log` use JSONL.
Required fields are validated in `write_entry()` (`cache.py:115`): Required fields are validated in `write_entry()`:
```python ```python
file: {path, relative_path, size_bytes, category, summary, cached_at} file: {path, relative_path, size_bytes, category, summary, cached_at}
@ -376,31 +424,45 @@ The validator also rejects entries containing `content`, `contents`, or
contents, summaries only. If you change the schema, update the required contents, summaries only. If you change the schema, update the required
set in `write_entry()` and update the test in `tests/test_cache.py`. set in `write_entry()` and update the test in `tests/test_cache.py`.
### 5.3 Confidence support already exists ### 5.3 Confidence + completeness support
`write_entry()` validates an optional `confidence` field `write_entry()` validates optional `confidence` and `confidence_reason`
(`cache.py:129-134`) and a `confidence_reason` string. fields (Phase 1) and an optional `completeness` field (Phase 3,
`low_confidence_entries(threshold=0.7)` (`cache.py:191`) returns all 0.0-1.0, the dir agent's self-rated thoroughness).
entries below a threshold, sorted ascending. The agent doesn't currently `low_confidence_entries(threshold=0.7)` returns all entries below a
*set* these fields in any prompt — that lights up when Phase 1 work threshold, sorted ascending — future refinement-pass fuel.
actually wires the prompts.
### 5.4 Why one-file-per-entry instead of JSONL ### 5.4 Why one-file-per-entry instead of JSONL
Random access by path. The dir loop calls `cache.has_entry("dir", path)` Random access by path. The dir loop calls
once per directory during the `_get_child_summaries()` lookup; with `cache.has_entry("dir", path)` once per directory during the
sha256-keyed files this is an `os.path.exists()` call. With JSONL it `_get_child_summaries()` lookup; with sha256-keyed files this is an
would be a full file scan. `os.path.exists()` call. With JSONL it would be a full file scan.
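A sketch of why the lookup is cheap, assuming the entry filename is the hex sha256 of the UTF-8 path string and that entries live under `files/` and `dirs/` as listed in §5.2. The function names and the exact encoding are assumptions, not the code in `cache.py`.

```python
import hashlib
import os


def entry_path(root, kind, path):
    """Map ("dir" | "file", path) to its cache file location."""
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return os.path.join(root, f"{kind}s", f"{digest}.json")


def has_entry(root, kind, path):
    """Existence check: one stat call, no scan of other entries."""
    return os.path.exists(entry_path(root, kind, path))


example = entry_path("/tmp/luminos/some-uuid", "dir", "/home/me/project/src")
```

The hash also sidesteps filename problems: any path, however long or oddly named, maps to a fixed-length 64-character name.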
### 5.5 The planning files
`plan.json` is written by `_run_investigation()` after a successful
planning pass, so resumed runs can skip the planner. It is loaded
before the dir loops run when `--fresh` is not set and the file
exists.
`plan_evaluation.json` is written by `_write_plan_evaluation()` after
the dir loops finish. Schema: `plan_order`, `total_dirs_investigated`,
`total_turns_allocated`, `total_turns_used`, `overall_utilization`,
`per_directory` (list of `{dir, planned_tier, turns_allocated,
turns_used, utilization, completeness, confidence}`), `evaluated_at`.
See [Planning Pass](PlanningPass) for how to use it.
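The utilization fields in that schema can be computed along these lines. The field names come from the schema above; the formula (`turns_used / turns_allocated`) and the zero-allocation handling are assumptions about what `_write_plan_evaluation()` does.

```python
def evaluate_plan(rows):
    """Build the turn-utilization part of a plan evaluation.

    Each row needs "dir", "turns_allocated", and "turns_used".
    """
    per_directory = []
    for row in rows:
        allocated = row["turns_allocated"]
        used = row["turns_used"]
        utilization = used / allocated if allocated else 0.0
        per_directory.append({**row, "utilization": utilization})

    total_allocated = sum(r["turns_allocated"] for r in rows)
    total_used = sum(r["turns_used"] for r in rows)
    return {
        "total_dirs_investigated": len(rows),
        "total_turns_allocated": total_allocated,
        "total_turns_used": total_used,
        "overall_utilization": total_used / total_allocated if total_allocated else 0.0,
        "per_directory": per_directory,
    }


report = evaluate_plan([
    {"dir": "src", "turns_allocated": 20, "turns_used": 10},
    {"dir": "docs", "turns_allocated": 5, "turns_used": 5},
])
```

A directory that uses all of its allocated turns is a hint that its tier may have been too low; one that uses half may have been over-allocated.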
--- ---
## 6. Prompts ## 6. Prompts
All prompt templates live in `luminos_lib/prompts.py`. There are three: All prompt templates live in `luminos_lib/prompts.py`. There are four:
| Constant | Used by | What it carries | | Constant | Used by | What it carries |
|---|---|---| |---|---|---|
| `_SURVEY_SYSTEM_PROMPT` | `_run_survey` | survey_signals, tree_preview, available_tools | | `_SURVEY_SYSTEM_PROMPT` | `_run_survey` | survey_signals, tree_preview, available_tools |
| `_PLANNING_SYSTEM_PROMPT` | `_run_planning` | survey, tree, file signals, cached_dirs |
| `_DIR_SYSTEM_PROMPT` | `_run_dir_loop` | dir_path, dir_rel, max_turns, context, child_summaries, survey_context | | `_DIR_SYSTEM_PROMPT` | `_run_dir_loop` | dir_path, dir_rel, max_turns, context, child_summaries, survey_context |
| `_SYNTHESIS_SYSTEM_PROMPT` | `_run_synthesis` | target, summaries_text | | `_SYNTHESIS_SYSTEM_PROMPT` | `_run_synthesis` | target, summaries_text |
@ -424,8 +486,8 @@ that reason.
## 7. Synthesis pass ## 7. Synthesis pass
`_run_synthesis()` (`ai.py:1157`) is structurally similar to the dir loop `_run_synthesis()` is structurally similar to the dir loop but much
but much simpler: simpler:
- Reads all `dir` cache entries via `cache.read_all_entries("dir")` - Reads all `dir` cache entries via `cache.read_all_entries("dir")`
- Renders them into a `summaries_text` block (one section per dir) - Renders them into a `summaries_text` block (one section per dir)
@ -434,31 +496,29 @@ but much simpler:
`detailed` fields `detailed` fields
Tools available: `read_cache`, `list_cache`, `flag`, `submit_report` Tools available: `read_cache`, `list_cache`, `flag`, `submit_report`
(`_SYNTHESIS_TOOLS` at `ai.py:401`). The synthesis agent can pull (`_SYNTHESIS_TOOLS`). The synthesis agent can pull specific cache
specific cache entries back if it needs to drill in, but it cannot read entries back if it needs to drill in, but it cannot read files directly
files directly — synthesis is meant to operate on summaries, not raw — synthesis is meant to operate on summaries, not raw contents.
contents.
There's a fallback: if synthesis runs out of turns without calling There's a fallback: if synthesis runs out of turns without calling
`submit_report`, `_synthesize_from_cache()` (`ai.py:1262`) builds a `submit_report`, `_synthesize_from_cache()` builds a mechanical
mechanical brief+detailed from the cached dir summaries with no AI call. brief+detailed from the cached dir summaries with no AI call. This
This guarantees you always get *something* in the report. guarantees you always get *something* in the report.
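A mechanical fallback of this kind could be as simple as the sketch below. It assumes each dir entry carries `path` and `summary` fields; the wording of the brief and the section layout are invented for illustration and are not the output format of `_synthesize_from_cache()`.

```python
def synthesize_from_cache(dir_entries):
    """Build (brief, detailed) from cached dir summaries with no AI call."""
    if not dir_entries:
        return "", ""
    ordered = sorted(dir_entries, key=lambda entry: entry["path"])
    brief = (
        f"Mechanical summary of {len(ordered)} directories "
        "(synthesis did not submit a report)."
    )
    detailed = "\n\n".join(
        f"## {entry['path']}\n{entry['summary']}" for entry in ordered
    )
    return brief, detailed


brief, detailed = synthesize_from_cache([
    {"path": "src", "summary": "core"},
    {"path": "docs", "summary": "guides"},
])
```

Since it only concatenates what is already cached, the fallback cannot fail for API or budget reasons.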
--- ---
## 8. Flags ## 8. Flags
The `flag` tool is the agent's pressure valve for "I noticed something The `flag` tool is the agent's pressure valve for "I noticed something
that should not be lost in the summary." `_tool_flag()` (`ai.py:629`) that should not be lost in the summary." `_tool_flag()` prints to stderr
prints to stderr *and* appends a JSONL line to *and* appends a JSONL line to `{cache.root}/flags.jsonl`. At the end of
`{cache.root}/flags.jsonl`. At the end of `_run_investigation()` `_run_investigation()`, the orchestrator reads that file back and
(`ai.py:1387-1397`), the orchestrator reads that file back and includes includes the flags in its return tuple. `format_report()` then renders
the flags in its return tuple. `format_report()` then renders them in a them in a dedicated section.
dedicated section.
Severity is `info | concern | critical`. The agent is told to flag Severity is `info | concern | critical`. The agent is told to flag
*immediately* on discovery, not save findings for the report — this is in *immediately* on discovery, not save findings for the report — this is
the tool description at `ai.py:312`. in the tool description.
--- ---
@ -484,10 +544,11 @@ A cookbook for the kinds of changes that come up most often.
contains your handler and `_DIR_TOOLS` contains your schema after contains your handler and `_DIR_TOOLS` contains your schema after
importing `luminos_lib.ai`. importing `luminos_lib.ai`.
To make a tool available in synthesis or survey instead of (or in To make a tool available in synthesis, survey, or planning instead of
addition to) dir, pass `scopes=["synthesis"]`, `scopes=["survey"]`, or (or in addition to) dir, pass `scopes=["synthesis"]`, `scopes=["survey"]`,
`scopes=["dir", "synthesis"]`. Tools whose schema differs by scope (like `scopes=["planning"]`, or any combination. Tools whose schema differs by
`submit_report`) get a separate `register_tool()` call per scope. scope (like `submit_report`) get a separate `register_tool()` call per
scope.
### 9.2 Add a whole new pass ### 9.2 Add a whole new pass
@ -522,8 +583,7 @@ unless you `--fresh`.
### 9.4 Change cache schema ### 9.4 Change cache schema
1. Update the required-fields set in `cache.py:write_entry()` 1. Update the required-fields set in `cache.py:write_entry()`
(`cache.py:119-123`) 2. Update `_DIR_TOOLS`'s `write_cache` description in `ai.py` so the
2. Update `_DIR_TOOLS`'s `write_cache` description in `ai.py:228` so the
agent knows what to write agent knows what to write
3. Update `_DIR_SYSTEM_PROMPT` in `prompts.py` if the agent needs to know 3. Update `_DIR_SYSTEM_PROMPT` in `prompts.py` if the agent needs to know
*how* to populate the new field *how* to populate the new field
@ -532,26 +592,25 @@ unless you `--fresh`.
### 9.5 Add a CLI flag ### 9.5 Add a CLI flag
Edit `luminos.py:88` (`main()`'s argparse setup) to define the flag, then Edit `luminos.py:main()`'s argparse setup to define the flag, then
plumb it through whatever functions need it. New AI-related flags plumb it through whatever functions need it. New AI-related flags
typically need to be added to `analyze_directory()`'s signature typically need to be added to `analyze_directory()`'s signature and
(`ai.py:1408`) and then forwarded to `_run_investigation()`. then forwarded to `_run_investigation()`.
--- ---
## 10. Token budget and cost ## 10. Token budget and cost
Budget logic is in `_TokenTracker.budget_exceeded()` and is checked at the Budget logic is in `_TokenTracker.budget_exceeded()` and is checked at
top of every dir loop iteration (`ai.py:882`). The budget is **per call**, the top of every dir loop iteration. The budget is **per call**, not
not cumulative — see §4.4. The breach handler flushes a partial dir cache cumulative — see §4.4. The breach handler flushes a partial dir cache
entry so work isn't lost. entry so work isn't lost.
Cost reporting happens once at the end of `_run_investigation()` Cost reporting happens once at the end of `_run_investigation()`, using
(`ai.py:1399`), using the cumulative `total_input` and `total_output` the cumulative `total_input` and `total_output` counters multiplied by
counters multiplied by the constants at `ai.py:43-44`. There is no the constants near the top of `ai.py`. There is no running cost display
running cost display during the investigation today. If you want one, during the investigation today. If you want one, `_TokenTracker.summary()`
`_TokenTracker.summary()` already returns the formatted string — just already returns the formatted string — just call it after each dir loop.
call it after each dir loop.
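
The difference between the per-call budget check and the cumulative cost counters can be illustrated with a small sketch. This is not `_TokenTracker` itself: the class name, the `record()` and `cost()` methods, the strict `>` comparison, and the per-million-token prices passed in are all assumptions for illustration. Only `last_input`, `total_input`, `total_output`, `budget_exceeded()`, and the 70% of 200k threshold come from this page.

```python
CONTEXT_WINDOW = 200_000
# 70% of the context window, computed in integers: 140_000.
CONTEXT_BUDGET = CONTEXT_WINDOW * 70 // 100


class TokenTrackerSketch:
    """Illustrative stand-in for _TokenTracker; not the real class."""

    def __init__(self, input_price_per_mtok, output_price_per_mtok):
        # Prices are placeholders supplied by the caller (USD per million tokens).
        self.input_price = input_price_per_mtok
        self.output_price = output_price_per_mtok
        self.last_input = 0
        self.total_input = 0
        self.total_output = 0

    def record(self, input_tokens, output_tokens):
        """Record one API call's usage (hypothetical method name)."""
        self.last_input = input_tokens      # overwritten on every call
        self.total_input += input_tokens    # cumulative, used for cost only
        self.total_output += output_tokens

    def budget_exceeded(self):
        # Per call: compares the most recent input size, never the running sum.
        # Whether the real check is > or >= is an assumption here.
        return self.last_input > CONTEXT_BUDGET

    def cost(self):
        """Cumulative cost from the running totals (hypothetical method name)."""
        return (self.total_input * self.input_price
                + self.total_output * self.output_price) / 1_000_000
```

In this sketch, a 150k-token call trips the budget, but a later 50k-token call does not, even though the cumulative input is by then well past 140k. The cumulative counters only feed the cost figure.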
--- ---
@ -560,16 +619,20 @@ call it after each dir loop.
| Term | Meaning | | Term | Meaning |
|---|---| |---|---|
| **base scan** | The non-AI phase: tree, classification, languages, recency, disk usage. Stdlib + coreutils only. | | **base scan** | The non-AI phase: tree, classification, languages, recency, disk usage. Stdlib + coreutils only. |
| **dir loop** | Per-directory agent loop in `_run_dir_loop`. Up to 14 turns. Produces a `dir` cache entry. | | **dir loop** | Per-directory agent loop in `_run_dir_loop`. Turns allocated by the planning pass (5 shallow / 10 default / 15-20 priority, capped at 25). Produces a `dir` cache entry. |
| **survey pass** | Single short loop before any dir loops, producing a shared description and tool guidance. | | **survey pass** | Single short loop before any dir loops, producing a shared description and tool guidance. |
| **planning pass** | Phase 3 pass after the survey, before dir loops. Produces a plan (priority/shallow/skip dirs + turn allocations + order). |
| **synthesis pass** | Final loop that reads `dir` cache entries and produces `(brief, detailed)`. | | **synthesis pass** | Final loop that reads `dir` cache entries and produces `(brief, detailed)`. |
| **leaves-first** | Discovery order in `_discover_directories`: deepest paths first, so child summaries exist when parents are investigated. | | **leaves-first** | Discovery order in `_discover_directories`: deepest paths first, so child summaries exist when parents are investigated. Preserved within planning bands by `_apply_plan`. |
| **investigation** | One end-to-end run, identified by a UUID, persisted under `/tmp/luminos/{uuid}/`. | | **investigation** | One end-to-end run, identified by a UUID, persisted under `/tmp/luminos/{uuid}/`. |
| **investigation_id** | The UUID. Stored in `/tmp/luminos/investigations.json` keyed by absolute target path. | | **investigation_id** | The UUID. Stored in `/tmp/luminos/investigations.json` keyed by absolute target path. |
| **cache entry** | A JSON file under `files/` or `dirs/` named by sha256(path). | | **cache entry** | A JSON file under `files/` or `dirs/` named by sha256(path). |
| **flag** | An agent finding written to `flags.jsonl` and reported separately. info / concern / critical. | | **flag** | An agent finding written to `flags.jsonl` and reported separately. info / concern / critical. |
| **partial entry** | A `dir` cache entry written when the budget tripped before `submit_report`. Marked with `partial: True`. | | **partial entry** | A `dir` cache entry written when the budget tripped before `submit_report`. Marked with `partial: True`. |
| **completeness** | Phase 3 agent self-rated thoroughness (0.0-1.0) from `submit_report`. Feeds `plan_evaluation.json`. |
| **survey signals** | The histogram + samples computed by `filetypes.survey_signals()` during the base scan, fed to the survey prompt. | | **survey signals** | The histogram + samples computed by `filetypes.survey_signals()` during the base scan, fed to the survey prompt. |
| **last_input** | The `input_tokens` count from the most recent API call. The basis for budget checks. NOT the cumulative sum. | | **last_input** | The `input_tokens` count from the most recent API call. The basis for budget checks. NOT the cumulative sum. |
| **CONTEXT_BUDGET** | 70% of 200k = 140k. Trigger threshold for early exit. | | **CONTEXT_BUDGET** | 70% of 200k = 140k. Trigger threshold for early exit. |
| **`_PROTECTED_DIR_TOOLS`** | Tools the survey is forbidden from filtering out of the dir loop's toolbox. Currently `{submit_report}`. | | **`_PROTECTED_DIR_TOOLS`** | Tools the survey is forbidden from filtering out of the dir loop's toolbox. Currently `{submit_report}`. |
| **plan.json** | Serialized planning output, cached so resumed runs skip the planner. |
| **plan_evaluation.json** | Post-investigation quality report comparing plan predictions to outcomes. |
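
As a rough illustration of the **cache entry** naming in the glossary, the sketch below derives an entry path from a target path. The function name `cache_entry_path()`, the UTF-8 encoding of the path, the hex digest form, and the `.json` extension are assumptions. The glossary only states that entries live under `files/` or `dirs/` and are named by sha256(path).

```python
import hashlib
from pathlib import Path


def cache_entry_path(cache_root, kind, target_path):
    """Hypothetical helper: map a scanned path to its cache entry file."""
    if kind not in ("files", "dirs"):
        raise ValueError(f"kind must be 'files' or 'dirs', got {kind!r}")
    # Assumed: the hash is taken over the UTF-8 bytes of the path string.
    digest = hashlib.sha256(str(target_path).encode("utf-8")).hexdigest()
    return Path(cache_root) / kind / f"{digest}.json"
```

Hashing the path keeps entry names fixed-length and filesystem-safe regardless of how deep or oddly named the scanned directory is, and the same path always maps to the same entry, which is what lets a resumed run find earlier work.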