wiki: Internals — reflect Phase 3 planning pass, (summary, completeness) return, cache layout
parent 717cde8562
commit d3315b530f
1 changed file with 234 additions and 171 deletions
405
Internals.md
|
|
@@ -7,7 +7,8 @@ agent loop can finish this page and start making non-trivial changes.
|
|||
|
||||
All file:line references are accurate as of the date this page was last
|
||||
edited — verify with `git log` or by opening the file before relying on a
|
||||
specific line number.
|
||||
specific line number. `ai.py` in particular grows each phase and
|
||||
references drift.
|
||||
|
||||
---
|
||||
|
||||
|
|
@@ -36,9 +37,9 @@ wait for a scan they can't use.
|
|||
|
||||
## 2. Base scan walkthrough
|
||||
|
||||
Entry: `luminos.py:main()` parses args, then calls `scan(target, ...)` at
|
||||
`luminos.py:45`. `scan()` is a flat sequence — it builds a `report` dict
|
||||
by calling helpers from `luminos_lib/`, one per concern, in order:
|
||||
Entry: `luminos.py:main()` parses args, then calls `scan(target, ...)`.
|
||||
`scan()` is a flat sequence — it builds a `report` dict by calling helpers
|
||||
from `luminos_lib/`, one per concern, in order:
|
||||
|
||||
```
|
||||
scan(target)
|
||||
|
|
@@ -60,7 +61,7 @@ event-driven, and there is no shared state object — everything passes
|
|||
through the local `report` dict.
|
||||
|
||||
The progress lines you see on stderr (`[scan] Counting lines... foo.py`)
|
||||
come from `_progress()` in `luminos.py:23`, which returns an `on_file`
|
||||
come from `_progress()` in `luminos.py`, which returns an `on_file`
|
||||
callback that the helpers call as they work. If you add a new helper that
|
||||
walks files, plumb a progress callback through the same way for
|
||||
consistency.
|
||||
|
|
@@ -77,46 +78,54 @@ the base scan because it needs `report["survey_signals"]` and
|
|||
|
||||
The AI pipeline is what makes Luminos interesting and is also where
|
||||
almost all the complexity lives. Everything below happens inside
|
||||
`luminos_lib/ai.py` (1438 lines as of writing), called from
|
||||
`luminos.py:157` via `analyze_directory()`.
|
||||
`luminos_lib/ai.py` (~2060 lines as of writing), called from `luminos.py`
|
||||
via `analyze_directory()`.
|
||||
|
||||
### 3.1 The orchestrator
|
||||
|
||||
`analyze_directory()` (`ai.py:1408`) is a thin wrapper that checks
|
||||
dependencies, gets the API key, builds the Anthropic client, and calls
|
||||
`_run_investigation()`. If anything fails it prints a warning and returns
|
||||
empty strings — the rest of luminos keeps working.
|
||||
`analyze_directory()` is a thin wrapper that checks dependencies, gets the
|
||||
API key, builds the Anthropic client, and calls `_run_investigation()`.
|
||||
If anything fails it prints a warning and returns empty strings — the
|
||||
rest of luminos keeps working.
|
||||
|
||||
`_run_investigation()` (`ai.py:1286`) is the real entry point. Read this
|
||||
function first if you want to understand the pipeline shape. It does six
|
||||
things, in order:
|
||||
`_run_investigation()` is the real entry point. Read this function first
|
||||
if you want to understand the pipeline shape. It does **seven** things,
|
||||
in order:
|
||||
|
||||
1. **Get/create an investigation ID and cache** (`ai.py:1289–1294`).
|
||||
Investigation IDs let you resume a previous run; see §5 below.
|
||||
1. **Get/create an investigation ID and cache**. Investigation IDs let
|
||||
you resume a previous run; see §5 below.
|
||||
2. **Discover all directories** under the target via
|
||||
`_discover_directories()` (`ai.py:715`). Returns them sorted
|
||||
*leaves-first* — the deepest paths come first. This matters because
|
||||
each dir loop reads its child directories' summaries from cache, so
|
||||
children must be investigated before parents.
|
||||
3. **Run the survey pass** (`ai.py:1300–1334`) unless the target is below
|
||||
the size thresholds at `ai.py:780–781`, in which case
|
||||
`_discover_directories()`. Returns them sorted *leaves-first* — the
|
||||
deepest paths come first. This matters because each dir loop reads
|
||||
its child directories' summaries from cache, so children must be
|
||||
investigated before parents.
|
||||
3. **Run the survey pass** unless the target is below
|
||||
`_SURVEY_MIN_FILES` and `_SURVEY_MIN_DIRS`, in which case
|
||||
`_default_survey()` returns a synthetic skip.
|
||||
4. **Filter out cached directories** (`ai.py:1336–1349`). If you're
|
||||
resuming an investigation, dirs that already have a `dir` cache entry
|
||||
are skipped — only new ones get a fresh dir loop.
|
||||
5. **Run a dir loop per remaining directory** (`ai.py:1351–1375`). This
|
||||
is the heart of the system — see §4.
|
||||
6. **Run the synthesis pass** (`ai.py:1382`) reading only `dir` cache
|
||||
entries to produce `(brief, detailed)`.
|
||||
4. **Filter out cached directories**. If you're resuming an
|
||||
investigation, dirs that already have a `dir` cache entry are
|
||||
skipped — only new ones get a fresh dir loop.
|
||||
5. **Run the planning pass** (Phase 3) unless the target is small, in
|
||||
which case `_default_plan()` returns an empty plan. On resumed runs
|
||||
the planner is skipped and `plan.json` is loaded from cache instead.
|
||||
`_apply_plan()` then sorts dirs into priority/default/shallow bands
|
||||
and builds a `{dir_path: max_turns}` map. Leaf-first ordering is
|
||||
preserved *within* each band (see §4.7).
|
||||
6. **Run a dir loop per remaining directory**, iterating the
|
||||
plan-ordered list with the per-directory `max_turns` from the plan.
|
||||
`_write_plan_evaluation()` records turn-utilization metrics at the
|
||||
end. This is the heart of the system — see §4.
|
||||
7. **Run the synthesis pass** reading only `dir` cache entries to
|
||||
produce `(brief, detailed)`.
|
||||
|
||||
It also reads `flags.jsonl` from disk at the end (`ai.py:1387–1397`) and
|
||||
returns `(brief, detailed, flags)` to `analyze_directory()`.
|
||||
It also reads `flags.jsonl` from disk at the end and returns
|
||||
`(brief, detailed, flags)` to `analyze_directory()`.
|
||||
|
||||
### 3.2 The survey pass
|
||||
|
||||
`_run_survey()` (`ai.py:1051`) is a short, single-purpose loop. It exists
|
||||
to give the dir loops some shared context about what they're looking at
|
||||
*as a whole* before any of them start.
|
||||
`_run_survey()` is a short, single-purpose loop. It exists to give the
|
||||
dir loops some shared context about what they're looking at *as a whole*
|
||||
before any of them start.
|
||||
|
||||
Inputs go into the system prompt (`_SURVEY_SYSTEM_PROMPT` in
|
||||
`prompts.py`):
|
||||
|
|
@@ -125,9 +134,9 @@ Inputs go into the system prompt (`_SURVEY_SYSTEM_PROMPT` in
|
|||
- A 2-level tree preview from `build_tree(target, max_depth=2)`
|
||||
- The list of tools the dir loop will have available
|
||||
|
||||
The survey is allowed only `submit_survey` as a tool (`_SURVEY_TOOLS` at
|
||||
`ai.py:356`). It runs at most 3 turns. The agent must call `submit_survey`
|
||||
exactly once with six fields:
|
||||
The survey is allowed only `submit_survey` as a tool (`_SURVEY_TOOLS`).
|
||||
It runs at most 3 turns. The agent must call `submit_survey` exactly
|
||||
once with six fields:
|
||||
|
||||
```python
|
||||
{
|
||||
|
|
@@ -148,54 +157,82 @@ loops still run but with `survey=None` — the system degrades gracefully.
|
|||
|
||||
Two things happen with the survey output before each dir loop runs:
|
||||
|
||||
**Survey block injection.** `_format_survey_block()` (`ai.py:803`) renders
|
||||
the survey dict as a labeled text block, which gets `.format()`-injected
|
||||
into the dir loop system prompt as `{survey_context}`. The dir agent sees
|
||||
the description, approach, domain notes, and which tools it should lean on
|
||||
**Survey block injection.** `_format_survey_block()` renders the survey
|
||||
dict as a labeled text block, which gets `.format()`-injected into the
|
||||
dir loop system prompt as `{survey_context}`. The dir agent sees the
|
||||
description, approach, domain notes, and which tools it should lean on
|
||||
or skip.
|
||||
|
||||
**Tool filtering.** `_filter_dir_tools()` (`ai.py:824`) returns a copy of
|
||||
`_DIR_TOOLS` with anything in `skip_tools` removed — but only if the
|
||||
survey's confidence is at or above `_SURVEY_CONFIDENCE_THRESHOLD = 0.5`
|
||||
(`ai.py:775`). Below that threshold the agent gets the full toolbox. The
|
||||
control-flow tool `submit_report` is in `_PROTECTED_DIR_TOOLS` and can
|
||||
never be filtered out — removing it would break loop termination.
|
||||
**Tool filtering.** `_filter_dir_tools()` returns a copy of `_DIR_TOOLS`
|
||||
with anything in `skip_tools` removed — but only if the survey's
|
||||
confidence is at or above `_SURVEY_CONFIDENCE_THRESHOLD = 0.5`. Below
|
||||
that threshold the agent gets the full toolbox. The control-flow tool
|
||||
`submit_report` is in `_PROTECTED_DIR_TOOLS` and can never be filtered
|
||||
out — removing it would break loop termination.
|
||||
|
||||
This is the only place in the codebase where the agent's available tools
|
||||
change at runtime. If you add a new tool, decide whether it should be
|
||||
protectable.
|
||||
This is the only place in the codebase where the agent's available
|
||||
tools change at runtime. If you add a new tool, decide whether it
|
||||
should be protectable.
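
The filtering rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in `ai.py`: the function and constant names drop the leading underscore to mark them as stand-ins, the `confidence` key and the tool names `read_file` and `run_command` are assumptions, and only the threshold value, `skip_tools`, and the protected `submit_report` come from the text.

```python
SURVEY_CONFIDENCE_THRESHOLD = 0.5        # value quoted in the text
PROTECTED_DIR_TOOLS = {"submit_report"}  # control-flow tool, never filtered


def filter_dir_tools(dir_tools, survey):
    """Return a filtered copy of dir_tools; the input list is not mutated."""
    if not survey:
        return list(dir_tools)
    # Below the threshold the survey is not trusted: keep the full toolbox.
    if survey.get("confidence", 0.0) < SURVEY_CONFIDENCE_THRESHOLD:
        return list(dir_tools)
    # Protected tools are removed from the skip set before filtering.
    skip = set(survey.get("skip_tools", [])) - PROTECTED_DIR_TOOLS
    return [tool for tool in dir_tools if tool["name"] not in skip]


# Hypothetical tool schemas; only the "name" key matters for this sketch.
tools = [{"name": "read_file"}, {"name": "run_command"}, {"name": "submit_report"}]
survey = {"confidence": 0.8, "skip_tools": ["run_command", "submit_report"]}

filtered_names = [t["name"] for t in filter_dir_tools(tools, survey)]
low_conf_names = [
    t["name"] for t in filter_dir_tools(tools, {**survey, "confidence": 0.3})
]
```

Subtracting the protected set before filtering means a survey that asks to skip `submit_report` is ignored for that tool, so the loop can still terminate.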
|
||||
|
||||
### 3.4 The planning pass (Phase 3)
|
||||
|
||||
`_run_planning()` is structured like `_run_survey()`: a single-purpose
|
||||
loop with one submit tool (`submit_plan`) and a low turn cap. Its job is to
|
||||
decide *where* the dir loops should spend turns, not to investigate.
|
||||
|
||||
Inputs:
|
||||
- The survey dict (formatted via `_format_survey_block()`)
|
||||
- The full tree at depth 6 (deeper than the survey's 2-level preview)
|
||||
- The base scan's `survey_signals` (raw file signals)
|
||||
- The list of already-cached directories (so the planner doesn't plan
|
||||
around dirs that will be skipped)
|
||||
|
||||
The plan schema, tier allocations (priority 15–20 with a cap of 25, default 10,
|
||||
shallow 5, skip 0), fallback behavior, and resume behavior are covered
|
||||
in full on the [Planning Pass](PlanningPass) page.
|
||||
|
||||
`_apply_plan()` is a pure helper that translates the plan into an
|
||||
ordered list of directories plus a `{dir_path: max_turns}` map. It
|
||||
sorts dirs into priority/default/shallow bands but **preserves
|
||||
leaf-first ordering within each band** — so children always run before
|
||||
their parents, even in "priority-first" mode. See §4.7.
|
||||
|
||||
`_write_plan_evaluation()` writes `plan_evaluation.json` at the end of
|
||||
every run with `turns_allocated`, `turns_used`, and `completeness` per
|
||||
directory. This is the planning pass's report card.
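
One way to picture the banding in `_apply_plan()` is a stable sort over an already leaf-first list. The sketch below is an assumption-heavy illustration: the plan shape (`{dir_path: {"tier": ..., "max_turns": ...}}`), the fallback of 15 turns for a priority entry without `max_turns`, and the choice to drop `skip` directories from the run list are all hypothetical; the real schema is on the Planning Pass page. Only the tier turn counts come from the text.

```python
DEFAULT_TURNS = 10   # default tier, per the text
SHALLOW_TURNS = 5    # shallow tier, per the text
PRIORITY_CAP = 25    # priority tier cap, per the text
BAND_ORDER = {"priority": 0, "default": 1, "shallow": 2}


def apply_plan(leaf_first_dirs, plan):
    """Return (ordered_dirs, {dir_path: max_turns}) for a hypothetical plan."""
    turns = {}
    bands = {}
    for path in leaf_first_dirs:
        entry = plan.get(path, {})
        tier = entry.get("tier", "default")
        if tier == "skip":
            continue  # assumption: skipped dirs get no dir loop at all
        if tier == "priority":
            turns[path] = min(entry.get("max_turns", 15), PRIORITY_CAP)
        elif tier == "shallow":
            turns[path] = SHALLOW_TURNS
        else:
            tier = "default"
            turns[path] = DEFAULT_TURNS
        bands[path] = BAND_ORDER[tier]
    # sorted() is stable, so the incoming leaf-first order survives
    # inside each band.
    ordered = sorted(turns, key=lambda p: bands[p])
    return ordered, turns


dirs = ["src/auth", "src/db", "docs", "src", "tests"]  # already leaf-first
plan = {
    "src/auth": {"tier": "priority", "max_turns": 18},
    "src/db": {"tier": "priority", "max_turns": 16},
    "src": {"tier": "priority", "max_turns": 30},
    "docs": {"tier": "shallow"},
}
ordered, turns = apply_plan(dirs, plan)
```

Because the function takes plain data and returns plain data, it stays easy to unit-test, which fits the "pure helper" description above.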
|
||||
|
||||
---
|
||||
|
||||
## 4. The dir loop in depth
|
||||
|
||||
`_run_dir_loop()` is at `ai.py:1017`. It is a hand-written agent loop, and
|
||||
you should expect to read it several times before it clicks. As of #57 the
|
||||
loop body itself is a thin coordinator (~25 lines): it calls three helpers
|
||||
that own the layers it used to inline.
|
||||
`_run_dir_loop()` is a hand-written agent loop, and you should expect
|
||||
to read it several times before it clicks. As of #57 the loop body
|
||||
itself is a thin coordinator (~25 lines): it calls three helpers that
|
||||
own the layers it used to inline.
|
||||
|
||||
| Helper | Lines | Job |
|
||||
|---|---|---|
|
||||
| `_build_dir_loop_context()` | `ai.py:855` | Pure setup. Builds dir context, child summaries, survey block, filtered tool list, system prompt, and the seed user message. Returns a `_DirLoopContext` namedtuple. |
|
||||
| `_flush_partial_dir_entry()` | `ai.py:896` | Idempotent partial-cache writer for the budget-exceeded path. Synthesizes a summary from already-cached file entries when possible, or writes a "no files processed" stub. Returns the partial summary string. |
|
||||
| `_handle_turn_response()` | `ai.py:957` | Per-turn response processing. Prints text blocks and tool decisions to stderr, appends the assistant message, dispatches tools (or nudges the agent to call submit_report), appends tool_results. Returns `(done, summary)`. |
|
||||
| Helper | Job |
|
||||
|---|---|
|
||||
| `_build_dir_loop_context()` | Pure setup. Builds dir context, child summaries, survey block, filtered tool list, system prompt, and the seed user message. Returns a `_DirLoopContext` namedtuple. |
|
||||
| `_flush_partial_dir_entry()` | Idempotent partial-cache writer for the budget-exceeded path. Synthesizes a summary from already-cached file entries when possible, or writes a "no files processed" stub. Returns the partial summary string. |
|
||||
| `_handle_turn_response()` | Per-turn response processing. Prints text blocks and tool decisions to stderr, appends the assistant message, dispatches tools (or nudges the agent to call submit_report), appends tool_results. Returns `(done, summary, completeness)`. |
|
||||
|
||||
The shape of the loop body is now:
|
||||
|
||||
```
|
||||
ctx = _build_dir_loop_context(...)
|
||||
reset per-loop token counter
|
||||
for turn in range(max_turns): # max_turns = 14
|
||||
for turn in range(max_turns): # max_turns from plan (5–25)
|
||||
if budget exceeded:
|
||||
print warning
|
||||
partial = _flush_partial_dir_entry(...)
|
||||
if partial: summary = partial
|
||||
break
|
||||
call API (streaming)
|
||||
done, turn_summary = _handle_turn_response(...)
|
||||
done, turn_summary, turn_completeness = _handle_turn_response(...)
|
||||
if turn_summary: summary = turn_summary
|
||||
if turn_completeness: completeness = turn_completeness
|
||||
if done: break
|
||||
return summary
|
||||
return (summary, completeness)
|
||||
```
|
||||
|
||||
A few non-obvious mechanics:
|
||||
|
|
@@ -207,95 +244,104 @@ message (the tool results). Nothing is ever evicted. This means
|
|||
`input_tokens` on each successive API call grows roughly linearly — the
|
||||
model is re-sent the full conversation every turn. On code targets we see
|
||||
~1.5–2k tokens added per turn. At `max_turns=14` this stays under the
|
||||
budget; raising the cap would expose this. See **#51**.
|
||||
budget; raising the cap would expose this. With Phase 3's priority-tier
|
||||
cap of 25, we're still well under budget in practice but closer to the
|
||||
ceiling. See **#51**.
|
||||
|
||||
### 4.2 Tool dispatch
|
||||
|
||||
Tools are plain functions in `ai.py`. They are wired up via a single
|
||||
`register_tool()` call (`ai.py:172`) that lands the schema in one or
|
||||
more scope lists (`_DIR_TOOLS`, `_SYNTHESIS_TOOLS`, `_SURVEY_TOOLS`)
|
||||
`register_tool()` call that lands the schema in one or more scope lists
|
||||
(`_DIR_TOOLS`, `_SYNTHESIS_TOOLS`, `_SURVEY_TOOLS`, `_PLANNING_TOOLS`)
|
||||
and the handler in `_TOOL_DISPATCH`. The registrations live below the
|
||||
tool implementations in `ai.py` and read top-to-bottom in dir-then-
|
||||
synthesis-then-survey order.
|
||||
tool implementations in `ai.py` and read top-to-bottom in
|
||||
dir-then-synthesis-then-survey-then-planning order.
|
||||
|
||||
`_execute_tool()` looks up the handler by name in `_TOOL_DISPATCH`,
|
||||
calls it, logs the turn to `investigation.log`, and returns the result
|
||||
string. **Tools intercepted by the loop body — `submit_report` and
|
||||
`submit_survey` — register their schema only and have no handler entry.**
|
||||
`_handle_turn_response()` recognizes `submit_report` specially: it sets
|
||||
`done = True` and extracts the summary directly from the tool input.
|
||||
string. **Tools intercepted by the loop body — `submit_report`,
|
||||
`submit_survey`, `submit_plan` — register their schema only and have no
|
||||
handler entry.** `_handle_turn_response()` recognizes `submit_report`
|
||||
specially: it sets `done = True`, extracts the summary from the tool
|
||||
input, and also extracts the optional `completeness` field (Phase 3
|
||||
instrumentation).
|
||||
|
||||
`think`, `checkpoint`, and `flag` *are* in dispatch, but they have side
|
||||
effects that just print to stderr or append to `flags.jsonl` — the return
|
||||
value is always `"ok"`.
|
||||
effects that just print to stderr or append to `flags.jsonl` — the
|
||||
return value is always `"ok"`.
|
||||
|
||||
When you add a tool: write the function, then add one `register_tool()`
|
||||
call below it. That's it. There is no second place to forget.
|
||||
|
||||
### 4.3 Pre-loaded context
|
||||
|
||||
Before the loop starts, `_build_dir_loop_context()` (`ai.py:855`) calls
|
||||
two helpers that prepare static context for the system prompt:
|
||||
Before the loop starts, `_build_dir_loop_context()` calls two helpers
|
||||
that prepare static context for the system prompt:
|
||||
|
||||
- `_build_dir_context()` (`ai.py:741`) — `ls`-style listing of the dir
|
||||
with sizes and MIME types via `python-magic`. The agent sees this
|
||||
*before* it makes any tool calls, so it doesn't waste a turn just
|
||||
listing the directory.
|
||||
- `_get_child_summaries()` (`ai.py:763`) — looks up each subdirectory in
|
||||
the cache and pulls its `summary` field. This is how leaves-first
|
||||
ordering pays off: by the time the loop runs on `src/`, all of
|
||||
`src/auth/`, `src/db/`, `src/middleware/` already have cached summaries
|
||||
that get injected as `{child_summaries}`.
|
||||
- `_build_dir_context()` — `ls`-style listing of the dir with sizes and
|
||||
MIME types via `python-magic`. The agent sees this *before* it makes
|
||||
any tool calls, so it doesn't waste a turn just listing the directory.
|
||||
- `_get_child_summaries()` — looks up each subdirectory in the cache and
|
||||
pulls its `summary` field. This is how leaves-first ordering pays off:
|
||||
by the time the loop runs on `src/`, all of `src/auth/`, `src/db/`,
|
||||
`src/middleware/` already have cached summaries that get injected as
|
||||
`{child_summaries}`.
|
||||
|
||||
If `_get_child_summaries()` returns nothing, the prompt says
|
||||
`(none — this is a leaf directory)`.
|
||||
If `_get_child_summaries()` returns nothing, the prompt distinguishes
|
||||
leaf directories (`"(none: this is a leaf directory)"`) from parents
|
||||
whose children haven't been investigated yet (`"(child directories
|
||||
exist but have not been investigated yet)"`). See §4.7.
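
The three cases for the `{child_summaries}` slot can be sketched as below. This is an illustration only: `child_summaries_text()` is a hypothetical name, a plain dict stands in for the real cache lookup, and the bullet format of the rendered lines is an assumption. The two placeholder strings are the ones quoted above.

```python
import os


def child_summaries_text(child_dirs, cached_summaries):
    """Render the child-summary block for one directory.

    child_dirs: paths of the immediate subdirectories.
    cached_summaries: {dir_path: summary}, standing in for the cache.
    """
    if not child_dirs:
        return "(none: this is a leaf directory)"
    lines = [
        f"- {os.path.basename(child)}/: {cached_summaries[child]}"
        for child in child_dirs
        if child in cached_summaries
    ]
    if not lines:
        return "(child directories exist but have not been investigated yet)"
    return "\n".join(lines)


cache = {"src/auth": "Login and session handling.", "src/db": "SQLite access layer."}

leaf_text = child_summaries_text([], cache)
parent_text = child_summaries_text(["src/auth", "src/db"], cache)
early_text = child_summaries_text(["src/auth"], {})
```

The third case is the one that leaf-first ordering is meant to avoid: a parent whose children exist but have nothing in the cache yet.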
|
||||
|
||||
### 4.4 The token tracker and the budget check
|
||||
|
||||
`_TokenTracker` (`ai.py:94`) is a tiny accumulator with one important
|
||||
subtlety, captured in **#44**:
|
||||
`_TokenTracker` is a tiny accumulator with one important subtlety,
|
||||
captured in **#44**:
|
||||
|
||||
> Cumulative input tokens are NOT a meaningful proxy for context size:
|
||||
> each turn's `input_tokens` already includes the full message history,
|
||||
> so summing across turns double-counts everything. Use `last_input` for
|
||||
> budget decisions, totals for billing.
|
||||
|
||||
So `budget_exceeded()` (`ai.py:135`) compares `last_input` (the most
|
||||
recent call's input_tokens) to `CONTEXT_BUDGET` (`ai.py:40`), which is
|
||||
70% of 200k. This is checked at the *top* of each loop iteration, before
|
||||
the next API call.
|
||||
So `budget_exceeded()` compares `last_input` (the most recent call's
|
||||
input_tokens) to `CONTEXT_BUDGET`, which is 70% of 200k. This is
|
||||
checked at the *top* of each loop iteration, before the next API call.
|
||||
|
||||
When the budget check trips, the loop:
|
||||
1. Prints a `Context budget reached` warning to stderr
|
||||
2. Calls `_flush_partial_dir_entry()` (`ai.py:896`), which writes a
|
||||
partial dir cache entry from any `file` cache entries the agent
|
||||
already produced, marked with `partial: True` and `partial_reason`.
|
||||
The helper is idempotent — if a dir entry already exists, it returns
|
||||
`""` without writing.
|
||||
2. Calls `_flush_partial_dir_entry()`, which writes a partial dir cache
|
||||
entry from any `file` cache entries the agent already produced,
|
||||
marked with `partial: True` and `partial_reason`. The helper is
|
||||
idempotent — if a dir entry already exists, it returns `""` without
|
||||
writing.
|
||||
3. Breaks out of the loop
|
||||
|
||||
This means a budget breach doesn't lose work — anything the agent already
|
||||
cached survives, and the synthesis pass will see a partial dir summary
|
||||
rather than nothing.
|
||||
This means a budget breach doesn't lose work — anything the agent
|
||||
already cached survives, and the synthesis pass will see a partial dir
|
||||
summary rather than nothing.
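
The difference between `last_input` and the cumulative totals is easy to see in a small sketch. The class below is a simplified stand-in for `_TokenTracker`: the `record()` method name and the strict `>` comparison are assumptions, and the turn sizes are made-up numbers. The budget value follows the "70% of 200k" figure in the text.

```python
CONTEXT_BUDGET = 200_000 * 70 // 100  # 70% of 200k = 140_000


class TokenTracker:
    """Tracks cumulative totals (billing) and the last call's input (budget)."""

    def __init__(self):
        self.total_input = 0
        self.total_output = 0
        self.last_input = 0

    def record(self, input_tokens, output_tokens):
        # Totals double-count history on purpose: every call is billed in full.
        self.total_input += input_tokens
        self.total_output += output_tokens
        # Only the most recent call reflects the current context size.
        self.last_input = input_tokens

    def budget_exceeded(self):
        return self.last_input > CONTEXT_BUDGET


tracker = TokenTracker()
for input_tokens in (60_000, 90_000, 120_000):  # context grows each turn
    tracker.record(input_tokens, output_tokens=1_000)

exceeded_before = tracker.budget_exceeded()  # 120k is still under 140k
tracker.record(150_000, output_tokens=1_000)
exceeded_after = tracker.budget_exceeded()
```

After three turns the cumulative input is already 270k, well past the budget, while the actual context is only 120k. Checking the total would have stopped the loop two turns too early.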
|
||||
|
||||
### 4.5 What the loop returns
|
||||
|
||||
`_run_dir_loop()` returns the `summary` string from `submit_report` (or
|
||||
the partial summary returned by `_flush_partial_dir_entry()` if the
|
||||
budget tripped). `_run_investigation()` then writes a normal `dir` cache
|
||||
entry from this summary, *unless* the dir loop already wrote one itself
|
||||
via the partial-flush path, in which case the `cache.has_entry("dir",
|
||||
dir_path)` check skips it.
|
||||
`_run_dir_loop()` returns `(summary, completeness)`. The summary is the
|
||||
string from `submit_report` (or the partial summary returned by
|
||||
`_flush_partial_dir_entry()` if the budget tripped). The completeness
|
||||
is the agent's self-rated investigation thoroughness (0.0–1.0) — Phase
|
||||
3 instrumentation used in `plan_evaluation.json` — or `None` if the
|
||||
agent didn't report one.
|
||||
|
||||
`_run_investigation()` writes a normal `dir` cache entry from this
|
||||
summary (with `completeness` included if non-None), *unless* the dir
|
||||
loop already wrote one itself via the partial-flush path, in which case
|
||||
the `cache.has_entry("dir", dir_path)` check skips it.
|
||||
|
||||
### 4.6 The streaming API caller
|
||||
|
||||
`_call_api_streaming()` (`ai.py:686`) is a thin wrapper around
|
||||
`_call_api_streaming()` is a thin wrapper around
|
||||
`client.messages.stream()`. It currently doesn't print tokens as they
|
||||
arrive — it iterates the stream, drops everything, then pulls the final
|
||||
message via `stream.get_final_message()`. The streaming API is there to enable
|
||||
real-time tool decision printing, which today happens only after the full
|
||||
response arrives. There's room here to add live progress printing if you
|
||||
want it.
|
||||
real-time tool decision printing, which today happens only after the
|
||||
full response arrives. There's room here to add live progress printing
|
||||
if you want it.
|
||||
|
||||
### 4.7 The leaf-first contract (load-bearing for child summaries)
|
||||
|
||||
|
|
@@ -339,32 +385,34 @@ the full design.
|
|||
## 5. The cache model
|
||||
|
||||
Cache lives at `/tmp/luminos/{investigation_id}/`. Code is
|
||||
`luminos_lib/cache.py` (201 lines).
|
||||
`luminos_lib/cache.py`.
|
||||
|
||||
### 5.1 Investigation IDs
|
||||
|
||||
`/tmp/luminos/investigations.json` maps absolute target paths to UUIDs.
|
||||
`_get_investigation_id()` (`cache.py:40`) looks up the target and either
|
||||
returns the existing UUID (resume) or creates a new one (fresh run).
|
||||
`--fresh` forces a new UUID even if one exists.
|
||||
`_get_investigation_id()` looks up the target and either returns the
|
||||
existing UUID (resume) or creates a new one (fresh run). `--fresh`
|
||||
forces a new UUID even if one exists.
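
The resume-or-create behavior can be sketched as a small JSON index keyed by absolute path. This is an illustration under assumptions: the function signature, the `index_path` parameter, and the on-disk layout beyond "absolute target path to UUID" are hypothetical, and the demo writes to a temporary directory rather than `/tmp/luminos/`.

```python
import json
import os
import tempfile
import uuid


def get_investigation_id(target, index_path, fresh=False):
    """Return the stored UUID for target, or create and store a new one."""
    target = os.path.abspath(target)
    index = {}
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as fh:
            index = json.load(fh)
    if not fresh and target in index:
        return index[target]  # resume the previous investigation
    new_id = str(uuid.uuid4())
    index[target] = new_id  # a fresh run overwrites any existing mapping
    parent = os.path.dirname(index_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(index_path, "w", encoding="utf-8") as fh:
        json.dump(index, fh, indent=2)
    return new_id


with tempfile.TemporaryDirectory() as tmp:
    index_file = os.path.join(tmp, "investigations.json")
    first = get_investigation_id("some/project", index_file)
    resumed = get_investigation_id("some/project", index_file)
    forced = get_investigation_id("some/project", index_file, fresh=True)
```

Normalizing the target with `os.path.abspath()` before the lookup is what lets the same project resume regardless of the working directory the command was run from.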
|
||||
|
||||
### 5.2 What's stored
|
||||
|
||||
Inside `/tmp/luminos/{uuid}/`:
|
||||
|
||||
```
|
||||
meta.json investigation metadata (model, start time, dir count)
|
||||
files/<sha256>.json one file per cached file entry
|
||||
dirs/<sha256>.json one file per cached directory entry
|
||||
flags.jsonl JSONL — appended on every flag tool call
|
||||
investigation.log JSONL — appended on every tool call
|
||||
meta.json investigation metadata (model, start time, dir count)
|
||||
plan.json planning pass output — cached for resumed runs
|
||||
plan_evaluation.json post-investigation quality report (Phase 3)
|
||||
files/<sha256>.json one file per cached file entry
|
||||
dirs/<sha256>.json one file per cached directory entry
|
||||
flags.jsonl JSONL — appended on every flag tool call
|
||||
investigation.log JSONL — appended on every tool call
|
||||
```
|
||||
|
||||
**File and dir cache entries are NOT in JSONL** — they are one
|
||||
sha256-keyed JSON file per entry. The sha256 is over the path string
|
||||
(`cache.py:13`). Only `flags.jsonl` and `investigation.log` use JSONL.
|
||||
sha256-keyed JSON file per entry. The sha256 is over the path string.
|
||||
Only `flags.jsonl` and `investigation.log` use JSONL.
|
||||
|
||||
Required fields are validated in `write_entry()` (`cache.py:115`):
|
||||
Required fields are validated in `write_entry()`:
|
||||
|
||||
```python
|
||||
file: {path, relative_path, size_bytes, category, summary, cached_at}
|
||||
|
|
@@ -376,31 +424,45 @@ The validator also rejects entries containing `content`, `contents`, or
|
|||
contents, summaries only. If you change the schema, update the required
|
||||
set in `write_entry()` and update the test in `tests/test_cache.py`.
|
||||
|
||||
### 5.3 Confidence support already exists
|
||||
### 5.3 Confidence + completeness support
|
||||
|
||||
`write_entry()` validates an optional `confidence` field
|
||||
(`cache.py:129–134`) and a `confidence_reason` string.
|
||||
`low_confidence_entries(threshold=0.7)` (`cache.py:191`) returns all
|
||||
entries below a threshold, sorted ascending. The agent doesn't currently
|
||||
*set* these fields in any prompt — that lights up when Phase 1 work
|
||||
actually wires the prompts.
|
||||
`write_entry()` validates optional `confidence` and `confidence_reason`
|
||||
fields (Phase 1) and an optional `completeness` field (Phase 3,
|
||||
0.0–1.0, the dir agent's self-rated thoroughness).
|
||||
`low_confidence_entries(threshold=0.7)` returns all entries below a
|
||||
threshold, sorted ascending — future refinement-pass fuel.
|
||||
|
||||
### 5.4 Why one-file-per-entry instead of JSONL
|
||||
|
||||
Random access by path. The dir loop calls `cache.has_entry("dir", path)`
|
||||
once per directory during the `_get_child_summaries()` lookup; with
|
||||
sha256-keyed files this is an `os.path.exists()` call. With JSONL it
|
||||
would be a full file scan.
|
||||
Random access by path. The dir loop calls
|
||||
`cache.has_entry("dir", path)` once per directory during the
|
||||
`_get_child_summaries()` lookup; with sha256-keyed files this is an
|
||||
`os.path.exists()` call. With JSONL it would be a full file scan.
|
||||
|
||||
### 5.5 The planning files
|
||||
|
||||
`plan.json` is written by `_run_investigation()` after a successful
|
||||
planning pass, so resumed runs can skip the planner. It is loaded
|
||||
before the dir loops run when `--fresh` is not set and the file
|
||||
exists.
|
||||
|
||||
`plan_evaluation.json` is written by `_write_plan_evaluation()` after
|
||||
the dir loops finish. Schema: `plan_order`, `total_dirs_investigated`,
|
||||
`total_turns_allocated`, `total_turns_used`, `overall_utilization`,
|
||||
`per_directory` (list of `{dir, planned_tier, turns_allocated,
|
||||
turns_used, utilization, completeness, confidence}`), `evaluated_at`.
|
||||
See [Planning Pass](PlanningPass) for how to use it.
|
||||
|
||||
---
|
||||
|
||||
## 6. Prompts
|
||||
|
||||
All prompt templates live in `luminos_lib/prompts.py`. There are three:
|
||||
All prompt templates live in `luminos_lib/prompts.py`. There are four:
|
||||
|
||||
| Constant | Used by | What it carries |
|
||||
|---|---|---|
|
||||
| `_SURVEY_SYSTEM_PROMPT` | `_run_survey` | survey_signals, tree_preview, available_tools |
|
||||
| `_PLANNING_SYSTEM_PROMPT` | `_run_planning` | survey, tree, file signals, cached_dirs |
|
||||
| `_DIR_SYSTEM_PROMPT` | `_run_dir_loop` | dir_path, dir_rel, max_turns, context, child_summaries, survey_context |
|
||||
| `_SYNTHESIS_SYSTEM_PROMPT` | `_run_synthesis` | target, summaries_text |
|
||||
|
||||
|
|
@@ -424,8 +486,8 @@ that reason.
|
|||
|
||||
## 7. Synthesis pass
|
||||
|
||||
`_run_synthesis()` (`ai.py:1157`) is structurally similar to the dir loop
|
||||
but much simpler:
|
||||
`_run_synthesis()` is structurally similar to the dir loop but much
|
||||
simpler:
|
||||
|
||||
- Reads all `dir` cache entries via `cache.read_all_entries("dir")`
|
||||
- Renders them into a `summaries_text` block (one section per dir)
|
||||
|
|
@@ -434,31 +496,29 @@ but much simpler:
|
|||
`detailed` fields
|
||||
|
||||
Tools available: `read_cache`, `list_cache`, `flag`, `submit_report`
|
||||
(`_SYNTHESIS_TOOLS` at `ai.py:401`). The synthesis agent can pull
|
||||
specific cache entries back if it needs to drill in, but it cannot read
|
||||
files directly — synthesis is meant to operate on summaries, not raw
|
||||
contents.
|
||||
(`_SYNTHESIS_TOOLS`). The synthesis agent can pull specific cache
|
||||
entries back if it needs to drill in, but it cannot read files directly
|
||||
— synthesis is meant to operate on summaries, not raw contents.
|
||||
|
||||
There's a fallback: if synthesis runs out of turns without calling
|
||||
`submit_report`, `_synthesize_from_cache()` (`ai.py:1262`) builds a
|
||||
mechanical brief+detailed from the cached dir summaries with no AI call.
|
||||
This guarantees you always get *something* in the report.
|
||||
`submit_report`, `_synthesize_from_cache()` builds a mechanical
|
||||
brief+detailed from the cached dir summaries with no AI call. This
|
||||
guarantees you always get *something* in the report.
|
||||
|
||||
---
|
||||
|
||||
## 8. Flags
|
||||
|
||||
The `flag` tool is the agent's pressure valve for "I noticed something
|
||||
that should not be lost in the summary." `_tool_flag()` (`ai.py:629`)
|
||||
prints to stderr *and* appends a JSONL line to
|
||||
`{cache.root}/flags.jsonl`. At the end of `_run_investigation()`
|
||||
(`ai.py:1387–1397`), the orchestrator reads that file back and includes
|
||||
the flags in its return tuple. `format_report()` then renders them in a
|
||||
dedicated section.
|
||||
that should not be lost in the summary." `_tool_flag()` prints to stderr
|
||||
*and* appends a JSONL line to `{cache.root}/flags.jsonl`. At the end of
|
||||
`_run_investigation()`, the orchestrator reads that file back and
|
||||
includes the flags in its return tuple. `format_report()` then renders
|
||||
them in a dedicated section.
|
||||
|
||||
Severity is `info | concern | critical`. The agent is told to flag
|
||||
*immediately* on discovery, not save findings for the report — this is in
|
||||
the tool description at `ai.py:312`.
|
||||
*immediately* on discovery, not save findings for the report — this is
|
||||
in the tool description.
|
||||
|
||||
---
|
||||
|
||||
|
|
@@ -484,10 +544,11 @@ A cookbook for the kinds of changes that come up most often.
|
|||
contains your handler and `_DIR_TOOLS` contains your schema after
|
||||
importing `luminos_lib.ai`.
|
||||
|
||||
To make a tool available in synthesis or survey instead of (or in
|
||||
addition to) dir, pass `scopes=["synthesis"]`, `scopes=["survey"]`, or
|
||||
`scopes=["dir", "synthesis"]`. Tools whose schema differs by scope (like
|
||||
`submit_report`) get a separate `register_tool()` call per scope.
|
||||
To make a tool available in synthesis, survey, or planning instead of
|
||||
(or in addition to) dir, pass `scopes=["synthesis"]`, `scopes=["survey"]`,
|
||||
`scopes=["planning"]`, or any combination. Tools whose schema differs by
|
||||
scope (like `submit_report`) get a separate `register_tool()` call per
|
||||
scope.
|
||||
|
||||
### 9.2 Add a whole new pass
|
||||
|
||||
|
|
@@ -522,8 +583,7 @@ unless you `--fresh`.
|
|||
### 9.4 Change cache schema
|
||||
|
||||
1. Update the required-fields set in `cache.py:write_entry()`
|
||||
(`cache.py:119–123`)
|
||||
2. Update `_DIR_TOOLS`'s `write_cache` description in `ai.py:228` so the
|
||||
2. Update `_DIR_TOOLS`'s `write_cache` description in `ai.py` so the
|
||||
agent knows what to write
|
||||
3. Update `_DIR_SYSTEM_PROMPT` in `prompts.py` if the agent needs to know
|
||||
*how* to populate the new field
|
||||
|
|
@ -532,26 +592,25 @@ unless you `--fresh`.
|
|||
|
||||
### 9.5 Add a CLI flag
|
||||
|
||||
Edit `luminos.py:88` (`main()`'s argparse setup) to define the flag, then
|
||||
Edit `luminos.py:main()`'s argparse setup to define the flag, then
|
||||
plumb it through whatever functions need it. New AI-related flags
|
||||
typically need to be added to `analyze_directory()`'s signature
|
||||
(`ai.py:1408`) and then forwarded to `_run_investigation()`.
|
||||
typically need to be added to `analyze_directory()`'s signature and
|
||||
then forwarded to `_run_investigation()`.
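
As a rough illustration of that plumbing, the sketch below defines a flag in argparse and forwards it through two layers. The flag `--example-limit` is invented for the example and does not exist in Luminos. The `*_sketch` functions are stand-ins whose signatures are assumptions, not the real `analyze_directory()` or `_run_investigation()`.

```python
import argparse


def build_parser():
    """Stand-in for the argparse setup in main()."""
    parser = argparse.ArgumentParser(prog="luminos")
    parser.add_argument("target")
    # Hypothetical flag, for illustration only.
    parser.add_argument(
        "--example-limit",
        type=int,
        default=None,
        help="illustrative numeric option",
    )
    return parser


def run_investigation_sketch(target, example_limit=None):
    """Stand-in for _run_investigation(): the innermost consumer of the value."""
    return {"target": target, "example_limit": example_limit}


def analyze_directory_sketch(target, example_limit=None):
    """Stand-in for analyze_directory(): accepts the new keyword and forwards it."""
    return run_investigation_sketch(target, example_limit=example_limit)


def main(argv=None):
    args = build_parser().parse_args(argv)
    # argparse exposes --example-limit as args.example_limit.
    return analyze_directory_sketch(args.target, example_limit=args.example_limit)


if __name__ == "__main__":
    print(main())
```

Giving the new keyword a default of `None` at every layer keeps existing callers working while the flag is threaded through.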
|
||||
|
||||
---
|
||||
|
||||
## 10. Token budget and cost
|
||||
|
||||
Budget logic is in `_TokenTracker.budget_exceeded()` and is checked at the
|
||||
top of every dir loop iteration (`ai.py:882`). The budget is **per call**,
|
||||
not cumulative — see §4.4. The breach handler flushes a partial dir cache
|
||||
Budget logic is in `_TokenTracker.budget_exceeded()` and is checked at
|
||||
the top of every dir loop iteration. The budget is **per call**, not
|
||||
cumulative — see §4.4. The breach handler flushes a partial dir cache
|
||||
entry so work isn't lost.
|
||||
|
||||
Cost reporting happens once at the end of `_run_investigation()`
|
||||
(`ai.py:1399`), using the cumulative `total_input` and `total_output`
|
||||
counters multiplied by the constants at `ai.py:43–44`. There is no
|
||||
running cost display during the investigation today. If you want one,
|
||||
`_TokenTracker.summary()` already returns the formatted string — just
|
||||
call it after each dir loop.
|
||||
Cost reporting happens once at the end of `_run_investigation()`, using
|
||||
the cumulative `total_input` and `total_output` counters multiplied by
|
||||
the constants near the top of `ai.py`. There is no running cost display
|
||||
during the investigation today. If you want one, `_TokenTracker.summary()`
|
||||
already returns the formatted string — just call it after each dir loop.
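
The split between the per-call budget check and the cumulative cost counters can be sketched like this. It is an assumption-laden illustration, not the real `_TokenTracker`: the class name, the `record()` method, the strict `>` comparison, and both price constants are placeholders. The counter names and the 70% of 200k threshold are the only parts taken from this page.

```python
MAX_CONTEXT = 200_000
CONTEXT_BUDGET = MAX_CONTEXT * 70 // 100  # 140_000, per the glossary

# Placeholder rates in dollars per million tokens. NOT the constants in ai.py.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 2.00


class TokenTrackerSketch:
    """Illustrative tracker: cumulative totals for cost, last_input for budget."""

    def __init__(self):
        self.total_input = 0
        self.total_output = 0
        self.last_input = 0

    def record(self, input_tokens, output_tokens):
        """Record the usage reported for one API call (hypothetical method)."""
        self.total_input += input_tokens
        self.total_output += output_tokens
        # Overwritten on every call, never summed: this is the budget basis.
        self.last_input = input_tokens

    def budget_exceeded(self):
        # Assumption: strict comparison; the real check may use >=.
        return self.last_input > CONTEXT_BUDGET

    def cost(self):
        """Cost from the cumulative counters, using the placeholder rates."""
        return (
            self.total_input * INPUT_PRICE_PER_M
            + self.total_output * OUTPUT_PRICE_PER_M
        ) / 1_000_000


if __name__ == "__main__":
    tracker = TokenTrackerSketch()
    tracker.record(100_000, 500)
    tracker.record(90_000, 500)
    # Cumulative input is past 140k, but the most recent call was not.
    print(tracker.total_input, tracker.budget_exceeded())
```

Two calls of 100k and 90k input tokens sum to more than the threshold without tripping the budget, because each call resends the conversation and only the size of the latest request reflects how full the context window is.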
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -560,16 +619,20 @@ call it after each dir loop.
|
|||
| Term | Meaning |
|
||||
|---|---|
|
||||
| **base scan** | The non-AI phase: tree, classification, languages, recency, disk usage. Stdlib + coreutils only. |
|
||||
| **dir loop** | Per-directory agent loop in `_run_dir_loop`. Up to 14 turns. Produces a `dir` cache entry. |
|
||||
| **dir loop** | Per-directory agent loop in `_run_dir_loop`. Turns allocated by the planning pass (5 shallow / 10 default / 15–20 priority, capped at 25). Produces a `dir` cache entry. |
|
||||
| **survey pass** | Single short loop before any dir loops, producing a shared description and tool guidance. |
|
||||
| **planning pass** | Phase 3 pass after the survey, before dir loops. Produces a plan (priority/shallow/skip dirs + turn allocations + order). |
|
||||
| **synthesis pass** | Final loop that reads `dir` cache entries and produces `(brief, detailed)`. |
|
||||
| **leaves-first** | Discovery order in `_discover_directories`: deepest paths first, so child summaries exist when parents are investigated. |
|
||||
| **leaves-first** | Discovery order in `_discover_directories`: deepest paths first, so child summaries exist when parents are investigated. Preserved within planning bands by `_apply_plan`. |
|
||||
| **investigation** | One end-to-end run, identified by a UUID, persisted under `/tmp/luminos/{uuid}/`. |
|
||||
| **investigation_id** | The UUID. Stored in `/tmp/luminos/investigations.json` keyed by absolute target path. |
|
||||
| **cache entry** | A JSON file under `files/` or `dirs/` named by sha256(path). |
|
||||
| **flag** | An agent finding written to `flags.jsonl` and reported separately. info / concern / critical. |
|
||||
| **partial entry** | A `dir` cache entry written when the budget tripped before `submit_report`. Marked with `partial: True`. |
|
||||
| **completeness** | Phase 3 agent self-rated thoroughness (0.0–1.0) from `submit_report`. Feeds `plan_evaluation.json`. |
|
||||
| **survey signals** | The histogram + samples computed by `filetypes.survey_signals()` during the base scan, fed to the survey prompt. |
|
||||
| **last_input** | The `input_tokens` count from the most recent API call. The basis for budget checks. NOT the cumulative sum. |
|
||||
| **CONTEXT_BUDGET** | 70% of 200k = 140k. Trigger threshold for early exit. |
|
||||
| **`_PROTECTED_DIR_TOOLS`** | Tools the survey is forbidden from filtering out of the dir loop's toolbox. Currently `{submit_report}`. |
|
||||
| **plan.json** | Serialized planning output, cached so resumed runs skip the planner. |
|
||||
| **plan_evaluation.json** | Post-investigation quality report comparing plan predictions to outcomes. |
|
||||
|
|
|
|||