wiki: add Internals code tour, fix Architecture cache + survey, replace Roadmap with status pointer (#53)

parent bda9958728
commit 457e0a7661

5 changed files with 621 additions and 145 deletions
@@ -1,5 +1,9 @@
 # Architecture
 
+> This page is the high-level map. For a code-level walkthrough with
+> file:line references — how the dir loop actually works, where to add a
+> tool, what the cache really stores — read [Internals](Internals).
+
 ## Overview
 
 Luminos is a zero-dependency Python CLI at its base. The `--ai` flag layers an
@@ -49,54 +53,94 @@ scan(target)
 
 ```
 analyze_directory(report, target)
+│
+└── _run_investigation()
+    │
+    ├── _get_investigation_id()    new UUID, or resume an existing one
     │
     ├── _discover_directories()    find all dirs, sort leaves-first
     │
-    ├── per-directory loop (each dir, up to max_turns=14)
-    │     _build_dir_context()     list files + sizes
+    ├── _run_survey()              single short loop, max 3 turns
+    │     inputs: survey_signals + 2-level tree preview
+    │     Tools: submit_survey
+    │     output: shared description, approach, relevant_tools,
+    │             skip_tools, domain_notes, confidence
+    │     (skipped via _default_survey() on tiny targets)
+    │
+    ├── _filter_dir_tools(survey)  remove skip_tools (if confidence ≥ 0.5)
+    │
+    ├── per-directory loop (each uncached dir, up to max_turns=14)
+    │     _build_dir_context()     list files + sizes + MIME
     │     _get_child_summaries()   read cached child summaries
-    │     _run_dir_loop()          agent loop: read files, parse structure,
-    │                              write cache entries, submit_report
-    │     Tools: read_file, list_directory,
-    │            run_command, parse_structure,
-    │            write_cache, think, checkpoint,
+    │     _format_survey_block()   inject survey context into prompt
+    │     _run_dir_loop()          agent loop with budget check on
+    │                              every iteration; flushes a partial
+    │                              cache entry on budget breach
+    │     Tools: read_file, list_directory, run_command,
+    │            parse_structure, write_cache, think, checkpoint,
     │            flag, submit_report
     │
-    ├── _run_synthesis()           one-shot aggregation of dir summaries
+    ├── _run_synthesis()           single loop, max 5 turns
    │     reads all "dir" cache entries
    │     produces brief (2-4 sentences) + detailed (free-form)
    │     Tools: read_cache, list_cache, flag, submit_report
+    │     fallback: _synthesize_from_cache() if out of turns
    │
    └── returns (brief, detailed, flags)
 ```
 
+Token usage and the context budget are tracked by `_TokenTracker` in
+`ai.py`. The budget check uses the *most recent* call's `input_tokens`,
+not the cumulative sum across turns — see #44 and the
+[Internals](Internals) page §4.4 for why.
+
 ---
 
 ## Cache
 
 Location: `/tmp/luminos/<investigation_id>/`
 
-Two entry types, both stored as JSONL:
+Layout:
 
-**File entries** (`files.jsonl`):
 ```
-{path, relative_path, size_bytes, category, summary, notable,
- notable_reason, cached_at}
+meta.json             investigation metadata
+files/<sha256>.json   one JSON file per cached file entry
+dirs/<sha256>.json    one JSON file per cached directory entry
+flags.jsonl           JSONL — appended on every flag tool call
+investigation.log     JSONL — appended on every tool call
 ```
 
-**Dir entries** (`dirs.jsonl`):
+File and dir entries are stored as one sha256-keyed JSON file per entry
+(not as JSONL) so that `has_entry(path)` is an O(1) `os.path.exists()`
+check rather than a file scan. Only `flags.jsonl` and `investigation.log`
+are JSONL.
+
+**File entries** (`files/<sha256>.json`):
 ```
-{path, relative_path, child_count, summary, dominant_category,
- notable_files, cached_at}
+{path, relative_path, size_bytes, category, summary, cached_at,
+ [confidence], [confidence_reason], [notable], [notable_reason]}
 ```
 
+**Dir entries** (`dirs/<sha256>.json`):
+```
+{path, relative_path, child_count, dominant_category, summary, cached_at,
+ [confidence], [confidence_reason], [notable_files],
+ [partial], [partial_reason]}
+```
+
+`partial: true` marks a dir entry written by the budget-breach early-exit
+path — the agent didn't reach `submit_report` and the summary was
+synthesized from already-cached file entries.
+
 **Flags** (`flags.jsonl`):
 ```
 {path, finding, severity}   severity: info | concern | critical
 ```
 
-Cache is reused across runs for the same target. `--fresh` ignores it.
-`--clear-cache` deletes it.
+Investigation IDs are persisted in `/tmp/luminos/investigations.json`
+keyed by absolute target path. Cache is reused across runs for the same
+target. `--fresh` mints a new investigation ID. `--clear-cache` deletes
+the entire cache root.
 
 ---
@@ -1,5 +1,10 @@
 # Development Guide
 
+> This page covers **how to set up, run, and test** Luminos. For a
+> code-level walkthrough of how the AI pipeline actually works — the dir
+> loop, the cache, the survey pass, where to add a tool — read
+> [Internals](Internals).
+
 ## Running Luminos
 
 ```bash
Home.md — 5 changed lines

@@ -21,8 +21,9 @@ of what the directory contains and why.
 
 | Page | Contents |
 |---|---|
 | [Architecture](Architecture) | Module breakdown, data flow, AI pipeline |
-| [Development Guide](DevelopmentGuide) | Git workflow, naming conventions, commands |
-| [Roadmap](Roadmap) | Planned phases and open design questions |
+| [Internals](Internals) | Code-level tour: dir loop, cache, prompts, where to make changes |
+| [Development Guide](DevelopmentGuide) | Setup, git workflow, testing, commands |
+| [Roadmap](Roadmap) | Phase status — pointer to PLAN.md and open issues |
 | [Session Retrospectives](SessionRetrospectives) | Full session history |
 
 ---
Internals.md — new file, 514 added lines

# Internals

A code tour of how Luminos actually works. Read this after [Development Guide](DevelopmentGuide) and [Architecture](Architecture). The goal is that a developer who knows basic Python but has never built an agent loop can finish this page and start making non-trivial changes.

All file:line references are accurate as of the date this page was last edited — verify with `git log` or by opening the file before relying on a specific line number.

---
## 1. The two layers

Luminos has a hard internal split:

| Layer | What it does | Imports |
|---|---|---|
| **Base scan** | Walks the directory, classifies files, counts lines, ranks recency, measures disk usage, prints a report. | stdlib only + GNU coreutils via subprocess. **No pip packages.** |
| **AI pipeline** (`--ai`) | Runs a multi-pass agent investigation via the Claude API on top of the base scan output. | `anthropic`, `tree-sitter`, `python-magic` — all imported lazily. |

The split is enforced by lazy imports. `luminos.py:156` is the only place that imports from `luminos_lib.ai`, and it sits inside `if args.ai:`. You can grep the codebase to verify: nothing in the base scan modules imports anything from `ai.py`, `ast_parser.py`, or `prompts.py`. This means `python3 luminos.py /target` works on a stock Python 3 install with no packages installed at all.

When you change a base-scan module, the question to ask is: *does this introduce a top-level import of anything outside stdlib?* If yes, you've broken the constraint and the change must be rewritten.
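The guard is small enough to sketch. This is an illustrative pattern only, not the actual `luminos.py` code; `load_ai_layer` and its `module` parameter are hypothetical names:

```python
import importlib

def load_ai_layer(enabled: bool, module: str = "luminos_lib.ai"):
    """Import the AI layer only when --ai was passed; never at module top level."""
    if not enabled:
        return None
    try:
        # The only place the AI dependencies are touched. A stock Python
        # install without --ai never executes this import.
        return importlib.import_module(module)
    except ImportError:
        # Degrade gracefully: the base scan still works with no pip packages.
        return None
```

The point of the shape is that the `import` statement lives inside a function behind the flag, so a missing dependency can only ever fail an `--ai` run.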
---

## 2. Base scan walkthrough

Entry: `luminos.py:main()` parses args, then calls `scan(target, ...)` at `luminos.py:45`. `scan()` is a flat sequence — it builds a `report` dict by calling helpers from `luminos_lib/`, one per concern, in order:

```
scan(target)
  build_tree()           → report["tree"], report["tree_rendered"]
  classify_files()       → report["classified_files"]
  summarize_categories() → report["file_categories"]
  survey_signals()       → report["survey_signals"]   ← input to AI survey
  detect_languages()     → report["languages"], report["lines_of_code"]
  find_large_files()     → report["large_files"]
  find_recent_files()    → report["recent_files"]
  get_disk_usage()       → report["disk_usage"]
  top_directories()      → report["top_directories"]
  return report
```

Each helper is independent. You could delete `find_recent_files()` and the report would just be missing that field. The flow is procedural, not event-driven, and there is no shared state object — everything passes through the local `report` dict.

The progress lines you see on stderr (`[scan] Counting lines... foo.py`) come from `_progress()` in `luminos.py:23`, which returns an `on_file` callback that the helpers call as they work. If you add a new helper that walks files, plumb a progress callback through the same way for consistency.

After `scan()` returns, `main()` either runs the AI pipeline or jumps straight to `format_report()` (`luminos_lib/report.py`) for terminal output, or `json.dumps()` for JSON. The AI pipeline always runs *after* the base scan because it needs `report["survey_signals"]` and `report["file_categories"]` as inputs.
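The callback plumbing looks roughly like this. A sketch of the pattern only; `make_progress` and `count_lines` are illustrative names, not the real `luminos.py` helpers:

```python
import sys

def make_progress(stage: str):
    """Return an on_file callback that prints one stderr progress line per file."""
    def on_file(path: str) -> None:
        print(f"[{stage}] {path}", file=sys.stderr)
    return on_file

def count_lines(paths, on_file=None):
    """A scan()-style helper: does its work, narrates via the optional callback."""
    total = 0
    for p in paths:
        if on_file:
            on_file(p)  # narration is opt-in; the helper works without it
        with open(p, encoding="utf-8", errors="replace") as f:
            total += sum(1 for _ in f)
    return total
```

Keeping the callback optional is what lets each helper stay independent while still feeding the shared progress display.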
---

## 3. AI pipeline walkthrough

The AI pipeline is what makes Luminos interesting and is also where almost all the complexity lives. Everything below happens inside `luminos_lib/ai.py` (1438 lines as of writing), called from `luminos.py:157` via `analyze_directory()`.

### 3.1 The orchestrator

`analyze_directory()` (`ai.py:1408`) is a thin wrapper that checks dependencies, gets the API key, builds the Anthropic client, and calls `_run_investigation()`. If anything fails it prints a warning and returns empty strings — the rest of Luminos keeps working.

`_run_investigation()` (`ai.py:1286`) is the real entry point. Read this function first if you want to understand the pipeline shape. It does six things, in order:

1. **Get/create an investigation ID and cache** (`ai.py:1289–1294`). Investigation IDs let you resume a previous run; see §5 below.
2. **Discover all directories** under the target via `_discover_directories()` (`ai.py:715`). Returns them sorted *leaves-first* — the deepest paths come first. This matters because each dir loop reads its child directories' summaries from cache, so children must be investigated before parents.
3. **Run the survey pass** (`ai.py:1300–1334`) unless the target is below the size thresholds at `ai.py:780–781`, in which case `_default_survey()` returns a synthetic skip.
4. **Filter out cached directories** (`ai.py:1336–1349`). If you're resuming an investigation, dirs that already have a `dir` cache entry are skipped — only new ones get a fresh dir loop.
5. **Run a dir loop per remaining directory** (`ai.py:1351–1375`). This is the heart of the system — see §4.
6. **Run the synthesis pass** (`ai.py:1382`), reading only `dir` cache entries to produce `(brief, detailed)`.

It also reads `flags.jsonl` from disk at the end (`ai.py:1387–1397`) and returns `(brief, detailed, flags)` to `analyze_directory()`.
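Leaves-first ordering in step 2 is essentially a sort by path depth, descending. A minimal sketch assuming plain path strings, not the real `_discover_directories()`:

```python
def leaves_first(paths):
    """Sort directory paths so the deepest come first: children before parents."""
    return sorted(paths, key=lambda p: p.rstrip("/").count("/"), reverse=True)
```

Because deeper paths sort first, any directory is guaranteed to be processed after all of its subdirectories, which is exactly the invariant `_get_child_summaries()` relies on.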
### 3.2 The survey pass

`_run_survey()` (`ai.py:1051`) is a short, single-purpose loop. It exists to give the dir loops some shared context about what they're looking at *as a whole* before any of them start.

Inputs go into the system prompt (`_SURVEY_SYSTEM_PROMPT` in `prompts.py`):

- `survey_signals` — extension histogram, `file --brief` outputs, filename samples (built by `filetypes.survey_signals()` during the base scan)
- A 2-level tree preview from `build_tree(target, max_depth=2)`
- The list of tools the dir loop will have available

The survey is allowed only `submit_survey` as a tool (`_SURVEY_TOOLS` at `ai.py:356`). It runs at most 3 turns. The agent must call `submit_survey` exactly once with six fields:

```python
{
    "description": "plain language — what is this target",
    "approach": "how the dir loops should investigate it",
    "relevant_tools": ["read_file", "parse_structure", ...],
    "skip_tools": ["parse_structure", ...],   # for non-code targets
    "domain_notes": "anything unusual the dir loops should know",
    "confidence": 0.0–1.0,
}
```

The result is a Python dict that gets passed into every dir loop as `survey=...`. If the survey fails (API error, ran out of turns), the dir loops still run but with `survey=None` — the system degrades gracefully.

### 3.3 How the survey shapes dir loops

Two things happen with the survey output before each dir loop runs:

**Survey block injection.** `_format_survey_block()` (`ai.py:803`) renders the survey dict as a labeled text block, which gets `.format()`-injected into the dir loop system prompt as `{survey_context}`. The dir agent sees the description, approach, domain notes, and which tools it should lean on or skip.

**Tool filtering.** `_filter_dir_tools()` (`ai.py:824`) returns a copy of `_DIR_TOOLS` with anything in `skip_tools` removed — but only if the survey's confidence is at or above `_SURVEY_CONFIDENCE_THRESHOLD = 0.5` (`ai.py:775`). Below that threshold the agent gets the full toolbox. The control-flow tool `submit_report` is in `_PROTECTED_DIR_TOOLS` and can never be filtered out — removing it would break loop termination.

This is the only place in the codebase where the agent's available tools change at runtime. If you add a new tool, decide whether it should be protectable.
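The filtering rule above can be sketched in a few lines. This is a reconstruction from the described behavior; the constant values match the text, but the function body is illustrative, not the real `ai.py` code:

```python
PROTECTED = {"submit_report"}   # control-flow tools: never filterable
CONFIDENCE_THRESHOLD = 0.5      # _SURVEY_CONFIDENCE_THRESHOLD in the text

def filter_dir_tools(tools, survey):
    """Drop survey-skipped tools, but only when the survey is confident enough."""
    if survey is None or survey.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return list(tools)      # low or no confidence: full toolbox
    skip = set(survey.get("skip_tools", [])) - PROTECTED
    return [t for t in tools if t["name"] not in skip]
```

Note that protected tools survive even when the survey explicitly lists them in `skip_tools`, which is the loop-termination guarantee the prose describes.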
---

## 4. The dir loop in depth

`_run_dir_loop()` is at `ai.py:845`. This is a hand-written agent loop and you should expect to read it several times before it clicks. The shape is:

```
build system prompt (with survey context, child summaries, dir contents)
build initial user message ("investigate this directory now")
reset per-loop token counter

for turn in range(max_turns):        # max_turns = 14
    if budget exceeded: flush partial cache and break
    call API (streaming)
    record token usage
    print text blocks and tool decisions to stderr
    append assistant response to message history
    if no tool calls: nudge agent to call submit_report; continue
    execute each tool call, build tool_result blocks
    append tool_results to message history as user message
    if submit_report was called: break

return summary
```

A few non-obvious mechanics:

### 4.1 The message history grows monotonically

Every turn appends an assistant message (the model's response) and a user message (the tool results). Nothing is ever evicted. This means `input_tokens` on each successive API call grows roughly linearly — the model is re-sent the full conversation every turn. On code targets we see ~1.5–2k tokens added per turn. At `max_turns=14` this stays under the budget; raising the cap would expose this. See **#51**.

### 4.2 Tool dispatch

Tools are not class methods. They're plain functions in `ai.py:486–642`, registered into `_TOOL_DISPATCH` at `ai.py:645`. `_execute_tool()` (`ai.py:659`) is a 16-line function that looks up the handler by name, calls it, logs the turn to `investigation.log`, and returns the result string. **The control-flow tool `submit_report` is NOT in `_TOOL_DISPATCH`**, and the narration tools are dispatched but trivial — the loop body treats them specially:

- `submit_report` is recognized in the tool-use scan at `ai.py:977`, sets `done = True`, and doesn't go through dispatch
- `think`, `checkpoint`, and `flag` *are* in dispatch, but their side effects just print to stderr or append to `flags.jsonl` — the return value is always `"ok"`

When you add a tool: write the function, add it to `_TOOL_DISPATCH`, add its schema to `_DIR_TOOLS`. That's it.
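The dispatch mechanism reduces to a dict from tool name to handler function. A sketch of the shape only, with made-up handlers rather than the real `_TOOL_DISPATCH` contents:

```python
def tool_read_file(args):
    # Real handlers take (args, target, cache); trimmed here for the sketch.
    return f"contents of {args['path']}"

def tool_think(args):
    return "ok"   # narration tools always return "ok"

TOOL_DISPATCH = {
    "read_file": tool_read_file,
    "think": tool_think,
}

def execute_tool(name, args):
    """Look up the handler by name and run it; unknown names fail soft."""
    handler = TOOL_DISPATCH.get(name)
    if handler is None:
        return f"unknown tool: {name}"
    return handler(args)
```

Failing soft on an unknown name matters in an agent loop: the error string goes back to the model as a tool result instead of crashing the turn.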
### 4.3 Pre-loaded context

Before the loop starts, two helpers prepare static context that goes into the system prompt:

- `_build_dir_context()` (`ai.py:736`) — `ls`-style listing of the dir with sizes and MIME types via `python-magic`. The agent sees this *before* it makes any tool calls, so it doesn't waste a turn just listing the directory.
- `_get_child_summaries()` (`ai.py:758`) — looks up each subdirectory in the cache and pulls its `summary` field. This is how leaves-first ordering pays off: by the time the loop runs on `src/`, all of `src/auth/`, `src/db/`, `src/middleware/` already have cached summaries that get injected as `{child_summaries}`.

If `_get_child_summaries()` returns nothing, the prompt says `(none — this is a leaf directory)`.
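The child-summary lookup can be sketched with the cache modeled as a plain dict (the real code goes through `cache.py`, and the function shape here is illustrative):

```python
def get_child_summaries(subdirs, cache):
    """Pull the cached summary for each subdirectory; fall back for leaves."""
    lines = []
    for sub in subdirs:
        entry = cache.get(sub)          # sketch: cache as {path: entry} dict
        if entry:
            lines.append(f"{sub}: {entry['summary']}")
    return "\n".join(lines) or "(none — this is a leaf directory)"
```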
### 4.4 The token tracker and the budget check

`_TokenTracker` (`ai.py:94`) is a tiny accumulator with one important subtlety, captured in **#44**:

> Cumulative input tokens are NOT a meaningful proxy for context size: each turn's `input_tokens` already includes the full message history, so summing across turns double-counts everything. Use `last_input` for budget decisions, totals for billing.

So `budget_exceeded()` (`ai.py:135`) compares `last_input` (the most recent call's `input_tokens`) to `CONTEXT_BUDGET` (`ai.py:40`), which is 70% of 200k. This is checked at the *top* of each loop iteration, before the next API call.

When the budget check trips, the loop:

1. Prints a `Context budget reached` warning to stderr
2. If no `dir` cache entry exists yet, builds a *partial* one from any `file` cache entries the agent already wrote (`ai.py:889–937`), marks it with `partial: True` and `partial_reason`, and writes it
3. Breaks out of the loop

This means a budget breach doesn't lose work — anything the agent already cached survives, and the synthesis pass will see a partial dir summary rather than nothing.
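The #44 rule fits in a few lines. A sketch assuming the 70%-of-200k budget named above; the field names mirror the text, not necessarily the real class:

```python
CONTEXT_BUDGET = int(200_000 * 0.7)   # 70% of the 200k context window

class TokenTracker:
    """Budget on last_input (context size); totals are for billing only."""
    def __init__(self):
        self.last_input = 0
        self.total_input = 0
        self.total_output = 0

    def record(self, input_tokens, output_tokens):
        self.last_input = input_tokens    # each call re-sends the full history
        self.total_input += input_tokens  # cumulative double-counts context
        self.total_output += output_tokens

    def budget_exceeded(self):
        return self.last_input >= CONTEXT_BUDGET
```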
### 4.5 What the loop returns

`_run_dir_loop()` returns the `summary` string from `submit_report` (or the partial summary if the budget tripped). `_run_investigation()` then writes a normal `dir` cache entry from this summary at `ai.py:1363–1375` — *unless* the dir loop already wrote one itself via the partial-flush path, in which case the `cache.has_entry("dir", dir_path)` check skips it.

### 4.6 The streaming API caller

`_call_api_streaming()` (`ai.py:681`) is a thin wrapper around `client.messages.stream()`. It currently doesn't print tokens as they arrive — it iterates the stream, drops everything, then pulls the final message via `stream.get_final_message()`. The streaming API is used for real-time tool decision printing, which today happens only after the full response arrives. There's room here to add live progress printing if you want it.

---

## 5. The cache model

Cache lives at `/tmp/luminos/{investigation_id}/`. Code is `luminos_lib/cache.py` (201 lines).

### 5.1 Investigation IDs

`/tmp/luminos/investigations.json` maps absolute target paths to UUIDs. `_get_investigation_id()` (`cache.py:40`) looks up the target and either returns the existing UUID (resume) or creates a new one (fresh run). `--fresh` forces a new UUID even if one exists.
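The resume logic can be sketched as a JSON registry lookup. Illustrative only; the real implementation lives at `cache.py:40`:

```python
import json, os, uuid

def get_investigation_id(target, registry_path, fresh=False):
    """Return the UUID recorded for this target, or mint (and persist) a new one."""
    target = os.path.abspath(target)
    registry = {}
    if os.path.exists(registry_path):
        with open(registry_path) as f:
            registry = json.load(f)
    if fresh or target not in registry:
        registry[target] = str(uuid.uuid4())   # --fresh always mints a new ID
        with open(registry_path, "w") as f:
            json.dump(registry, f)
    return registry[target]
```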
### 5.2 What's stored

Inside `/tmp/luminos/{uuid}/`:

```
meta.json             investigation metadata (model, start time, dir count)
files/<sha256>.json   one file per cached file entry
dirs/<sha256>.json    one file per cached directory entry
flags.jsonl           JSONL — appended on every flag tool call
investigation.log     JSONL — appended on every tool call
```

**File and dir cache entries are NOT in JSONL** — they are one sha256-keyed JSON file per entry. The sha256 is over the path string (`cache.py:13`). Only `flags.jsonl` and `investigation.log` use JSONL.

Required fields are validated in `write_entry()` (`cache.py:115`):

```python
file: {path, relative_path, size_bytes, category, summary, cached_at}
dir:  {path, relative_path, child_count, dominant_category, summary, cached_at}
```

The validator also rejects entries containing `content`, `contents`, or `raw` fields — the agent is explicitly forbidden from caching raw file contents, summaries only. If you change the schema, update the required set in `write_entry()` and update the test in `tests/test_cache.py`.
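A sketch of the validation described above, using the required-field sets from the text (the real checks live in `write_entry()`; `validate_entry` is an illustrative name):

```python
REQUIRED = {
    "file": {"path", "relative_path", "size_bytes", "category", "summary", "cached_at"},
    "dir": {"path", "relative_path", "child_count", "dominant_category", "summary", "cached_at"},
}
FORBIDDEN = {"content", "contents", "raw"}   # summaries only, never raw file bodies

def validate_entry(kind, entry):
    """Reject entries missing required fields or smuggling raw contents."""
    missing = REQUIRED[kind] - entry.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    banned = FORBIDDEN & entry.keys()
    if banned:
        raise ValueError(f"raw contents not allowed: {sorted(banned)}")
```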
### 5.3 Confidence support already exists

`write_entry()` validates an optional `confidence` field (`cache.py:129–134`) and a `confidence_reason` string. `low_confidence_entries(threshold=0.7)` (`cache.py:191`) returns all entries below a threshold, sorted ascending. The agent doesn't currently *set* these fields in any prompt — that lights up when Phase 1 work actually wires the prompts.

### 5.4 Why one-file-per-entry instead of JSONL

Random access by path. The dir loop calls `cache.has_entry("dir", path)` once per directory during the `_get_child_summaries()` lookup; with sha256-keyed files this is an `os.path.exists()` call. With JSONL it would be a full file scan.
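The O(1) lookup is easy to see in a sketch (path hashing per `cache.py:13`; the function shapes are illustrative):

```python
import hashlib, os

def entry_path(root, kind, path):
    """files/<sha256>.json or dirs/<sha256>.json, keyed on the path string."""
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return os.path.join(root, kind + "s", digest + ".json")

def has_entry(root, kind, path):
    # O(1): one stat call, no scanning of a JSONL file.
    return os.path.exists(entry_path(root, kind, path))
```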
---

## 6. Prompts

All prompt templates live in `luminos_lib/prompts.py`. There are three:

| Constant | Used by | What it carries |
|---|---|---|
| `_SURVEY_SYSTEM_PROMPT` | `_run_survey` | survey_signals, tree_preview, available_tools |
| `_DIR_SYSTEM_PROMPT` | `_run_dir_loop` | dir_path, dir_rel, max_turns, context, child_summaries, survey_context |
| `_SYNTHESIS_SYSTEM_PROMPT` | `_run_synthesis` | target, summaries_text |

Each is a plain template string with `{name}` placeholders. The caller assembles values and passes them to `.format(...)` immediately before the API call. There is no template engine — it's plain string formatting.

When you change a prompt, the only thing you need to keep in sync is the set of placeholders. If you add `{foo}` to the template, the caller must provide `foo=...`. If you remove a placeholder from the template but leave the kwarg in the caller, `.format()` silently ignores it. If you add a placeholder and forget to provide it, `.format()` raises `KeyError` at runtime.
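Both failure modes are easy to demonstrate with plain `str.format` (the template strings here are invented examples, not the real prompts):

```python
TEMPLATE = "Investigate {dir_path} (max {max_turns} turns)."

# Extra kwargs are silently ignored: no error, no trace of "unused".
msg = TEMPLATE.format(dir_path="/t/src", max_turns=14, unused="ignored")

def render_missing():
    # A placeholder with no matching kwarg raises KeyError at call time.
    return "Survey: {survey_context}".format()
```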
`prompts.py` has no logic and no tests — it's listed in [Development Guide](DevelopmentGuide) as exempt from unit testing for that reason.

---

## 7. Synthesis pass

`_run_synthesis()` (`ai.py:1157`) is structurally similar to the dir loop but much simpler:

- Reads all `dir` cache entries via `cache.read_all_entries("dir")`
- Renders them into a `summaries_text` block (one section per dir)
- Stuffs that into `_SYNTHESIS_SYSTEM_PROMPT`
- Loops up to `max_turns=5` waiting for `submit_report` with `brief` and `detailed` fields

Tools available: `read_cache`, `list_cache`, `flag`, `submit_report` (`_SYNTHESIS_TOOLS` at `ai.py:401`). The synthesis agent can pull specific cache entries back if it needs to drill in, but it cannot read files directly — synthesis is meant to operate on summaries, not raw contents.

There's a fallback: if synthesis runs out of turns without calling `submit_report`, `_synthesize_from_cache()` (`ai.py:1262`) builds a mechanical brief+detailed from the cached dir summaries with no AI call. This guarantees you always get *something* in the report.
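The mechanical fallback can be sketched like this. Illustrative only; in particular, leading with the first entry's summary as the brief is an assumption, not necessarily what `_synthesize_from_cache()` does:

```python
def synthesize_from_cache(dir_entries):
    """No-AI fallback: build (brief, detailed) mechanically from dir summaries."""
    if not dir_entries:
        return "No directory summaries were produced.", ""
    detailed = "\n\n".join(
        f"## {e['relative_path'] or '.'}\n{e['summary']}" for e in dir_entries
    )
    brief = dir_entries[0]["summary"]   # assumption: first entry stands in for the root
    return brief, detailed
```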
---

## 8. Flags

The `flag` tool is the agent's pressure valve for "I noticed something that should not be lost in the summary." `_tool_flag()` (`ai.py:629`) prints to stderr *and* appends a JSONL line to `{cache.root}/flags.jsonl`. At the end of `_run_investigation()` (`ai.py:1387–1397`), the orchestrator reads that file back and includes the flags in its return tuple. `format_report()` then renders them in a dedicated section.

Severity is `info | concern | critical`. The agent is told to flag *immediately* on discovery, not save findings for the report — this is in the tool description at `ai.py:312`.
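The append-on-every-call behavior can be sketched directly (stderr printing omitted; this is a pattern sketch, not the real `_tool_flag()`):

```python
import json

SEVERITIES = ("info", "concern", "critical")

def tool_flag(args, flags_path):
    """Append one JSONL line per flag call, so flags survive even a crashed run."""
    severity = args.get("severity", "info")
    if severity not in SEVERITIES:
        return f"invalid severity: {severity}"
    with open(flags_path, "a") as f:
        f.write(json.dumps({"path": args["path"],
                            "finding": args["finding"],
                            "severity": severity}) + "\n")
    return "ok"
```

Append-mode JSONL is the right shape here: each call is independent, and the orchestrator can re-read the whole file at the end of the run.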
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 9. Where to make common changes
|
||||||
|
|
||||||
|
A cookbook for the kinds of changes that come up most often.
|
||||||
|
|
||||||
|
### 9.1 Add a new tool the dir agent can call
|
||||||
|
|
||||||
|
1. Write the implementation: `_tool_<name>(args, target, cache)` somewhere
|
||||||
|
in the tool implementations section of `ai.py` (~lines 486–642).
|
||||||
|
Return a string.
|
||||||
|
2. Add it to `_TOOL_DISPATCH` at `ai.py:645`.
|
||||||
|
3. Add its schema to `_DIR_TOOLS` at `ai.py:151`. The schema must follow
|
||||||
|
Anthropic tool-use shape: `name`, `description`, `input_schema`.
|
||||||
|
4. Decide whether the survey should be able to filter it out (default:
|
||||||
|
yes — leave it out of `_PROTECTED_DIR_TOOLS`) or whether it's
|
||||||
|
control-flow critical (add to `_PROTECTED_DIR_TOOLS`).
|
||||||
|
5. Update `_DIR_SYSTEM_PROMPT` in `prompts.py` if the agent needs
|
||||||
|
instructions on when to use the new tool.
|
||||||
|
6. There is no unit test for tool registration today (`ai.py` is exempt).
|
||||||
|
If you want coverage, the test would mock `client.messages.stream` and
|
||||||
|
assert that the dispatch table contains your tool.
|
||||||
|
|
||||||
|
### 9.2 Add a whole new pass
|
||||||
|
|
||||||
|
(Phase 3's planning pass is the immediate example.) The pattern:
|
||||||
|
|
||||||
|
1. Define a new system prompt constant in `prompts.py`
|
||||||
|
2. Define a new tool list in `ai.py` for the pass-specific submit tool
|
||||||
|
3. Write `_run_<pass>()` in `ai.py`, modeled on `_run_survey()` — single
|
||||||
|
submit tool, low max_turns, returns a dict or `None` on failure
|
||||||
|
4. Wire it into `_run_investigation()` between existing passes
|
||||||
|
5. Pass its output downstream by adding a kwarg to `_run_dir_loop()` (or
|
||||||
|
wherever it's needed) and threading it through
|
||||||
|
|
||||||
|
The survey pass is the cleanest reference implementation because it's
|
||||||
|
short and self-contained.
|
||||||
|
|
||||||
|
### 9.3 Change a prompt
|
||||||
|
|
||||||
|
Edit the constant in `prompts.py`. If you add a `{placeholder}`, also
|
||||||
|
update the corresponding `.format(...)` call in `ai.py`. Search the
|
||||||
|
codebase for the constant name to find the call site:
|
||||||
|
|
||||||
|
```
|
||||||
|
grep -n SURVEY_SYSTEM_PROMPT luminos_lib/ai.py
|
||||||
|
```
|
||||||
|
|
||||||
|
There is no prompt versioning today. Investigation cache entries don't
|
||||||
|
record which prompt version produced them, so re-running with a new
|
||||||
|
prompt against an existing investigation will mix old and new outputs
|
||||||
|
unless you `--fresh`.
|
||||||
|
|
||||||
|
### 9.4 Change cache schema

1. Update the required-fields set in `cache.py:write_entry()`
   (`cache.py:119–123`)
2. Update `_DIR_TOOLS`'s `write_cache` description in `ai.py:228` so the
   agent knows what to write
3. Update `_DIR_SYSTEM_PROMPT` in `prompts.py` if the agent needs to know
   *how* to populate the new field
4. Update `tests/test_cache.py` — schema validation is the part of the
   cache that *is* covered
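
A sketch of the required-fields check that `write_entry()` performs; the exact field set here is illustrative, not the real schema:

```python
# Step 1 of the list above edits a set like this one.
REQUIRED_DIR_FIELDS = {"path", "summary", "confidence"}

def write_entry(entry: dict) -> None:
    # Reject entries missing any required field before writing JSON.
    missing = REQUIRED_DIR_FIELDS - entry.keys()
    if missing:
        raise ValueError(f"cache entry missing fields: {sorted(missing)}")

write_entry({"path": "/srv/app", "summary": "web app", "confidence": 0.8})
```
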

### 9.5 Add a CLI flag

Edit `luminos.py:88` (`main()`'s argparse setup) to define the flag, then
plumb it through whatever functions need it. New AI-related flags
typically need to be added to `analyze_directory()`'s signature
(`ai.py:1408`) and then forwarded to `_run_investigation()`.
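
The first step looks roughly like this; `--max-depth` is an invented example flag, not one Luminos has:

```python
import argparse

# Mirrors the shape of main()'s argparse setup in luminos.py.
parser = argparse.ArgumentParser(prog="luminos")
parser.add_argument("target")
parser.add_argument("--ai", action="store_true")
parser.add_argument("--max-depth", type=int, default=None,
                    help="example flag: limit directory recursion")

args = parser.parse_args(["/srv/app", "--ai", "--max-depth", "3"])
# args.max_depth would then be plumbed through:
#   analyze_directory(report, target, max_depth=args.max_depth)
```
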

---

## 10. Token budget and cost

Budget logic is in `_TokenTracker.budget_exceeded()` and is checked at the
top of every dir loop iteration (`ai.py:882`). The budget is **per call**,
not cumulative — see §4.4. The breach handler flushes a partial dir cache
entry so work isn't lost.

Cost reporting happens once at the end of `_run_investigation()`
(`ai.py:1399`), using the cumulative `total_input` and `total_output`
counters multiplied by the constants at `ai.py:43–44`. There is no
running cost display during the investigation today. If you want one,
`_TokenTracker.summary()` already returns the formatted string — just
call it after each dir loop.
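
The per-call semantics can be sketched. The class shape mirrors the description above; the per-megatoken prices passed to `cost()` are placeholders, not the constants at `ai.py:43–44`.

```python
CONTEXT_BUDGET = int(200_000 * 0.70)  # 140_000, see the glossary

class TokenTracker:
    def __init__(self):
        self.last_input = 0     # most recent call only
        self.total_input = 0    # cumulative, for cost reporting
        self.total_output = 0

    def record(self, input_tokens, output_tokens):
        self.last_input = input_tokens  # NOT cumulative
        self.total_input += input_tokens
        self.total_output += output_tokens

    def budget_exceeded(self):
        # Per-call check: one oversized request trips it immediately.
        return self.last_input > CONTEXT_BUDGET

    def cost(self, in_price_per_mtok, out_price_per_mtok):
        # Cumulative counters times placeholder per-megatoken prices.
        return (self.total_input * in_price_per_mtok
                + self.total_output * out_price_per_mtok) / 1_000_000

t = TokenTracker()
t.record(150_000, 2_000)
assert t.budget_exceeded()
```
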

---

## 11. Glossary

| Term | Meaning |
|---|---|
| **base scan** | The non-AI phase: tree, classification, languages, recency, disk usage. Stdlib + coreutils only. |
| **dir loop** | Per-directory agent loop in `_run_dir_loop`. Up to 14 turns. Produces a `dir` cache entry. |
| **survey pass** | Single short loop before any dir loops, producing a shared description and tool guidance. |
| **synthesis pass** | Final loop that reads `dir` cache entries and produces `(brief, detailed)`. |
| **leaves-first** | Discovery order in `_discover_directories`: deepest paths first, so child summaries exist when parents are investigated. |
| **investigation** | One end-to-end run, identified by a UUID, persisted under `/tmp/luminos/{uuid}/`. |
| **investigation_id** | The UUID. Stored in `/tmp/luminos/investigations.json` keyed by absolute target path. |
| **cache entry** | A JSON file under `files/` or `dirs/` named by sha256(path). |
| **flag** | An agent finding written to `flags.jsonl` and reported separately. info / concern / critical. |
| **partial entry** | A `dir` cache entry written when the budget tripped before `submit_report`. Marked with `partial: True`. |
| **survey signals** | The histogram + samples computed by `filetypes.survey_signals()` during the base scan, fed to the survey prompt. |
| **last_input** | The `input_tokens` count from the most recent API call. The basis for budget checks. NOT the cumulative sum. |
| **CONTEXT_BUDGET** | 70% of 200k = 140k. Trigger threshold for early exit. |
| **`_PROTECTED_DIR_TOOLS`** | Tools the survey is forbidden from filtering out of the dir loop's toolbox. Currently `{submit_report}`. |
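
The cache-entry naming scheme in the glossary can be shown concretely, assuming only what the table states (sha256 of the path, a `.json` file under `files/` or `dirs/`):

```python
import hashlib

def entry_filename(path: str) -> str:
    # sha256(path) hexdigest as the cache filename.
    return hashlib.sha256(path.encode()).hexdigest() + ".json"

name = entry_filename("/srv/app/src")
```
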
144 Roadmap.md

```diff
@@ -1,125 +1,37 @@
 # Roadmap
 
-Full design notes and open questions live in `PLAN.md` in the repo root.
-This page tracks phase status.
-
----
-
-## Core Philosophy
-
-Move from a **pipeline with AI steps** to **investigation driven by curiosity**.
-The agent should decide what it needs to know and how to find it out — not
-execute a predetermined checklist.
-
----
-
-## Phases
-
-### Phase 1 — Confidence Tracking
-Add `confidence` + `confidence_reason` to file and dir cache entries.
-Agent sets this when writing cache. Enables later phases to prioritize
-re-investigation of uncertain entries.
-
-**Status:** Not started
-
----
-
-### Phase 2 — Survey Pass
-Lightweight pre-investigation pass. Agent looks at file type distribution
-and tree structure, then answers: what is this, how should I investigate it,
-which tools are relevant?
-
-Replaces hardcoded domain detection with AI-driven characterization.
-Survey output injected into dir loop system prompts as context.
-
-**Status:** Not started
-
----
-
-### Phase 3 — Investigation Planning
-After survey, a planning pass allocates investigation depth per directory.
-Replaces fixed max_turns-per-dir with a global turn budget the agent manages.
-Priority dirs get more turns; trivial dirs get fewer; generated/vendored dirs
-get skipped.
-
-**Status:** Not started
-
----
-
-### Phase 4 — External Knowledge Tools
-Resolution strategies for uncertainty beyond local files:
-- `web_search` — unfamiliar library, format, API
-- `package_lookup` — PyPI / npm / crates.io metadata
-- `fetch_url` — follow URLs referenced in local files
-- `ask_user` — interactive mode, last resort
-
-All gated behind `--no-external` flag. Budget-limited per session.
-
-**Status:** Not started
-
----
-
-### Phase 5 — Scale-Tiered Synthesis
-Calibrate synthesis input and depth to target size:
-
-| Tier | Size | Approach |
-|---|---|---|
-| small | <5 dirs / <30 files | Per-file cache entries as synthesis input |
-| medium | 5–30 dirs | Dir summaries (current) |
-| large | 31–150 dirs | Multi-level synthesis |
-| xlarge | >150 dirs | Multi-level + subsystem grouping |
-
-**Status:** Not started
-
----
-
-### Phase 6 — Multi-Level Synthesis
-For large/xlarge: grouping pass identifies logical subsystems from dir
-summaries (not directory structure). Final synthesis receives 3–10 subsystem
-summaries rather than hundreds of dir summaries.
-
-**Status:** Not started
-
----
-
-### Phase 7 — Hypothesis-Driven Synthesis
-Synthesis reframed from aggregation to conclusion-with-evidence. Agent
-forms a hypothesis, looks for confirming/refuting evidence, considers
-alternatives, then submits.
-
-Produces analytical output rather than descriptive output.
-
-**Status:** Not started
-
----
-
-### Phase 8 — Refinement Pass
-Post-synthesis targeted re-investigation. Agent receives current synthesis,
-identifies gaps and contradictions, goes back to actual files (or external
-sources), submits improved report.
-
-Triggered by `--refine` flag. `--refine-depth N` for multiple passes.
-
-**Status:** Not started
-
----
-
-### Phase 9 — Dynamic Report Structure
-Synthesis produces a superset of possible output fields; report formatter
-renders only populated ones. Output naturally scales from minimal (small
-simple targets) to comprehensive (large complex targets).
-
-**Status:** Not started
-
----
-
-## Open Design Questions
-
-See `PLAN.md` — Known Unknowns and Concerns sections.
-
-Key unresolved items:
-- Which search API to use for web_search
-- Whether external tools should be opt-in or opt-out by default
-- How to handle confidence calibration (numeric vs categorical)
-- Config file format and location for tunable thresholds
-- Progressive output / interactive mode UX design
+The roadmap used to live here as a static phase list. It drifted out of
+sync with reality (Phase 2 was marked "Not started" months after it
+shipped) so it has been replaced with pointers to the two sources that
+actually stay current.
+
+## Where the roadmap lives now
+
+**Design and rationale** → [`PLAN.md`](https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/src/branch/main/PLAN.md)
+in the repo root. Phase descriptions, philosophy, file map, known
+unknowns, concerns. This is the long-form *why*.
+
+**Current status and active work** → [Open issues](https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/issues)
+on Forgejo. Issues track what is in flight, what is done (closed), and
+what is queued. Each phase corresponds to a set of issues; closed issues
+are the ground truth for "is this shipped."
+
+## Phase status at a glance
+
+| Phase | Topic | Status |
+|---|---|---|
+| 1 | Confidence tracking | ✅ shipped |
+| 2 | Survey pass | ✅ shipped |
+| 2.5 | Context budget reliability (#44) | ✅ shipped |
+| 3 | Investigation planning | ⏳ next |
+| 3.5 | MCP backend abstraction (#39) | planned |
+| 4 | External knowledge tools | planned |
+| 4.5 | Unit of analysis (#48) | planned |
+| 5 | Scale-tiered synthesis | planned |
+| 6 | Multi-level synthesis | planned |
+| 7 | Hypothesis-driven synthesis | planned |
+| 8 | Refinement pass | planned |
+| 9 | Dynamic report structure | planned |
+
+For details on any phase, read the matching section of `PLAN.md` and
+search open issues for the phase number or feature name.
```