diff --git a/Architecture.md b/Architecture.md index 8e538a3..c878aa1 100644 --- a/Architecture.md +++ b/Architecture.md @@ -1,5 +1,9 @@ # Architecture +> This page is the high-level map. For a code-level walkthrough with +> file:line references — how the dir loop actually works, where to add a +> tool, what the cache really stores — read [Internals](Internals). + ## Overview Luminos is a zero-dependency Python CLI at its base. The `--ai` flag layers an @@ -50,53 +54,93 @@ scan(target) ``` analyze_directory(report, target) │ - ├── _discover_directories() find all dirs, sort leaves-first - │ - ├── per-directory loop (each dir, up to max_turns=14) - │ _build_dir_context() list files + sizes - │ _get_child_summaries() read cached child summaries - │ _run_dir_loop() agent loop: read files, parse structure, - │ write cache entries, submit_report - │ Tools: read_file, list_directory, - │ run_command, parse_structure, - │ write_cache, think, checkpoint, - │ flag, submit_report - │ - ├── _run_synthesis() one-shot aggregation of dir summaries - │ reads all "dir" cache entries - │ produces brief (2-4 sentences) + detailed (free-form) - │ Tools: read_cache, list_cache, flag, submit_report - │ - └── returns (brief, detailed, flags) + └── _run_investigation() + │ + ├── _get_investigation_id() new UUID, or resume an existing one + │ + ├── _discover_directories() find all dirs, sort leaves-first + │ + ├── _run_survey() single short loop, max 3 turns + │ inputs: survey_signals + 2-level tree preview + │ Tools: submit_survey + │ output: shared description, approach, relevant_tools, + │ skip_tools, domain_notes, confidence + │ (skipped via _default_survey() on tiny targets) + │ + ├── _filter_dir_tools(survey) remove skip_tools (if confidence ≥ 0.5) + │ + ├── per-directory loop (each uncached dir, up to max_turns=14) + │ _build_dir_context() list files + sizes + MIME + │ _get_child_summaries() read cached child summaries + │ _format_survey_block() inject survey context into prompt + 
│ _run_dir_loop() agent loop with budget check on + │ every iteration; flushes a partial + │ cache entry on budget breach + │ Tools: read_file, list_directory, run_command, + │ parse_structure, write_cache, think, checkpoint, + │ flag, submit_report + │ + ├── _run_synthesis() single loop, max 5 turns + │ reads all "dir" cache entries + │ produces brief (2-4 sentences) + detailed (free-form) + │ Tools: read_cache, list_cache, flag, submit_report + │ fallback: _synthesize_from_cache() if out of turns + │ + └── returns (brief, detailed, flags) ``` +Token usage and the context budget are tracked by `_TokenTracker` in +`ai.py`. The budget check uses the *most recent* call's `input_tokens`, +not the cumulative sum across turns — see #44 and the +[Internals](Internals) page §4.4 for why. + --- ## Cache Location: `/tmp/luminos/{investigation_id}/` -Two entry types, both stored as JSONL: +Layout: -**File entries** (`files.jsonl`): ``` -{path, relative_path, size_bytes, category, summary, notable, - notable_reason, cached_at} +meta.json investigation metadata +files/{sha256}.json one JSON file per cached file entry +dirs/{sha256}.json one JSON file per cached directory entry +flags.jsonl JSONL — appended on every flag tool call +investigation.log JSONL — appended on every tool call ``` -**Dir entries** (`dirs.jsonl`): +File and dir entries are stored as one sha256-keyed JSON file per entry +(not as JSONL) so that `has_entry(path)` is an O(1) `os.path.exists()` +check rather than a file scan. Only `flags.jsonl` and `investigation.log` +are JSONL. 
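+The one-file-per-entry scheme can be sketched in a few lines. The helper
+names and cache root below are hypothetical — the real helpers live in
+`cache.py` and may differ in signature:
+
+```python
+import hashlib
+import os
+
+CACHE_ROOT = "/tmp/luminos/00000000-example"  # hypothetical investigation id
+
+def entry_path(kind, path):
+    # kind is "files" or "dirs"; the key is sha256 over the path string
+    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
+    return os.path.join(CACHE_ROOT, kind, digest + ".json")
+
+def has_entry(kind, path):
+    # O(1): one stat call, no scanning of a JSONL file
+    return os.path.exists(entry_path(kind, path))
+```
+
+The same path always hashes to the same filename, which is what makes the
+existence check cheap.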
+ +**File entries** (`files/{sha256}.json`): ``` -{path, relative_path, child_count, summary, dominant_category, - notable_files, cached_at} +{path, relative_path, size_bytes, category, summary, cached_at, + [confidence], [confidence_reason], [notable], [notable_reason]} ``` +**Dir entries** (`dirs/{sha256}.json`): +``` +{path, relative_path, child_count, dominant_category, summary, cached_at, + [confidence], [confidence_reason], [notable_files], + [partial], [partial_reason]} +``` + +`partial: true` marks a dir entry written by the budget-breach early-exit +path — the agent didn't reach `submit_report` and the summary was +synthesized from already-cached file entries. + **Flags** (`flags.jsonl`): ``` {path, finding, severity} severity: info | concern | critical ``` -Cache is reused across runs for the same target. `--fresh` ignores it. -`--clear-cache` deletes it. +Investigation IDs are persisted in `/tmp/luminos/investigations.json` +keyed by absolute target path. Cache is reused across runs for the same +target. `--fresh` mints a new investigation ID. `--clear-cache` deletes +the entire cache root. --- diff --git a/DevelopmentGuide.md b/DevelopmentGuide.md index a82332f..857d21c 100644 --- a/DevelopmentGuide.md +++ b/DevelopmentGuide.md @@ -1,5 +1,10 @@ # Development Guide +> This page covers **how to set up, run, and test** Luminos. For a +> code-level walkthrough of how the AI pipeline actually works — the dir +> loop, the cache, the survey pass, where to add a tool — read +> [Internals](Internals). + ## Running Luminos ```bash diff --git a/Home.md b/Home.md index e52835f..99ef55e 100644 --- a/Home.md +++ b/Home.md @@ -21,8 +21,9 @@ of what the directory contains and why. 
| Page | Contents | |---|---| | [Architecture](Architecture) | Module breakdown, data flow, AI pipeline | -| [Development Guide](DevelopmentGuide) | Git workflow, naming conventions, commands | -| [Roadmap](Roadmap) | Planned phases and open design questions | +| [Internals](Internals) | Code-level tour: dir loop, cache, prompts, where to make changes | +| [Development Guide](DevelopmentGuide) | Setup, git workflow, testing, commands | +| [Roadmap](Roadmap) | Phase status — pointer to PLAN.md and open issues | | [Session Retrospectives](SessionRetrospectives) | Full session history | --- diff --git a/Internals.md b/Internals.md new file mode 100644 index 0000000..f16a7ba --- /dev/null +++ b/Internals.md @@ -0,0 +1,514 @@ +# Internals + +A code tour of how Luminos actually works. Read this after +[Development Guide](DevelopmentGuide) and [Architecture](Architecture). The +goal is that a developer who knows basic Python but has never built an +agent loop can finish this page and start making non-trivial changes. + +All file:line references are accurate as of the date this page was last +edited — verify with `git log` or by opening the file before relying on a +specific line number. + +--- + +## 1. The two layers + +Luminos has a hard internal split: + +| Layer | What it does | Imports | +|---|---|---| +| **Base scan** | Walks the directory, classifies files, counts lines, ranks recency, measures disk usage, prints a report. | stdlib only + GNU coreutils via subprocess. **No pip packages.** | +| **AI pipeline** (`--ai`) | Runs a multi-pass agent investigation via the Claude API on top of the base scan output. | `anthropic`, `tree-sitter`, `python-magic` — all imported lazily. | + +The split is enforced by lazy imports. `luminos.py:156` is the only place +that imports from `luminos_lib.ai`, and it sits inside `if args.ai:`. You +can grep the codebase to verify: nothing in the base scan modules imports +anything from `ai.py`, `ast_parser.py`, or `prompts.py`. 
This means +`python3 luminos.py /target` works on a stock Python 3 install with no +packages installed at all. + +When you change a base-scan module, the question to ask is: *does this +introduce a top-level import of anything outside stdlib?* If yes, you've +broken the constraint and the change must be rewritten. + +--- + +## 2. Base scan walkthrough + +Entry: `luminos.py:main()` parses args, then calls `scan(target, ...)` at +`luminos.py:45`. `scan()` is a flat sequence — it builds a `report` dict +by calling helpers from `luminos_lib/`, one per concern, in order: + +``` +scan(target) + build_tree() → report["tree"], report["tree_rendered"] + classify_files() → report["classified_files"] + summarize_categories() → report["file_categories"] + survey_signals() → report["survey_signals"] ← input to AI survey + detect_languages() → report["languages"], report["lines_of_code"] + find_large_files() → report["large_files"] + find_recent_files() → report["recent_files"] + get_disk_usage() → report["disk_usage"] + top_directories() → report["top_directories"] + return report +``` + +Each helper is independent. You could delete `find_recent_files()` and the +report would just be missing that field. The flow is procedural, not +event-driven, and there is no shared state object — everything passes +through the local `report` dict. + +The progress lines you see on stderr (`[scan] Counting lines... foo.py`) +come from `_progress()` in `luminos.py:23`, which returns an `on_file` +callback that the helpers call as they work. If you add a new helper that +walks files, plumb a progress callback through the same way for +consistency. + +After `scan()` returns, `main()` either runs the AI pipeline or jumps +straight to `format_report()` (`luminos_lib/report.py`) for terminal +output, or `json.dumps()` for JSON. The AI pipeline always runs *after* +the base scan because it needs `report["survey_signals"]` and +`report["file_categories"]` as inputs. + +--- + +## 3. 
AI pipeline walkthrough + +The AI pipeline is what makes Luminos interesting and is also where +almost all the complexity lives. Everything below happens inside +`luminos_lib/ai.py` (1438 lines as of writing), called from +`luminos.py:157` via `analyze_directory()`. + +### 3.1 The orchestrator + +`analyze_directory()` (`ai.py:1408`) is a thin wrapper that checks +dependencies, gets the API key, builds the Anthropic client, and calls +`_run_investigation()`. If anything fails it prints a warning and returns +empty strings — the rest of luminos keeps working. + +`_run_investigation()` (`ai.py:1286`) is the real entry point. Read this +function first if you want to understand the pipeline shape. It does six +things, in order: + +1. **Get/create an investigation ID and cache** (`ai.py:1289–1294`). + Investigation IDs let you resume a previous run; see §5 below. +2. **Discover all directories** under the target via + `_discover_directories()` (`ai.py:715`). Returns them sorted + *leaves-first* — the deepest paths come first. This matters because + each dir loop reads its child directories' summaries from cache, so + children must be investigated before parents. +3. **Run the survey pass** (`ai.py:1300–1334`) unless the target is below + the size thresholds at `ai.py:780–781`, in which case + `_default_survey()` returns a synthetic skip. +4. **Filter out cached directories** (`ai.py:1336–1349`). If you're + resuming an investigation, dirs that already have a `dir` cache entry + are skipped — only new ones get a fresh dir loop. +5. **Run a dir loop per remaining directory** (`ai.py:1351–1375`). This + is the heart of the system — see §4. +6. **Run the synthesis pass** (`ai.py:1382`) reading only `dir` cache + entries to produce `(brief, detailed)`. + +It also reads `flags.jsonl` from disk at the end (`ai.py:1387–1397`) and +returns `(brief, detailed, flags)` to `analyze_directory()`. 
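+The leaves-first ordering in step 2 is simple to sketch: sort by path
+depth, deepest first, so every child directory is visited before its
+parent. This is a sketch of the idea, not the literal body of
+`_discover_directories()`:
+
+```python
+import os
+
+def discover_directories(target):
+    # Collect every directory under target, then order them deepest-first
+    # ("leaves-first") so child summaries are cached before the parent runs.
+    dirs = [dirpath for dirpath, _, _ in os.walk(target)]
+    return sorted(dirs, key=lambda d: d.count(os.sep), reverse=True)
+```
+
+Siblings at equal depth can run in any order — only the child-before-parent
+invariant matters.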
+ +### 3.2 The survey pass + +`_run_survey()` (`ai.py:1051`) is a short, single-purpose loop. It exists +to give the dir loops some shared context about what they're looking at +*as a whole* before any of them start. + +Inputs go into the system prompt (`_SURVEY_SYSTEM_PROMPT` in +`prompts.py`): +- `survey_signals` — extension histogram, `file --brief` outputs, filename + samples (built by `filetypes.survey_signals()` during the base scan) +- A 2-level tree preview from `build_tree(target, max_depth=2)` +- The list of tools the dir loop will have available + +The survey is allowed only `submit_survey` as a tool (`_SURVEY_TOOLS` at +`ai.py:356`). It runs at most 3 turns. The agent must call `submit_survey` +exactly once with six fields: + +```python +{ + "description": "plain language — what is this target", + "approach": "how the dir loops should investigate it", + "relevant_tools": ["read_file", "parse_structure", ...], + "skip_tools": ["parse_structure", ...], # for non-code targets + "domain_notes": "anything unusual the dir loops should know", + "confidence": 0.0–1.0, +} +``` + +The result is a Python dict that gets passed into every dir loop as +`survey=...`. If the survey fails (API error, ran out of turns), the dir +loops still run but with `survey=None` — the system degrades gracefully. + +### 3.3 How the survey shapes dir loops + +Two things happen with the survey output before each dir loop runs: + +**Survey block injection.** `_format_survey_block()` (`ai.py:803`) renders +the survey dict as a labeled text block, which gets `.format()`-injected +into the dir loop system prompt as `{survey_context}`. The dir agent sees +the description, approach, domain notes, and which tools it should lean on +or skip. + +**Tool filtering.** `_filter_dir_tools()` (`ai.py:824`) returns a copy of +`_DIR_TOOLS` with anything in `skip_tools` removed — but only if the +survey's confidence is at or above `_SURVEY_CONFIDENCE_THRESHOLD = 0.5` +(`ai.py:775`). 
Below that threshold the agent gets the full toolbox. The +control-flow tool `submit_report` is in `_PROTECTED_DIR_TOOLS` and can +never be filtered out — removing it would break loop termination. + +This is the only place in the codebase where the agent's available tools +change at runtime. If you add a new tool, decide whether it should be +protectable. + +--- + +## 4. The dir loop in depth + +`_run_dir_loop()` is at `ai.py:845`. This is a hand-written agent loop and +you should expect to read it several times before it clicks. The shape is: + +``` +build system prompt (with survey context, child summaries, dir contents) +build initial user message ("investigate this directory now") +reset per-loop token counter +for turn in range(max_turns): # max_turns = 14 + if budget exceeded: flush partial cache and break + call API (streaming) + record token usage + print text blocks and tool decisions to stderr + append assistant response to message history + if no tool calls: nudge agent to call submit_report; continue + execute each tool call, build tool_result blocks + append tool_results to message history as user message + if submit_report was called: break +return summary +``` + +A few non-obvious mechanics: + +### 4.1 The message history grows monotonically + +Every turn appends an assistant message (the model's response) and a user +message (the tool results). Nothing is ever evicted. This means +`input_tokens` on each successive API call grows roughly linearly — the +model is re-sent the full conversation every turn. On code targets we see +~1.5–2k tokens added per turn. At `max_turns=14` this stays under the +budget; raising the cap would expose this. See **#51**. + +### 4.2 Tool dispatch + +Tools are not class methods. They're plain functions in `ai.py:486–642`, +registered into `_TOOL_DISPATCH` at `ai.py:645`. 
`_execute_tool()` +(`ai.py:659`) is a 16-line function that looks up the handler by name, +calls it, logs the turn to `investigation.log`, and returns the result +string. Two cases sit outside the plain dispatch pattern: +**`submit_report` is NOT in `_TOOL_DISPATCH`** — the loop body handles it +specially — and the narration tools are dispatched but trivial: +- `submit_report` is recognized in the tool-use scan at `ai.py:977`, sets + `done = True`, and doesn't go through dispatch +- `think`, `checkpoint`, and `flag` *are* in dispatch, but their side + effects just print to stderr or append to `flags.jsonl` — the + return value is always `"ok"` + +When you add a tool: write the function, add it to `_TOOL_DISPATCH`, add +its schema to `_DIR_TOOLS`. That's it. + +### 4.3 Pre-loaded context + +Before the loop starts, two helpers prepare static context that goes into +the system prompt: + +- `_build_dir_context()` (`ai.py:736`) — `ls`-style listing of the dir + with sizes and MIME types via `python-magic`. The agent sees this + *before* it makes any tool calls, so it doesn't waste a turn just + listing the directory. +- `_get_child_summaries()` (`ai.py:758`) — looks up each subdirectory in + the cache and pulls its `summary` field. This is how leaves-first + ordering pays off: by the time the loop runs on `src/`, all of + `src/auth/`, `src/db/`, `src/middleware/` already have cached summaries + that get injected as `{child_summaries}`. + +If `_get_child_summaries()` returns nothing, the prompt says +`(none — this is a leaf directory)`. + +### 4.4 The token tracker and the budget check + +`_TokenTracker` (`ai.py:94`) is a tiny accumulator with one important +subtlety, captured in **#44**: + +> Cumulative input tokens are NOT a meaningful proxy for context size: +> each turn's `input_tokens` already includes the full message history, +> so summing across turns double-counts everything. Use `last_input` for +> budget decisions, totals for billing. 
So `budget_exceeded()` (`ai.py:135`) compares `last_input` (the most +recent call's input_tokens) to `CONTEXT_BUDGET` (`ai.py:40`), which is +70% of 200k. This is checked at the *top* of each loop iteration, before +the next API call. + +When the budget check trips, the loop: +1. Prints a `Context budget reached` warning to stderr +2. If no `dir` cache entry exists yet, builds a *partial* one from any + `file` cache entries the agent already wrote (`ai.py:889–937`), marks + it with `partial: True` and `partial_reason`, and writes it +3. Breaks out of the loop + +This means a budget breach doesn't lose work — anything the agent already +cached survives, and the synthesis pass will see a partial dir summary +rather than nothing. + +### 4.5 What the loop returns + +`_run_dir_loop()` returns the `summary` string from `submit_report` (or +the partial summary if the budget tripped). `_run_investigation()` then +writes a normal `dir` cache entry from this summary at `ai.py:1363–1375` +— *unless* the dir loop already wrote one itself via the partial-flush +path, in which case the `cache.has_entry("dir", dir_path)` check skips it. + +### 4.6 The streaming API caller + +`_call_api_streaming()` (`ai.py:681`) is a thin wrapper around +`client.messages.stream()`. It currently doesn't print tokens as they +arrive — it iterates the stream, drops everything, then pulls the final +message via `stream.get_final_message()`. Streaming was adopted to enable +real-time tool-decision printing, but today that printing happens only +after the full response arrives. There's room here to add live progress +printing if you want it. + +--- + +## 5. The cache model + +Cache lives at `/tmp/luminos/{investigation_id}/`. Code is +`luminos_lib/cache.py` (201 lines). + +### 5.1 Investigation IDs + +`/tmp/luminos/investigations.json` maps absolute target paths to UUIDs. +`_get_investigation_id()` (`cache.py:40`) looks up the target and either +returns the existing UUID (resume) or creates a new one (fresh run). 
`--fresh` forces a new UUID even if one exists. + +### 5.2 What's stored + +Inside `/tmp/luminos/{uuid}/`: + +``` +meta.json investigation metadata (model, start time, dir count) +files/{sha256}.json one file per cached file entry +dirs/{sha256}.json one file per cached directory entry +flags.jsonl JSONL — appended on every flag tool call +investigation.log JSONL — appended on every tool call +``` + +**File and dir cache entries are NOT in JSONL** — they are one +sha256-keyed JSON file per entry. The sha256 is over the path string +(`cache.py:13`). Only `flags.jsonl` and `investigation.log` use JSONL. + +Required fields are validated in `write_entry()` (`cache.py:115`): + +```python +file: {path, relative_path, size_bytes, category, summary, cached_at} +dir: {path, relative_path, child_count, dominant_category, summary, cached_at} +``` + +The validator also rejects entries containing `content`, `contents`, or +`raw` fields — the agent is explicitly forbidden from caching raw file +contents; summaries only. If you change the schema, update the required +set in `write_entry()` and update the test in `tests/test_cache.py`. + +### 5.3 Confidence support already exists + +`write_entry()` validates an optional `confidence` field +(`cache.py:129–134`) and a `confidence_reason` string. +`low_confidence_entries(threshold=0.7)` (`cache.py:191`) returns all +entries below a threshold, sorted ascending. The agent doesn't currently +*set* these fields in any prompt — that lights up when Phase 1 work +actually wires the prompts. + +### 5.4 Why one-file-per-entry instead of JSONL + +Random access by path. The dir loop calls `cache.has_entry("dir", path)` +once per directory during the `_get_child_summaries()` lookup; with +sha256-keyed files this is an `os.path.exists()` call. With JSONL it +would be a full file scan. + +--- + +## 6. Prompts + +All prompt templates live in `luminos_lib/prompts.py`. 
There are three: + +| Constant | Used by | What it carries | +|---|---|---| +| `_SURVEY_SYSTEM_PROMPT` | `_run_survey` | survey_signals, tree_preview, available_tools | +| `_DIR_SYSTEM_PROMPT` | `_run_dir_loop` | dir_path, dir_rel, max_turns, context, child_summaries, survey_context | +| `_SYNTHESIS_SYSTEM_PROMPT` | `_run_synthesis` | target, summaries_text | + +Each is a plain string template with `{name}` placeholders, filled via +`str.format` (not an f-string). The caller assembles values and passes +them to `.format(...)` immediately before the API call. There is no +template engine — it's plain string formatting. + +When you change a prompt, the only thing you need to keep in sync is the +set of placeholders. If you add `{foo}` to the template, the caller must +provide `foo=...`. If you remove a placeholder from the template but +leave the kwarg in the caller, `.format()` silently ignores it. If you +add a placeholder and forget to provide it, `.format()` raises `KeyError` +at runtime. + +`prompts.py` has no logic and no tests — it's listed in +[Development Guide](DevelopmentGuide) as exempt from unit testing for +that reason. + +--- + +## 7. Synthesis pass + +`_run_synthesis()` (`ai.py:1157`) is structurally similar to the dir loop +but much simpler: + +- Reads all `dir` cache entries via `cache.read_all_entries("dir")` +- Renders them into a `summaries_text` block (one section per dir) +- Stuffs that into `_SYNTHESIS_SYSTEM_PROMPT` +- Loops up to `max_turns=5` waiting for `submit_report` with `brief` and + `detailed` fields + +Tools available: `read_cache`, `list_cache`, `flag`, `submit_report` +(`_SYNTHESIS_TOOLS` at `ai.py:401`). The synthesis agent can pull +specific cache entries back if it needs to drill in, but it cannot read +files directly — synthesis is meant to operate on summaries, not raw +contents. 
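+The `summaries_text` block is mechanical to assemble — roughly one
+labeled section per `dir` cache entry. A sketch (the helper name
+`render_summaries_text` is hypothetical; the real rendering inside
+`_run_synthesis()` may differ):
+
+```python
+def render_summaries_text(dir_entries):
+    # One section per "dir" cache entry, concatenated into the block that
+    # fills the {summaries_text} placeholder in the synthesis prompt.
+    sections = []
+    for entry in dir_entries:
+        header = f"### {entry['relative_path']} ({entry['child_count']} children)"
+        if entry.get("partial"):
+            header += " [partial]"  # budget-breach entries are marked
+        sections.append(header + "\n" + entry["summary"])
+    return "\n\n".join(sections)
+```
+
+Surfacing `partial` here is a design choice: the synthesis agent should
+know which summaries were cut short before it draws conclusions from them.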
+ +There's a fallback: if synthesis runs out of turns without calling +`submit_report`, `_synthesize_from_cache()` (`ai.py:1262`) builds a +mechanical brief+detailed from the cached dir summaries with no AI call. +This guarantees you always get *something* in the report. + +--- + +## 8. Flags + +The `flag` tool is the agent's pressure valve for "I noticed something +that should not be lost in the summary." `_tool_flag()` (`ai.py:629`) +prints to stderr *and* appends a JSONL line to +`{cache.root}/flags.jsonl`. At the end of `_run_investigation()` +(`ai.py:1387–1397`), the orchestrator reads that file back and includes +the flags in its return tuple. `format_report()` then renders them in a +dedicated section. + +Severity is `info | concern | critical`. The agent is told to flag +*immediately* on discovery, not save findings for the report — this is in +the tool description at `ai.py:312`. + +--- + +## 9. Where to make common changes + +A cookbook for the kinds of changes that come up most often. + +### 9.1 Add a new tool the dir agent can call + +1. Write the implementation: `_tool_<name>(args, target, cache)` somewhere + in the tool implementations section of `ai.py` (~lines 486–642). + Return a string. +2. Add it to `_TOOL_DISPATCH` at `ai.py:645`. +3. Add its schema to `_DIR_TOOLS` at `ai.py:151`. The schema must follow + Anthropic tool-use shape: `name`, `description`, `input_schema`. +4. Decide whether the survey should be able to filter it out (default: + yes — leave it out of `_PROTECTED_DIR_TOOLS`) or whether it's + control-flow critical (add to `_PROTECTED_DIR_TOOLS`). +5. Update `_DIR_SYSTEM_PROMPT` in `prompts.py` if the agent needs + instructions on when to use the new tool. +6. There is no unit test for tool registration today (`ai.py` is exempt). + If you want coverage, the test would mock `client.messages.stream` and + assert that the dispatch table contains your tool. + +### 9.2 Add a whole new pass + +(Phase 3's planning pass is the immediate example.) 
The pattern: + +1. Define a new system prompt constant in `prompts.py` +2. Define a new tool list in `ai.py` for the pass-specific submit tool +3. Write `_run_<name>()` in `ai.py`, modeled on `_run_survey()` — single + submit tool, low max_turns, returns a dict or `None` on failure +4. Wire it into `_run_investigation()` between existing passes +5. Pass its output downstream by adding a kwarg to `_run_dir_loop()` (or + wherever it's needed) and threading it through + +The survey pass is the cleanest reference implementation because it's +short and self-contained. + +### 9.3 Change a prompt + +Edit the constant in `prompts.py`. If you add a `{placeholder}`, also +update the corresponding `.format(...)` call in `ai.py`. Search the +codebase for the constant name to find the call site: + +``` +grep -n SURVEY_SYSTEM_PROMPT luminos_lib/ai.py +``` + +There is no prompt versioning today. Investigation cache entries don't +record which prompt version produced them, so re-running with a new +prompt against an existing investigation will mix old and new outputs +unless you `--fresh`. + +### 9.4 Change cache schema + +1. Update the required-fields set in `cache.py:write_entry()` + (`cache.py:119–123`) +2. Update `_DIR_TOOLS`'s `write_cache` description in `ai.py:228` so the + agent knows what to write +3. Update `_DIR_SYSTEM_PROMPT` in `prompts.py` if the agent needs to know + *how* to populate the new field +4. Update `tests/test_cache.py` — schema validation is the part of the + cache that *is* covered + +### 9.5 Add a CLI flag + +Edit `luminos.py:88` (`main()`'s argparse setup) to define the flag, then +plumb it through whatever functions need it. New AI-related flags +typically need to be added to `analyze_directory()`'s signature +(`ai.py:1408`) and then forwarded to `_run_investigation()`. + +--- + +## 10. Token budget and cost + +Budget logic is in `_TokenTracker.budget_exceeded()` and is checked at the +top of every dir loop iteration (`ai.py:882`). 
The budget is **per call**, +not cumulative — see §4.4. The breach handler flushes a partial dir cache +entry so work isn't lost. + +Cost reporting happens once at the end of `_run_investigation()` +(`ai.py:1399`), using the cumulative `total_input` and `total_output` +counters multiplied by the constants at `ai.py:43–44`. There is no +running cost display during the investigation today. If you want one, +`_TokenTracker.summary()` already returns the formatted string — just +call it after each dir loop. + +--- + +## 11. Glossary + +| Term | Meaning | +|---|---| +| **base scan** | The non-AI phase: tree, classification, languages, recency, disk usage. Stdlib + coreutils only. | +| **dir loop** | Per-directory agent loop in `_run_dir_loop`. Up to 14 turns. Produces a `dir` cache entry. | +| **survey pass** | Single short loop before any dir loops, producing a shared description and tool guidance. | +| **synthesis pass** | Final loop that reads `dir` cache entries and produces `(brief, detailed)`. | +| **leaves-first** | Discovery order in `_discover_directories`: deepest paths first, so child summaries exist when parents are investigated. | +| **investigation** | One end-to-end run, identified by a UUID, persisted under `/tmp/luminos/{uuid}/`. | +| **investigation_id** | The UUID. Stored in `/tmp/luminos/investigations.json` keyed by absolute target path. | +| **cache entry** | A JSON file under `files/` or `dirs/` named by sha256(path). | +| **flag** | An agent finding written to `flags.jsonl` and reported separately. info / concern / critical. | +| **partial entry** | A `dir` cache entry written when the budget tripped before `submit_report`. Marked with `partial: True`. | +| **survey signals** | The histogram + samples computed by `filetypes.survey_signals()` during the base scan, fed to the survey prompt. | +| **last_input** | The `input_tokens` count from the most recent API call. The basis for budget checks. NOT the cumulative sum. 
| +| **CONTEXT_BUDGET** | 70% of 200k = 140k. Trigger threshold for early exit. | +| **`_PROTECTED_DIR_TOOLS`** | Tools the survey is forbidden from filtering out of the dir loop's toolbox. Currently `{submit_report}`. | diff --git a/Roadmap.md b/Roadmap.md index 4456cbc..c0337ff 100644 --- a/Roadmap.md +++ b/Roadmap.md @@ -1,125 +1,37 @@ # Roadmap -Full design notes and open questions live in `PLAN.md` in the repo root. -This page tracks phase status. +The roadmap used to live here as a static phase list. It drifted out of +sync with reality (Phase 2 was marked "Not started" months after it +shipped) so it has been replaced with pointers to the two sources that +actually stay current. ---- +## Where the roadmap lives now -## Core Philosophy +**Design and rationale** → [`PLAN.md`](https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/src/branch/main/PLAN.md) +in the repo root. Phase descriptions, philosophy, file map, known +unknowns, concerns. This is the long-form *why*. -Move from a **pipeline with AI steps** to **investigation driven by curiosity**. -The agent should decide what it needs to know and how to find it out — not -execute a predetermined checklist. +**Current status and active work** → [Open issues](https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/issues) +on Forgejo. Issues track what is in flight, what is done (closed), and +what is queued. Each phase corresponds to a set of issues; closed issues +are the ground truth for "is this shipped." ---- +## Phase status at a glance -## Phases - -### Phase 1 — Confidence Tracking -Add `confidence` + `confidence_reason` to file and dir cache entries. -Agent sets this when writing cache. Enables later phases to prioritize -re-investigation of uncertain entries. - -**Status:** Not started - ---- - -### Phase 2 — Survey Pass -Lightweight pre-investigation pass. 
Agent looks at file type distribution -and tree structure, then answers: what is this, how should I investigate it, -which tools are relevant? - -Replaces hardcoded domain detection with AI-driven characterization. -Survey output injected into dir loop system prompts as context. - -**Status:** Not started - ---- - -### Phase 3 — Investigation Planning -After survey, a planning pass allocates investigation depth per directory. -Replaces fixed max_turns-per-dir with a global turn budget the agent manages. -Priority dirs get more turns; trivial dirs get fewer; generated/vendored dirs -get skipped. - -**Status:** Not started - ---- - -### Phase 4 — External Knowledge Tools -Resolution strategies for uncertainty beyond local files: -- `web_search` — unfamiliar library, format, API -- `package_lookup` — PyPI / npm / crates.io metadata -- `fetch_url` — follow URLs referenced in local files -- `ask_user` — interactive mode, last resort - -All gated behind `--no-external` flag. Budget-limited per session. 
- -**Status:** Not started - ---- - -### Phase 5 — Scale-Tiered Synthesis -Calibrate synthesis input and depth to target size: - -| Tier | Size | Approach | +| Phase | Topic | Status | |---|---|---| -| small | <5 dirs / <30 files | Per-file cache entries as synthesis input | -| medium | 5–30 dirs | Dir summaries (current) | -| large | 31–150 dirs | Multi-level synthesis | -| xlarge | >150 dirs | Multi-level + subsystem grouping | +| 1 | Confidence tracking | ✅ shipped | +| 2 | Survey pass | ✅ shipped | +| 2.5 | Context budget reliability (#44) | ✅ shipped | +| 3 | Investigation planning | ⏳ next | +| 3.5 | MCP backend abstraction (#39) | planned | +| 4 | External knowledge tools | planned | +| 4.5 | Unit of analysis (#48) | planned | +| 5 | Scale-tiered synthesis | planned | +| 6 | Multi-level synthesis | planned | +| 7 | Hypothesis-driven synthesis | planned | +| 8 | Refinement pass | planned | +| 9 | Dynamic report structure | planned | -**Status:** Not started - ---- - -### Phase 6 — Multi-Level Synthesis -For large/xlarge: grouping pass identifies logical subsystems from dir -summaries (not directory structure). Final synthesis receives 3–10 subsystem -summaries rather than hundreds of dir summaries. - -**Status:** Not started - ---- - -### Phase 7 — Hypothesis-Driven Synthesis -Synthesis reframed from aggregation to conclusion-with-evidence. Agent -forms a hypothesis, looks for confirming/refuting evidence, considers -alternatives, then submits. - -Produces analytical output rather than descriptive output. - -**Status:** Not started - ---- - -### Phase 8 — Refinement Pass -Post-synthesis targeted re-investigation. Agent receives current synthesis, -identifies gaps and contradictions, goes back to actual files (or external -sources), submits improved report. - -Triggered by `--refine` flag. `--refine-depth N` for multiple passes. 
- -**Status:** Not started - ---- - -### Phase 9 — Dynamic Report Structure -Synthesis produces a superset of possible output fields; report formatter -renders only populated ones. Output naturally scales from minimal (small -simple targets) to comprehensive (large complex targets). - -**Status:** Not started - ---- - -## Open Design Questions - -See `PLAN.md` — Known Unknowns and Concerns sections. - -Key unresolved items: -- Which search API to use for web_search -- Whether external tools should be opt-in or opt-out by default -- How to handle confidence calibration (numeric vs categorical) -- Config file format and location for tunable thresholds -- Progressive output / interactive mode UX design +For details on any phase, read the matching section of `PLAN.md` and +search open issues for the phase number or feature name.