luminos/README.md
Jeff Smith a6333858ee test(ai): add unit coverage for pure helpers in ai.py (#55)
ai.py was documented as fully exempt from unit testing because the dir
loop and synthesis pass require a live Anthropic API. But several
helpers in the module are pure functions with no API dependency, and
they're the kind of thing that breaks silently. The #57 refactor added
two more (_build_dir_loop_context, _flush_partial_dir_entry) that are
also naturally testable.

New tests/test_ai_pure.py — 45 tests across 8 helpers:

- _should_skip_dir: exact-match, *.egg-info glob, no-match cases
- _path_is_safe: inside, nested, equals, outside, traversal,
  sibling-with-target-prefix (the easy-to-miss security case)
- _default_survey: shape, zero confidence guarantees no filtering,
  passes through _filter_dir_tools unchanged
- _format_survey_block: None, empty, minimal, with relevant_tools,
  with skip_tools, with domain_notes, empty-list omission
- _filter_dir_tools: None, empty, low confidence, high confidence
  filters, protected tools never removed, unknown skip silently
  ignored, garbage/None confidence treated as zero, threshold
  boundary inclusive
- _format_survey_signals: None, empty, zero total_files, full,
  partial (only extensions)
- _block_to_dict: text, tool_use, unknown type
- _flush_partial_dir_entry (#57): idempotent when entry exists,
  no-file-entries stub path, with-file-entries summary synthesis,
  notable_files collection

Uses the same _make_manager() pattern as test_cache.py to construct
a _CacheManager rooted in a tempdir, sidestepping CACHE_ROOT entirely.

Doc updates:
- CLAUDE.md, README.md, docs/wiki/DevelopmentGuide.md: ai.py is no
  longer fully exempt — only the API-dependent loops are. Pure
  helpers are covered by test_ai_pure.py.

Verification: 209 tests pass (164 prior + 45 new).
2026-04-11 10:24:47 -06:00


# Luminos
A file system intelligence tool. Point it at a directory and it runs an agentic Claude investigation that figures out what the directory is, what's in it, and what might be worth your attention.
Luminos is built around a harder question than "what files are here?" It asks "what is this, and should I be worried about any of it?" To answer that, it runs a multi-pass agentic investigation against the [Claude API](https://www.anthropic.com/api): a survey pass to orient on the target, an isolated dir-loop agent per directory with a small toolbelt (read files, run whitelisted coreutils commands, write cache entries), and a final synthesis pass that produces a project-level verdict with severity-ranked flags.
A lightweight base scan runs first to feed the agent its initial picture of the target. The base scan is not a standalone product; it is the first step of the investigation.
## Features
- **Agentic AI investigation.** Multi-pass, leaves-first analysis via Claude: survey, then dir loops, then synthesis.
- **Investigation cache.** Per-file and per-directory summaries are cached under `/tmp/luminos/` so repeat runs on the same target are cheap.
- **Severity-ranked flags.** Findings are sorted so `critical` items are the first thing you see.
- **Context budget guard.** Per-turn `input_tokens` is watched against a budget so a rogue directory can't overflow the context window and silently degrade quality.
- **Graceful degradation.** Permission denied, subprocess timeouts, missing API key: all handled without crashing.
- **JSON output.** Pipe reports to other tools or save for comparison.
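
As a rough illustration of severity-ranked output, the sketch below sorts flag records so that `critical` items come first. Only `critical` is named in this README; the other severity names, the `severity` and `message` keys, and the `rank_flags` helper are assumptions made for the example, not Luminos's actual schema.

```python
# Hypothetical severity order; only "critical" is named in this README.
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]


def rank_flags(flags):
    """Return flags sorted most-severe first; unknown severities sort last."""
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    # sorted() is stable, so flags with equal severity keep their input order.
    return sorted(flags, key=lambda f: rank.get(f.get("severity"), len(rank)))


flags = [
    {"severity": "low", "message": "large log file"},
    {"severity": "critical", "message": "private key committed"},
    {"severity": "medium", "message": "stale lockfile"},
]
for flag in rank_flags(flags):
    print(f"[{flag['severity']}] {flag['message']}")
```

Sorting on a lookup table rather than on the severity string itself keeps the order explicit and lets unrecognized values fall to the end instead of raising.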
## Installation
Luminos is a normal Python project. Clone, create a venv, and install from `requirements.txt`. The repository ships a helper script that does this for you:
```bash
git clone https://github.com/archeious/luminos.git
cd luminos
./setup_env.sh
source ~/luminos-env/bin/activate
```
Or do it by hand:
```bash
python3 -m venv ~/luminos-env
source ~/luminos-env/bin/activate
pip install -r requirements.txt
```
You also need an Anthropic API key exported as an environment variable:
```bash
export ANTHROPIC_API_KEY=your-key-here
```
The base scan shells out to a handful of standard Unix utilities (`wc`, `file`, `grep`, `head`, `tail`, `stat`, `du`, `find`), most of them from GNU coreutils, so you also need those on `$PATH`. They are installed by default on every mainstream Linux distribution; on macOS, the GNU versions are available via Homebrew.
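The two prerequisites above can be checked before a run. The following is a minimal sketch, not part of Luminos: `preflight` is a hypothetical helper that looks for the `ANTHROPIC_API_KEY` environment variable and uses `shutil.which` to confirm each listed tool is on `$PATH`.

```python
import os
import shutil

# Tool names copied from the paragraph above.
REQUIRED_TOOLS = ["wc", "file", "grep", "head", "tail", "stat", "du", "find"]


def preflight(env=None, tools=REQUIRED_TOOLS):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(name) is None:
            problems.append(f"{name} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("warning:", problem)
```

Returning a list of problems instead of raising on the first one means every missing prerequisite is reported in a single pass.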
## Usage
```bash
python3 luminos.py /path/to/project
```
That is the whole interface. The investigation runs end to end and prints a report.
### Common flags
```bash
# Deeper tree, include hidden files, exclude VCS, dependency, and vendor dirs
python3 luminos.py -d 8 -a -x .git -x node_modules -x vendor /path/to/project
# JSON output to a file
python3 luminos.py --json -o report.json /path/to/project
# Force a fresh investigation, ignoring the cache
python3 luminos.py --fresh /path/to/project
# Clear the investigation cache
python3 luminos.py --clear-cache
```
Run `python3 luminos.py --help` for the full flag list.
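Reports saved with `--json -o` can be compared across runs. The sketch below assumes only that a report is a JSON object at the top level; this README does not document the schema, so no field names are assumed. `load_report` and `changed_keys` are hypothetical helpers, and the file names in the usage note are placeholders.

```python
import json

# Sentinel so a key that is absent differs from a key whose value is null.
_MISSING = object()


def load_report(path):
    """Load a report previously written with --json -o <path>."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def changed_keys(old, new):
    """Return the sorted top-level keys whose values differ between reports."""
    keys = set(old) | set(new)
    return sorted(
        k for k in keys if old.get(k, _MISSING) != new.get(k, _MISSING)
    )
```

For example, `changed_keys(load_report("before.json"), load_report("after.json"))` lists which top-level sections changed between two runs.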
## How the investigation works
A short version of what happens on every run:
1. **Base scan.** Builds the directory tree, classifies files into seven categories, counts lines of code, finds large and recently modified files, and computes per-directory disk usage. This is the agent's initial picture of the target.
2. **Survey pass.** A short agent loop (max 3 turns) reads the base scan, describes the target in plain language, and decides which investigation tools are relevant. Tiny targets skip the survey.
3. **Dir loops.** Every directory gets its own isolated agent loop, leaves-first, with up to 14 turns. The agent has read-only access to the filesystem and a toolbelt of `read_file`, `list_directory`, `run_command`, `parse_structure`, `write_cache`, `think`, `checkpoint`, `flag`, and `submit_report`.
4. **Cache.** Each file and directory summary is written to `/tmp/luminos/` so subsequent runs on the same target don't re-derive what hasn't changed.
5. **Context budget guard.** Per-turn `input_tokens` is watched against a budget (currently 70% of the model's context window) so a rogue directory can't overflow it.
6. **Final synthesis.** A short agent loop reads the directory-level cache entries (not the raw files) and produces the project-level brief, the detailed analysis, and the severity-ranked flags.
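
Two of the steps above are easy to sketch in isolation: the leaves-first ordering of the dir loops and the 70% context budget check. This is an illustration under stated assumptions, not the code in Luminos. The `leaves_first` and `over_budget` names are hypothetical, and the 200,000-token window in the usage note is a placeholder value.

```python
import os
import tempfile


def leaves_first(root):
    """Yield directories under root so that children come before parents."""
    # topdown=False makes os.walk visit subdirectories before their parent.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        yield dirpath


def over_budget(input_tokens, context_window, fraction=0.70):
    """True when a turn's input_tokens exceeds the given share of the window."""
    return input_tokens > context_window * fraction


# Demo on a throwaway tree: root/src/pkg is visited before root/src and root.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src", "pkg"))
    order = [os.path.relpath(path, root) for path in leaves_first(root)]
print(order)
```

Visiting leaves first means a parent directory's loop can draw on its children's cached summaries. A call such as `over_budget(150_000, 200_000)` returns `True`, since 150,000 is above 70% of the placeholder window.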
## Development
Run the test suite:
```bash
python3 -m unittest discover -s tests/
```
Modules that are intentionally not unit tested:
- `luminos_lib/ast_parser.py`: requires tree-sitter grammars installed
- `luminos_lib/prompts.py`: string templates only
`luminos_lib/ai.py` is partially covered. End-to-end agent loops require a live Anthropic API and stay exempt, but pure helpers are tested in `tests/test_ai_pure.py`.
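To show what a test for a pure helper looks like, here is a self-contained sketch. `is_inside` is a hypothetical stand-in defined inline; it is not code from `luminos_lib/ai.py`, and the test names are illustrative. The point is that a helper with no API dependency can be exercised with plain `unittest` cases.

```python
import os
import unittest


def is_inside(root, candidate):
    """Hypothetical pure helper: True if candidate resolves to root or below."""
    root = os.path.realpath(root)
    candidate = os.path.realpath(candidate)
    # commonpath compares whole path components, so /tmp/proj-evil is not
    # mistaken for a child of /tmp/proj the way a startswith() check would be.
    return os.path.commonpath([root, candidate]) == root


class IsInsideTests(unittest.TestCase):
    def test_nested_path_is_inside(self):
        self.assertTrue(is_inside("/tmp/proj", "/tmp/proj/src/main.py"))

    def test_traversal_escapes_root(self):
        self.assertFalse(is_inside("/tmp/proj", "/tmp/proj/../other"))

    def test_sibling_sharing_a_prefix_is_outside(self):
        self.assertFalse(is_inside("/tmp/proj", "/tmp/proj-evil/file"))
```

Saved under `tests/` with a `test_` file name prefix, a file like this would be picked up by the `unittest discover` command shown above.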
## License
Apache License 2.0. See [`LICENSE`](LICENSE) for the full text.
## Source of truth
The canonical home for this project is the [Forgejo repository](https://forgejo.labbity.unbiasedgeek.com/archeious/luminos). The GitHub copy is a read-only mirror, pushed automatically from Forgejo. Issues, pull requests, and the project wiki live on Forgejo.