# Luminos
A file system intelligence tool. Scans a directory and produces a reconnaissance report that tells you what the directory is, what's in it, and what might be worth your attention.
Luminos has two modes. The **base mode** is a single Python file that uses only the standard library and GNU coreutils. No pip install, no virtual environment, no dependencies to audit. The **`--ai` mode** runs a multi-pass agentic investigation against the [Claude API](https://www.anthropic.com/api) to reason about what the project actually does and flag anything that looks off. AI mode is opt-in and is the only path that requires pip-installable packages.
## Why
Most "repo explorer" tools answer one question: "what files are here?" Luminos is built around a harder question: "what is this, and should I be worried about any of it?"
The base scan gives you the mechanical answer: directory tree, file classification across seven categories, language breakdown with line counts, recently modified files, disk usage, and the largest files. That is usually enough for a quick "what is this" look.
The AI mode goes further. It runs an isolated investigation per directory, leaves-first, with a small toolbelt (read files, run whitelisted coreutils commands, write cache entries) and a per-directory context budget. Each directory gets its own summary, and a final synthesis pass reads only the directory-level cache entries to produce a whole-project verdict. Findings are flagged with a severity level (`critical`, `concern`, or `info`) so the important stuff floats to the top.
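As an illustration of the severity ranking, here is a minimal sketch. Only the three level names (`critical`, `concern`, `info`) come from the description above; the `Finding` dataclass, the `rank_findings` helper, and the numeric ranks are hypothetical and are not taken from the Luminos source.

```python
from dataclasses import dataclass

# Lower rank sorts first. The level names are from the README;
# the numeric ranks are an assumption made for this sketch.
SEVERITY_RANK = {"critical": 0, "concern": 1, "info": 2}


@dataclass
class Finding:
    """Hypothetical container for one flagged item."""
    severity: str
    path: str
    message: str


def rank_findings(findings):
    """Return findings ordered critical -> concern -> info.

    sorted() is stable, so findings with the same severity keep their
    original relative order. Unknown severities sort last.
    """
    return sorted(
        findings,
        key=lambda f: SEVERITY_RANK.get(f.severity, len(SEVERITY_RANK)),
    )
```

Because the sort is stable, findings can be collected in discovery order and ranked once at the end without losing that order within each level.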
## Features
- **Zero dependencies in base mode.** Runs on bare Python 3 plus the GNU coreutils you already have.
- **Graceful degradation everywhere.** Permission denied, subprocess timeouts, missing API key, missing optional packages: all handled without crashing the scan.
- **Directory tree.** Visual tree with configurable depth and exclude patterns.
- **File classification.** Files bucketed into seven categories (code, config, docs, data, media, binary, other) via `file(1)` magic.
- **Language detection and LOC counting.** Which languages are present, how many lines of code per language.
- **Recently modified files.** Surface the files most likely to reflect current activity.
- **Disk usage.** Per-directory disk usage with top offenders called out.
- **Watch mode.** Re-scan every 30 seconds and show diffs.
- **JSON output.** Pipe reports to other tools or save for comparison.
- **AI investigation (opt-in).** Multi-pass, leaves-first agentic analysis via Claude, with an investigation cache so repeat runs are cheap.
- **Severity-ranked flags.** Findings are sorted so `critical` items are the first thing you see.
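For a feel of how classification via `file(1)` might work, here is a rough sketch. The seven category names come from the feature list above; the extension sets, the MIME-to-category rules, and the function names `mime_type` and `classify` are illustrative assumptions, not the rules Luminos actually applies.

```python
import subprocess

# Illustrative hints only; the real tool may use entirely different rules.
CONFIG_EXTS = {".json", ".yaml", ".yml", ".toml", ".ini", ".cfg"}
DATA_EXTS = {".csv", ".tsv", ".sqlite", ".parquet"}
DOC_EXTS = {".md", ".rst", ".txt"}
BINARY_MIMES = {
    "application/octet-stream",
    "application/x-executable",
    "application/x-sharedlib",
    "application/x-pie-executable",
}


def mime_type(path):
    """Ask file(1) for a MIME type; return None if the call fails or times out."""
    try:
        result = subprocess.run(
            ["file", "--brief", "--mime-type", "--", path],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


def classify(mime, extension=""):
    """Map a MIME type plus an optional extension hint to one of seven buckets."""
    ext = extension.lower()
    if ext in CONFIG_EXTS:
        return "config"
    if ext in DATA_EXTS:
        return "data"
    if ext in DOC_EXTS:
        return "docs"
    if mime is None:
        return "other"
    if mime.startswith(("image/", "audio/", "video/")):
        return "media"
    if mime.startswith("text/x-"):
        # file(1) reports many source files as text/x-<language>.
        return "code"
    if mime.startswith("text/"):
        return "docs"
    if mime in BINARY_MIMES:
        return "binary"
    return "other"
```

Returning `None` from `mime_type` on any subprocess failure keeps a missing or slow `file` binary from aborting the scan, in the spirit of the graceful-degradation bullet.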
## Installation
### Base mode
No installation required. Clone and run.
```bash
git clone https://github.com/archeious/luminos.git
cd luminos
python3 luminos.py <target>
```
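Works on any system with Python 3 and the usual GNU/Linux command-line tools (`wc`, `file`, `grep`, `head`, `tail`, `stat`, `du`, `find`). Most of these ship with GNU coreutils; `file`, `grep`, and `find` come from separate packages that are typically preinstalled.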
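If you are unsure whether a machine has everything base mode needs, a quick check along these lines can help. This is a standalone sketch, not part of Luminos; the tool list is copied from the sentence above and `missing_tools` is a hypothetical helper name.

```python
import shutil

# Tool names taken from the README's base-mode requirements.
REQUIRED_TOOLS = ("wc", "file", "grep", "head", "tail", "stat", "du", "find")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```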
### AI mode
AI mode needs a few pip-installable packages. The project ships a helper script that creates a dedicated virtual environment and installs them:
```bash
./setup_env.sh
source ~/luminos-env/bin/activate
```
The packages installed are `anthropic`, `tree-sitter`, a handful of tree-sitter language grammars, and `python-magic`.
You also need an Anthropic API key exported as an environment variable:
```bash
export ANTHROPIC_API_KEY=your-key-here
```
Check which optional dependencies are present:
```bash
python3 luminos.py --install-extras
```
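A check of this kind can be sketched in a few lines of standard-library Python. This is an assumption about the general approach, not the code behind `--install-extras`; `ai_mode_status` is a hypothetical name, and the tree-sitter language grammars are left out because the README does not list them.

```python
import importlib.util
import os

# pip name -> import name. The import name differs from the pip name
# for tree-sitter (tree_sitter) and python-magic (magic).
OPTIONAL_MODULES = {
    "anthropic": "anthropic",
    "tree-sitter": "tree_sitter",
    "python-magic": "magic",
}


def ai_mode_status(environ=os.environ):
    """Report which optional packages are missing and whether an API key is set.

    Uses find_spec so nothing is actually imported during the check.
    """
    missing = [
        pip_name
        for pip_name, module in OPTIONAL_MODULES.items()
        if importlib.util.find_spec(module) is None
    ]
    return {
        "missing_packages": missing,
        "has_api_key": bool(environ.get("ANTHROPIC_API_KEY")),
    }
```

Checking up front lets a tool fall back to base mode with a clear message instead of failing partway through a scan.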
## Usage
### Base scan
```bash
python3 luminos.py /path/to/project
```
### AI scan
```bash
python3 luminos.py --ai /path/to/project
```
### Common flags
```bash
# Deeper tree, include hidden files, exclude VCS and dependency dirs
python3 luminos.py -d 8 -a -x .git -x node_modules -x vendor /path/to/project
# JSON output to a file
python3 luminos.py --json -o report.json /path/to/project
# Watch mode (re-scan every 30s, show diffs)
python3 luminos.py --watch /path/to/project
# Force a fresh AI investigation, ignoring the cache
python3 luminos.py --ai --fresh /path/to/project
# Clear the AI investigation cache
python3 luminos.py --clear-cache
```
Run `python3 luminos.py --help` for the full flag list.
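Reports saved with `--json -o` can be compared with a few lines of Python. The sketch below is deliberately schema-agnostic: it assumes only that the report's top level is a JSON object, and it makes no claim about which keys Luminos emits. `load_report` and `changed_keys` are hypothetical helper names.

```python
import json


def load_report(path):
    """Load a saved JSON report from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def changed_keys(old_report, new_report):
    """Return the sorted top-level keys whose values differ between two reports.

    Keys present in only one report count as changed.
    """
    absent = object()  # sentinel that never equals a real JSON value
    keys = set(old_report) | set(new_report)
    return sorted(
        key
        for key in keys
        if old_report.get(key, absent) != new_report.get(key, absent)
    )
```

For example, `changed_keys(load_report("before.json"), load_report("after.json"))` lists the sections worth a closer look; the two file names here are placeholders.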
## How AI mode works
A short version of what happens when you pass `--ai`:
1. **Discover** every directory under the target.
2. **Sort leaves-first** so the deepest directories are investigated before their parents.
3. **Run an isolated agent loop per directory** with a max of 10 turns each. The agent has a small toolbelt: read files, run whitelisted coreutils commands (`wc`, `file`, `grep`, `head`, `tail`, `stat`, `du`, `find`), and write cache entries.
4. **Cache everything.** Each file and directory summary is written to `/tmp/luminos/` so that subsequent runs on the same target don't burn tokens re-deriving things that haven't changed.
5. **Context budget guard.** Per-turn `input_tokens` is watched against a budget (currently 70% of the model's context window) so a rogue directory can't blow the context and silently degrade quality.
6. **Final synthesis pass** reads only the directory-level cache entries (not the raw files) to produce a project-level summary and the severity-ranked flags.
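Steps 1, 2, and 5 can be sketched with the standard library alone. This is an illustration of the ordering and the budget check under stated assumptions, not the code in `luminos_lib/ai.py`: the function names are hypothetical, and only the leaves-first order and the 70% figure come from the list above.

```python
import os

BUDGET_FRACTION = 0.70  # from the README: 70% of the model's context window


def discover(root):
    """Return every directory under root (root itself as '.'), relative to root."""
    return [os.path.relpath(dirpath, root) for dirpath, _dirs, _files in os.walk(root)]


def depth(rel_path):
    """Number of path components below the scan root; '.' has depth 0."""
    return len([p for p in rel_path.split(os.sep) if p not in ("", os.curdir)])


def leaves_first(rel_paths):
    """Order directories deepest first so children always precede their parents.

    Ties are broken alphabetically to keep the order deterministic.
    """
    return sorted(rel_paths, key=lambda p: (-depth(p), p))


def over_budget(input_tokens, context_window, fraction=BUDGET_FRACTION):
    """True when a turn's input_tokens exceeds the allowed share of the window."""
    return input_tokens > context_window * fraction
```

Sorting by depth guarantees that by the time a parent directory is investigated, summaries for all of its subdirectories already exist, which is what lets the final synthesis pass read summaries instead of raw files.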
## Development
Run the test suite:
```bash
python3 -m unittest discover -s tests/
```
Modules that are intentionally not unit tested and why:
- `luminos_lib/ai.py`: requires a live Anthropic API, tested in practice
- `luminos_lib/ast_parser.py`: requires tree-sitter grammars installed
- `luminos_lib/watch.py`: stateful event loop, tested manually
- `luminos_lib/prompts.py`: string templates only
## License
Apache License 2.0. See [`LICENSE`](LICENSE) for the full text.
## Source of truth
The canonical home for this project is the [Forgejo repository](https://forgejo.labbity.unbiasedgeek.com/archeious/luminos). The GitHub copy is a read-only mirror, pushed automatically from Forgejo. Issues, pull requests, and the project wiki live on Forgejo.