Merge pull request 'feat: AI investigation is the product, drop zero-dep constraint (#64)' (#65) from feat/issue-64-ai-first-scope into main

claude-code 2026-04-11 15:46:46 +00:00
commit 5c5c4dbb1a
11 changed files with 114 additions and 432 deletions

@@ -19,9 +19,11 @@
 ## Project Overview
-Luminos is a file system intelligence tool — a zero-dependency Python CLI that
-scans a directory and produces a reconnaissance report. With `--ai` it runs a
-multi-pass agentic investigation via the Claude API.
+Luminos is a file system intelligence tool. Point it at a directory and it
+runs a multi-pass agentic investigation via the Claude API: a survey pass,
+isolated dir-loop agents per directory, and a synthesis pass that produces a
+project-level verdict with severity-ranked flags. A lightweight base scan
+runs first to feed the agent its initial picture of the target.
 ---
@@ -32,8 +34,7 @@ multi-pass agentic investigation via the Claude API.
 | `luminos.py` | Entry point — arg parsing, scan(), main() |
 | `luminos_lib/ai.py` | Multi-pass agentic analysis via Claude API |
 | `luminos_lib/ast_parser.py` | tree-sitter code structure parsing |
-| `luminos_lib/cache.py` | Investigation cache management |
-| `luminos_lib/capabilities.py` | Optional dep detection, cache cleanup |
+| `luminos_lib/cache.py` | Investigation cache management (incl. clear_cache) |
 | `luminos_lib/code.py` | Language detection, LOC counting |
 | `luminos_lib/disk.py` | Per-directory disk usage |
 | `luminos_lib/filetypes.py` | File classification (7 categories) |
@@ -41,7 +42,6 @@ multi-pass agentic investigation via the Claude API.
 | `luminos_lib/recency.py` | Recently modified files |
 | `luminos_lib/report.py` | Terminal report formatter |
 | `luminos_lib/tree.py` | Directory tree visualization |
-| `luminos_lib/watch.py` | Watch mode with snapshot diffing |
 Details: wiki — [Architecture](https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/wiki/Architecture) | [Development Guide](https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/wiki/DevelopmentGuide)
@@ -49,32 +49,36 @@ Details: wiki — [Architecture](https://forgejo.labbity.unbiasedgeek.com/archei
 ## Key Constraints
-- **Base tool: no pip dependencies.** tree, filetypes, code, disk, recency,
-report, watch use only stdlib and GNU coreutils. Must always work on bare Python 3.
-- **AI deps are lazy.** `anthropic`, `tree-sitter`, `python-magic` imported only
-when `--ai` is used. Missing packages produce a clear install error.
+- **AI investigation is the product.** The base scan exists to feed the agent.
+There is no `--ai` flag and no `--no-ai` mode. AI runs unconditionally on
+every invocation.
+- **Anthropic API key is required.** If `ANTHROPIC_API_KEY` is unset, luminos
+exits cleanly (exit 0) with a one-line hint instead of running.
+- **Dependencies installed via `requirements.txt`.** anthropic, tree-sitter +
+grammars, and python-magic are normal pip dependencies, not lazy imports.
+`setup_env.sh` creates a venv and installs them.
 - **Subprocess for OS tools.** LOC counting, file detection, disk usage, and
 recency shell out to GNU coreutils. Do not reimplement in pure Python.
 - **Graceful degradation everywhere.** Permission denied, subprocess timeouts,
-missing API key — all handled without crashing.
+individual dir-loop failures — all handled without crashing the run.
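The API-key constraint above can be sketched as a small gate. This is an illustration only, not the project's code: `check_api_key` is a hypothetical helper that takes the environment as a plain mapping so it can be exercised without touching `os.environ`. The hint text mirrors the message added to `luminos.py` in this change.

```python
import sys


def check_api_key(env):
    """Return (ok, exit_code) for a mapping of environment variables.

    Hypothetical helper: a missing or empty ANTHROPIC_API_KEY is treated
    as a clean stop (exit code 0) with a one-line hint on stderr.
    """
    if not env.get("ANTHROPIC_API_KEY"):
        print("luminos requires ANTHROPIC_API_KEY. "
              "Set it with: export ANTHROPIC_API_KEY=your-key-here",
              file=sys.stderr)
        return False, 0
    return True, None
```

A caller would typically pass `os.environ` and call `sys.exit(exit_code)` when `ok` is false.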
 ---
 ## Running Luminos
 ```bash
-# Base scan
-python3 luminos.py <target>
-# With AI analysis (requires ANTHROPIC_API_KEY)
+# Activate the venv (one-time setup: ./setup_env.sh)
 source ~/luminos-env/bin/activate
-python3 luminos.py --ai <target>
+export ANTHROPIC_API_KEY=your-key-here
+# Run an investigation
+python3 luminos.py <target>
 # Common flags
 python3 luminos.py -d 8 -a -x .git -x node_modules <target>
 python3 luminos.py --json -o report.json <target>
-python3 luminos.py --watch <target>
-python3 luminos.py --install-extras
+python3 luminos.py --fresh <target>
+python3 luminos.py --clear-cache
 ```
 ---
@@ -83,8 +87,7 @@ python3 luminos.py --install-extras
 Run tests with `python3 -m unittest discover -s tests/`. Modules exempt from
 unit testing: `ai.py` (requires live API), `ast_parser.py` (requires
-tree-sitter), `watch.py` (stateful events), `prompts.py` (string templates
-only).
+tree-sitter grammars at import time), `prompts.py` (string templates only).
 (Development workflow, branching discipline, and session protocols live in
 `~/.claude/CLAUDE.md`.)
@@ -99,7 +102,7 @@ only).
 | Classes | PascalCase | `_TokenTracker`, `_CacheManager` |
 | Constants | UPPER_SNAKE_CASE | `MAX_CONTEXT`, `CACHE_ROOT` |
 | Module files | snake_case | `ast_parser.py` |
-| CLI flags | kebab-case | `--clear-cache`, `--install-extras` |
+| CLI flags | kebab-case | `--clear-cache`, `--fresh` |
 | Private functions | leading underscore | `_run_synthesis` |
 ---

PLAN.md

@@ -687,7 +687,7 @@ fold into any session that touches these helpers.
 extension sub-section or similar. Low priority, not blocking.
 - **Revisit survey-skip thresholds (#46)**`_SURVEY_MIN_FILES` and
 `_SURVEY_MIN_DIRS` shipped with values from #7's example, no
-empirical basis. Once `--ai` has been run on a variety of real
+empirical basis. Once luminos has been run on a variety of real
 targets, look at which runs skipped the survey vs ran it and decide
 whether the thresholds (or the gate logic itself) need to change.
@@ -706,7 +706,7 @@ fold into any session that touches these helpers.
 | `luminos_lib/search.py` | **new** — web_search, fetch_url, package_lookup implementations |
 No changes needed to: `tree.py`, `filetypes.py`, `code.py`, `recency.py`,
-`disk.py`, `capabilities.py`, `watch.py`, `ast_parser.py`
+`disk.py`, `ast_parser.py`
 ---
@@ -798,20 +798,20 @@ agent read, in what order, what it decided to skip). Storing the full message
 history per directory would allow replaying or auditing an investigation. Cost:
 storage. Benefit: debuggability, ability to resume investigations more faithfully.
-**Watch mode + incremental investigation**
-Watch mode currently re-runs the full base scan on changes. For AI-augmented
-watch mode: detect which directories changed, re-investigate only those, and
-patch the cache entries. The synthesis would then re-run from the updated cache
-without re-investigating unchanged directories.
+**Live re-investigation mode**
+A "watch" replacement: detect which directories changed, re-investigate only
+those, and patch the cache entries. The synthesis would then re-run from the
+updated cache without re-investigating unchanged directories. The original
+non-AI watch mode was deleted in the #64 scope change because it conflicted
+with the AI-first philosophy. If watch comes back, it comes back as this.
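One way the "detect which directories changed" step could work is to fingerprint each directory and compare two snapshots. The sketch below is a guess at a possible approach, not something the plan specifies: `dir_fingerprints` and `changed_dirs` are hypothetical names, and hashing only names, sizes, and mtimes is an assumed trade-off that keeps the check cheap.

```python
import hashlib
import os


def dir_fingerprints(root):
    """Map each directory under root to a hash of its direct files.

    The hash covers file names, sizes, and mtimes only, so it is cheap
    but can miss an edit that preserves both size and mtime.
    """
    prints = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        h = hashlib.sha256()
        for name in sorted(filenames):
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable: skip it
            record = f"{name}\0{st.st_size}\0{st.st_mtime_ns}\n"
            h.update(record.encode("utf-8", "surrogateescape"))
        prints[os.path.relpath(dirpath, root)] = h.hexdigest()
    return prints


def changed_dirs(old, new):
    """Return directories that were added, removed, or modified."""
    return sorted(d for d in old.keys() | new.keys()
                  if old.get(d) != new.get(d))
```

Only the directories returned by `changed_dirs` would need re-investigation. A parent whose own files are untouched is not reported, so whether its cached summary should also be refreshed is a separate design decision.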
-**Optional PDF and Office document readers**
+**PDF and Office document readers**
 The data and documents domains would benefit from native content extraction:
 - `pdfminer` or `pypdf` for PDF text extraction
 - `openpyxl` for Excel schema and sheet enumeration
 - `python-docx` for Word document text
-These would be optional deps like the existing AI deps, gated behind
-`--install-extras`. The agent currently can only see filename and size for
-these formats.
+These slot into `requirements.txt` like any other dependency. The agent
+currently can only see filename and size for these formats.
 **Security-focused analysis mode**
 A `--security` flag could tune the investigation toward security-relevant
@@ -874,11 +874,6 @@ bad plan wastes turns on shallow directories and skips critical ones. The system
 needs quality signals — probably the confidence scores aggregated across the
 investigation — to detect when something went wrong and potentially retry.
-**Watch mode compatibility**
-Several of the planned features (survey pass, planning, external tools) are not
-designed for incremental re-use in watch mode. Adding AI capability to watch
-mode is a separate design problem that deserves its own thinking.
 **Turn budget contention**
 If the planning pass allocates turns and the agent borrows from its budget when
 it needs more, there's a risk of runaway investigation on unexpectedly complex

@@ -1,55 +1,38 @@
 # Luminos
-A file system intelligence tool. Scans a directory and produces a reconnaissance report that tells you what the directory is, what's in it, and what might be worth your attention.
-Luminos has two modes. The **base mode** is a single Python file that uses only the standard library and GNU coreutils. No pip install, no virtual environment, no dependencies to audit. The **`--ai` mode** runs a multi-pass agentic investigation against the [Claude API](https://www.anthropic.com/api) to reason about what the project actually does and flag anything that looks off. AI mode is opt-in and is the only path that requires pip-installable packages.
-## Why
-Most "repo explorer" tools answer one question: "what files are here?" Luminos is built around a harder question: "what is this, and should I be worried about any of it?"
-The base scan gives you the mechanical answer: directory tree, file classification across seven categories, language breakdown with line counts, recently modified files, disk usage, and the largest files. That is usually enough for a quick "what is this" look.
-The AI mode goes further. It runs an isolated investigation per directory, leaves-first, with a small toolbelt (read files, run whitelisted coreutils commands, write cache entries) and a per-directory context budget. Each directory gets its own summary, and a final synthesis pass reads only the directory-level cache entries to produce a whole-project verdict. Findings are flagged with a severity level (`critical`, `concern`, or `info`) so the important stuff floats to the top.
+A file system intelligence tool. Point it at a directory and it runs an agentic Claude investigation that figures out what the directory is, what's in it, and what might be worth your attention.
+Luminos is built around a harder question than "what files are here?" It is built around "what is this, and should I be worried about any of it?" To answer that, it runs a multi-pass agentic investigation against the [Claude API](https://www.anthropic.com/api): a survey pass to orient on the target, an isolated dir-loop agent per directory with a small toolbelt (read files, run whitelisted coreutils commands, write cache entries), and a final synthesis pass that produces a project-level verdict with severity-ranked flags.
+A lightweight base scan runs first to feed the agent its initial picture of the target. The base scan is not a standalone product, it is the first step of the investigation.
 ## Features
-- **Zero dependencies in base mode.** Runs on bare Python 3 plus the GNU coreutils you already have.
-- **Graceful degradation everywhere.** Permission denied, subprocess timeouts, missing API key, missing optional packages: all handled without crashing the scan.
-- **Directory tree.** Visual tree with configurable depth and exclude patterns.
-- **File classification.** Files bucketed into seven categories (code, config, docs, data, media, binary, other) via `file(1)` magic.
-- **Language detection and LOC counting.** Which languages are present, how many lines of code per language.
-- **Recently modified files.** Surface the files most likely to reflect current activity.
-- **Disk usage.** Per-directory disk usage with top offenders called out.
-- **Watch mode.** Re-scan every 30 seconds and show diffs.
-- **JSON output.** Pipe reports to other tools or save for comparison.
-- **AI investigation (opt-in).** Multi-pass, leaves-first agentic analysis via Claude, with an investigation cache so repeat runs are cheap.
+- **Agentic AI investigation.** Multi-pass, leaves-first analysis via Claude. Survey then dir loops then synthesis.
+- **Investigation cache.** Per-file and per-directory summaries are cached under `/tmp/luminos/` so repeat runs on the same target are cheap.
 - **Severity-ranked flags.** Findings are sorted so `critical` items are the first thing you see.
+- **Context budget guard.** Per-turn `input_tokens` is watched against a budget so a rogue directory can't blow the context and silently degrade quality.
+- **Graceful degradation.** Permission denied, subprocess timeouts, missing API key: all handled without crashing.
+- **JSON output.** Pipe reports to other tools or save for comparison.
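The context budget guard in the feature list can be sketched as a single comparison. This is a minimal sketch under stated assumptions: `over_budget` is a hypothetical name, the default of 70 mirrors the "70% of the model's context window" figure quoted in this README's walkthrough, and the context-window size is whatever the caller supplies. Integer arithmetic is used so the boundary case is exact.

```python
def over_budget(input_tokens, context_window, budget_percent=70):
    """Return True when a turn's input_tokens exceeds the budget.

    The budget is budget_percent of context_window. Multiplying through
    by 100 avoids floating-point rounding at the boundary.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return input_tokens * 100 > context_window * budget_percent
```

A dir loop could call this after each turn with the `input_tokens` value reported for that turn and stop or summarise early once it returns true.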
 ## Installation
-### Base mode
-No installation required. Clone and run.
+Luminos is a normal Python project. Clone, create a venv, and install from `requirements.txt`. The repository ships a helper script that does this for you:
 ```bash
 git clone https://github.com/archeious/luminos.git
 cd luminos
-python3 luminos.py <target>
-```
-Works on any system with Python 3 and standard GNU coreutils (`wc`, `file`, `grep`, `head`, `tail`, `stat`, `du`, `find`).
-### AI mode
-AI mode needs a few pip-installable packages. The project ships a helper script that creates a dedicated virtual environment and installs them:
-```bash
 ./setup_env.sh
 source ~/luminos-env/bin/activate
 ```
-The packages installed are `anthropic`, `tree-sitter`, a handful of tree-sitter language grammars, and `python-magic`.
+Or do it by hand:
+```bash
+python3 -m venv ~/luminos-env
+source ~/luminos-env/bin/activate
+pip install -r requirements.txt
+```
 You also need an Anthropic API key exported as an environment variable:
@@ -57,25 +40,15 @@ You also need an Anthropic API key exported as an environment variable:
 export ANTHROPIC_API_KEY=your-key-here
 ```
-Check which optional dependencies are present:
-```bash
-python3 luminos.py --install-extras
-```
+The base scan shells out to a handful of GNU coreutils (`wc`, `file`, `grep`, `head`, `tail`, `stat`, `du`, `find`), so you also need those on `$PATH`. They are installed by default on every mainstream Linux distribution and on macOS via Homebrew.
 ## Usage
-### Base scan
 ```bash
 python3 luminos.py /path/to/project
 ```
-### AI scan
-```bash
-python3 luminos.py --ai /path/to/project
-```
+That is the whole interface. The investigation runs end to end and prints a report.
 ### Common flags
@@ -86,28 +59,25 @@ python3 luminos.py -d 8 -a -x .git -x node_modules -x vendor /path/to/project
 # JSON output to a file
 python3 luminos.py --json -o report.json /path/to/project
-# Watch mode (re-scan every 30s, show diffs)
-python3 luminos.py --watch /path/to/project
-# Force a fresh AI investigation, ignoring the cache
-python3 luminos.py --ai --fresh /path/to/project
-# Clear the AI investigation cache
+# Force a fresh investigation, ignoring the cache
+python3 luminos.py --fresh /path/to/project
+# Clear the investigation cache
 python3 luminos.py --clear-cache
 ```
 Run `python3 luminos.py --help` for the full flag list.
-## How AI mode works
+## How the investigation works
-A short version of what happens when you pass `--ai`:
+A short version of what happens on every run:
-1. **Discover** every directory under the target.
-2. **Sort leaves-first** so the deepest directories are investigated before their parents.
-3. **Run an isolated agent loop per directory** with a max of 10 turns each. The agent has a small toolbelt: read files, run whitelisted coreutils commands (`wc`, `file`, `grep`, `head`, `tail`, `stat`, `du`, `find`), and write cache entries.
-4. **Cache everything.** Each file and directory summary is written to `/tmp/luminos/` so that subsequent runs on the same target don't burn tokens re-deriving things that haven't changed.
-5. **Context budget guard.** Per-turn `input_tokens` is watched against a budget (currently 70% of the model's context window) so a rogue directory can't blow the context and silently degrade quality.
-6. **Final synthesis pass** reads only the directory-level cache entries (not the raw files) to produce a project-level summary and the severity-ranked flags.
+1. **Base scan.** Builds the directory tree, classifies files into seven categories, counts lines of code, finds large and recently modified files, computes per-directory disk usage. This is the agent's initial picture of the target.
+2. **Survey pass.** A short agent loop (max 3 turns) reads the base scan, describes the target in plain language, and decides which investigation tools are relevant. Tiny targets skip the survey.
+3. **Dir loops.** Every directory gets its own isolated agent loop, leaves-first, with up to 14 turns. The agent has read-only access to the filesystem and a toolbelt of `read_file`, `list_directory`, `run_command`, `parse_structure`, `write_cache`, `think`, `checkpoint`, `flag`, and `submit_report`.
+4. **Cache.** Each file and directory summary is written to `/tmp/luminos/` so subsequent runs on the same target don't re-derive what hasn't changed.
+5. **Context budget guard.** Per-turn `input_tokens` is watched against a budget (currently 70% of the model's context window) so a rogue directory can't blow the context window.
+6. **Final synthesis.** A short agent loop reads the directory-level cache entries (not the raw files) and produces the project-level brief, the detailed analysis, and the severity-ranked flags.
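The leaves-first ordering used by the dir loops can be illustrated with a short sketch. This is not the project's implementation: `leaves_first` and its `exclude` parameter are hypothetical, and breaking ties alphabetically is an assumption made here only to keep the order deterministic.

```python
import os


def leaves_first(root, exclude=()):
    """Return directories under root, deepest first.

    Directory names listed in `exclude` are pruned from the walk.
    Directories at the same depth are ordered alphabetically.
    """
    root = os.path.abspath(root)

    def depth(path):
        rel = os.path.relpath(path, root)
        return 0 if rel == os.curdir else rel.count(os.sep) + 1

    dirs = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Prune in place so os.walk does not descend into excluded dirs.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        dirs.append(dirpath)
    return sorted(dirs, key=lambda p: (-depth(p), p))
```

Investigating in this order means every child directory already has a cached summary by the time its parent is visited, and the target root comes last.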
 ## Development
@@ -117,11 +87,10 @@ Run the test suite:
 python3 -m unittest discover -s tests/
 ```
-Modules that are intentionally not unit tested and why:
+Modules that are intentionally not unit tested:
-- `luminos_lib/ai.py`: requires a live Anthropic API, tested in practice
+- `luminos_lib/ai.py`: requires a live Anthropic API, exercised in practice
 - `luminos_lib/ast_parser.py`: requires tree-sitter grammars installed
-- `luminos_lib/watch.py`: stateful event loop, tested manually
 - `luminos_lib/prompts.py`: string templates only
 ## License

@@ -16,16 +16,11 @@ from luminos_lib.filetypes import (
 from luminos_lib.code import detect_languages, find_large_files
 from luminos_lib.recency import find_recent_files
 from luminos_lib.disk import get_disk_usage, top_directories
-from luminos_lib.watch import watch_loop
 from luminos_lib.report import format_report
 def _progress(label):
-    """Return (on_file, finish) for in-place per-file progress on stderr.
-    on_file(path) overwrites the current line with the label and truncated path.
-    finish() finalises the line with a newline.
-    """
+    """Return (on_file, finish) for in-place per-file progress on stderr."""
     cols = shutil.get_terminal_size((80, 20)).columns
     prefix = f" [scan] {label}... "
     available = max(cols - len(prefix), 10)
@@ -43,7 +38,7 @@ def _progress(label):
 def scan(target, depth=3, show_hidden=False, exclude=None):
-    """Run all analyses on the target directory and return a report dict."""
+    """Run the base scan and return the report dict consumed by the AI pass."""
     report = {}
     exclude = exclude or []
@@ -89,7 +84,8 @@ def main():
     parser = argparse.ArgumentParser(
         prog="luminos",
         description="Luminos — file system intelligence tool. "
-                    "Explores a directory and produces a reconnaissance report.",
+                    "Runs an agentic Claude investigation against a directory "
+                    "and produces a reconnaissance report.",
     )
     parser.add_argument("target", nargs="?", help="Target directory to analyze")
     parser.add_argument("-d", "--depth", type=int, default=3,
@@ -100,17 +96,10 @@ def main():
                         help="Output report as JSON")
     parser.add_argument("-o", "--output", metavar="FILE",
                         help="Write report to a file")
-    parser.add_argument("--ai", action="store_true",
-                        help="Use Claude AI to analyze directory purpose "
-                             "(requires ANTHROPIC_API_KEY)")
-    parser.add_argument("--watch", action="store_true",
-                        help="Re-scan every 30 seconds and show diffs")
     parser.add_argument("--clear-cache", action="store_true",
-                        help="Clear the AI investigation cache (/tmp/luminos/)")
+                        help="Clear the investigation cache (/tmp/luminos/)")
     parser.add_argument("--fresh", action="store_true",
-                        help="Force a new AI investigation (ignore cached results)")
+                        help="Force a new investigation (ignore cached results)")
-    parser.add_argument("--install-extras", action="store_true",
-                        help="Show status of optional AI dependencies")
     parser.add_argument("-x", "--exclude", metavar="DIR", action="append",
                         default=[],
                         help="Exclude a directory name from scan and analysis "
@@ -118,15 +107,8 @@ def main():
     args = parser.parse_args()
-    # --install-extras: show package status and exit
-    if args.install_extras:
-        from luminos_lib.capabilities import print_status
-        print_status()
-        return
-    # --clear-cache: wipe /tmp/luminos/ (lazy import to avoid AI deps)
     if args.clear_cache:
-        from luminos_lib.capabilities import clear_cache
+        from luminos_lib.cache import clear_cache
         clear_cache()
         if not args.target:
             return
@@ -140,19 +122,18 @@ def main():
               file=sys.stderr)
         sys.exit(1)
+    if not os.environ.get("ANTHROPIC_API_KEY"):
+        print("luminos requires ANTHROPIC_API_KEY. "
+              "Set it with: export ANTHROPIC_API_KEY=your-key-here",
+              file=sys.stderr)
+        sys.exit(0)
     if args.exclude:
         print(f" [scan] Excluding: {', '.join(args.exclude)}", file=sys.stderr)
-    if args.watch:
-        watch_loop(target, depth=args.depth, show_hidden=args.all,
-                   json_output=args.json_output)
-        return
     report = scan(target, depth=args.depth, show_hidden=args.all,
                   exclude=args.exclude)
-    flags = []
-    if args.ai:
-        from luminos_lib.ai import analyze_directory
-        brief, detailed, flags = analyze_directory(
-            report, target, fresh=args.fresh, exclude=args.exclude)
+    from luminos_lib.ai import analyze_directory
+    brief, detailed, flags = analyze_directory(
+        report, target, fresh=args.fresh, exclude=args.exclude)

@@ -21,7 +21,6 @@ import anthropic
 import magic
 from luminos_lib.ast_parser import parse_structure
 from luminos_lib.cache import _CacheManager, _get_investigation_id
-from luminos_lib.capabilities import check_ai_dependencies
 from luminos_lib.prompts import (
     _DIR_SYSTEM_PROMPT,
     _SURVEY_SYSTEM_PROMPT,
@@ -1414,11 +1413,8 @@ def analyze_directory(report, target, verbose_tools=False, fresh=False,
                       exclude=None):
     """Run AI analysis on the directory. Returns (brief, detailed, flags).
-    Returns ("", "", []) if the API key is missing or dependencies are not met.
+    Returns ("", "", []) if the API key is missing.
     """
-    if not check_ai_dependencies():
-        sys.exit(1)
     api_key = _get_api_key()
     if not api_key:
         return "", "", []

@@ -3,6 +3,8 @@
 import hashlib
 import json
 import os
+import shutil
+import sys
 import uuid
 from datetime import datetime, timezone
@@ -10,6 +12,16 @@ CACHE_ROOT = "/tmp/luminos"
 INVESTIGATIONS_PATH = os.path.join(CACHE_ROOT, "investigations.json")
+def clear_cache():
+    """Remove all investigation caches under CACHE_ROOT."""
+    if os.path.isdir(CACHE_ROOT):
+        shutil.rmtree(CACHE_ROOT)
+        print(f"Cleared cache: {CACHE_ROOT}", file=sys.stderr)
+    else:
+        print(f"No cache to clear ({CACHE_ROOT} does not exist).",
+              file=sys.stderr)
 def _sha256_path(path):
     """Return a hex SHA-256 of a path string, used as cache key."""
     return hashlib.sha256(path.encode("utf-8")).hexdigest()
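To try the cache-clearing behaviour without touching the real `/tmp/luminos`, the same logic can be pointed at a throwaway directory. This is a sketch, not part of the change: `clear_cache_at` is a hypothetical variant that takes the root as a parameter and returns a boolean, neither of which the `clear_cache` added here does.

```python
import os
import shutil
import sys


def clear_cache_at(root):
    """Remove the cache directory at root; return True if it existed.

    Hypothetical, parameterised variant of clear_cache() so the
    behaviour can be exercised against a scratch directory.
    """
    if os.path.isdir(root):
        shutil.rmtree(root)
        print(f"Cleared cache: {root}", file=sys.stderr)
        return True
    print(f"No cache to clear ({root} does not exist).", file=sys.stderr)
    return False
```

Calling it twice on the same path returns `True` and then `False`, which matches the two branches of the original function.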

@@ -1,139 +0,0 @@
-"""Capability detection and cache management for optional luminos dependencies.
-The base tool requires zero external packages. The --ai flag requires:
-  - anthropic (API transport)
-  - tree-sitter (AST parsing via parse_structure tool)
-  - python-magic (improved file classification)
-This module is the single place that knows about optional dependencies.
-"""
-_PACKAGES = {
-    "anthropic": {
-        "import": "anthropic",
-        "pip": "anthropic",
-        "purpose": "Claude API client (streaming, retries, token counting)",
-    },
-    "tree-sitter": {
-        "import": "tree_sitter",
-        "pip": ("tree-sitter tree-sitter-python tree-sitter-javascript "
-                "tree-sitter-rust tree-sitter-go"),
-        "purpose": "AST parsing for parse_structure tool",
-    },
-    "python-magic": {
-        "import": "magic",
-        "pip": "python-magic",
-        "purpose": "Improved file type detection via libmagic",
-    },
-}
-def _check_package(import_name):
-    """Return True if a package is importable."""
-    try:
-        __import__(import_name)
-        return True
-    except ImportError:
-        return False
-ANTHROPIC_AVAILABLE = _check_package("anthropic")
-TREE_SITTER_AVAILABLE = _check_package("tree_sitter")
-MAGIC_AVAILABLE = _check_package("magic")
-def check_ai_dependencies():
-    """Check that all --ai dependencies are installed.
-    If any are missing, prints a clear error with the pip install command
-    and returns False. Returns True if everything is available.
-    """
-    missing = []
-    for name, info in _PACKAGES.items():
-        if not _check_package(info["import"]):
-            missing.append(name)
-    if not missing:
-        return True
-    # Also check tree-sitter grammar packages
-    grammar_missing = []
-    if "tree-sitter" not in missing:
-        for grammar in ["tree_sitter_python", "tree_sitter_javascript",
-                        "tree_sitter_rust", "tree_sitter_go"]:
-            if not _check_package(grammar):
-                grammar_missing.append(grammar.replace("_", "-"))
-    import sys
-    print("\nluminos --ai requires missing packages:", file=sys.stderr)
-    for name in missing:
-        print(f" \u2717 {name}", file=sys.stderr)
-    for name in grammar_missing:
-        print(f" \u2717 {name}", file=sys.stderr)
-    # Build pip install command
-    pip_parts = []
-    for name in missing:
-        pip_parts.append(_PACKAGES[name]["pip"])
-    for name in grammar_missing:
-        pip_parts.append(name)
-    pip_cmd = " \\\n ".join(pip_parts)
-    print(f"\n Install with:\n pip install {pip_cmd}\n", file=sys.stderr)
-    return False
-def print_status():
-    """Print the install status of all optional packages."""
-    print("\nLuminos optional dependencies:\n")
-    for name, info in _PACKAGES.items():
-        available = _check_package(info["import"])
-        mark = "\u2713" if available else "\u2717"
-        status = "installed" if available else "missing"
-        print(f" {mark} {name:20s} {status:10s} {info['purpose']}")
-    # Grammar packages
-    grammars = {
-        "tree-sitter-python": "tree_sitter_python",
-        "tree-sitter-javascript": "tree_sitter_javascript",
-        "tree-sitter-rust": "tree_sitter_rust",
-        "tree-sitter-go": "tree_sitter_go",
-    }
-    print()
-    for name, imp in grammars.items():
-        available = _check_package(imp)
-        mark = "\u2713" if available else "\u2717"
-        status = "installed" if available else "missing"
-        print(f" {mark} {name:20s} {status:10s} Language grammar")
-    # Full install command (deduplicated)
-    all_pkgs = []
-    seen = set()
-    for info in _PACKAGES.values():
-        for pkg in info["pip"].split():
-            if pkg not in seen:
-                all_pkgs.append(pkg)
-                seen.add(pkg)
-    for name in grammars:
-        if name not in seen:
-            all_pkgs.append(name)
-            seen.add(name)
-    print(f"\n Install all with:\n pip install {' '.join(all_pkgs)}\n")
-from luminos_lib.cache import CACHE_ROOT
-def clear_cache():
-    """Remove all investigation caches under /tmp/luminos/."""
-    import shutil
-    import os
-    import sys
-    if os.path.isdir(CACHE_ROOT):
-        shutil.rmtree(CACHE_ROOT)
-        print(f"Cleared cache: {CACHE_ROOT}", file=sys.stderr)
-    else:
-        print(f"No cache to clear ({CACHE_ROOT} does not exist).",
-              file=sys.stderr)


@ -1,108 +0,0 @@
"""Watch mode — re-scan and show diffs every 30 seconds."""
import json
import sys
import time
import os


def _snapshot(classified_files):
    """Create a snapshot dict: path -> (size, category)."""
    return {f["path"]: (f["size"], f["category"]) for f in classified_files}


def _diff_snapshots(old, new):
    """Compare two snapshots and return changes."""
    old_paths = set(old.keys())
    new_paths = set(new.keys())
    added = new_paths - old_paths
    removed = old_paths - new_paths
    common = old_paths & new_paths
    size_changes = []
    for p in common:
        old_size = old[p][0]
        new_size = new[p][0]
        if old_size != new_size:
            size_changes.append((p, old_size, new_size))
    return added, removed, size_changes


def _human_size(nbytes):
    for unit in ("B", "KB", "MB", "GB"):
        if nbytes < 1024:
            if unit == "B":
                return f"{nbytes} {unit}"
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024
    return f"{nbytes:.1f} TB"


def watch_loop(target, depth=3, show_hidden=False, json_output=False):
    """Run scan in a loop, printing diffs between runs."""
    # Import here to avoid circular import
    from luminos_lib.filetypes import classify_files

    print(f"[luminos] Watching {target} (Ctrl+C to stop)")
    print(f"[luminos] Scanning every 30 seconds...")
    print()
    prev_snapshot = None
    try:
        while True:
            classified = classify_files(target, show_hidden=show_hidden)
            current = _snapshot(classified)

            if prev_snapshot is not None:
                added, removed, size_changes = _diff_snapshots(
                    prev_snapshot, current
                )
                if not added and not removed and not size_changes:
                    ts = time.strftime("%H:%M:%S")
                    print(f"[{ts}] No changes detected.")
                else:
                    ts = time.strftime("%H:%M:%S")
                    print(f"[{ts}] Changes detected:")
                    if json_output:
                        diff = {
                            "timestamp": ts,
                            "added": sorted(added),
                            "removed": sorted(removed),
                            "size_changes": [
                                {"path": p, "old_size": o, "new_size": n}
                                for p, o, n in size_changes
                            ],
                        }
                        print(json.dumps(diff, indent=2))
                    else:
                        for p in sorted(added):
                            name = os.path.basename(p)
                            print(f" + NEW {name}")
                            print(f" {p}")
                        for p in sorted(removed):
                            name = os.path.basename(p)
                            print(f" - DEL {name}")
                            print(f" {p}")
                        for p, old_s, new_s in size_changes:
                            name = os.path.basename(p)
                            delta = new_s - old_s
                            sign = "+" if delta > 0 else ""
                            print(f" ~ SIZE {name} "
                                  f"{_human_size(old_s)} -> {_human_size(new_s)} "
                                  f"({sign}{_human_size(delta)})")
                    print()
            else:
                print(f"[{time.strftime('%H:%M:%S')}] "
                      f"Initial scan complete: {len(current)} files indexed.")
                print()

            prev_snapshot = current
            time.sleep(30)
    except KeyboardInterrupt:
        print("\n[luminos] Watch stopped.")
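
A note on the removed size-change output: `_human_size` tests `nbytes < 1024` directly, so a negative delta (a file that shrank) always takes the first branch and is printed in raw bytes, whatever its magnitude. The sketch below shows one way a sign-aware formatter could look. `human_size` mirrors the removed helper, while `human_delta` is a hypothetical name introduced here for illustration and is not part of the removed module.

```python
def human_size(nbytes):
    """Format a byte count; same logic as the removed _human_size."""
    for unit in ("B", "KB", "MB", "GB"):
        if nbytes < 1024:
            if unit == "B":
                return f"{nbytes} {unit}"
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024
    return f"{nbytes:.1f} TB"


def human_delta(delta):
    """Hypothetical helper: format a signed size change by its magnitude."""
    if delta > 0:
        sign = "+"
    elif delta < 0:
        sign = "-"
    else:
        sign = ""
    # Scale the absolute value so shrinking files get the same units
    # as growing ones, then put the sign back in front.
    return f"{sign}{human_size(abs(delta))}"


if __name__ == "__main__":
    print(human_size(-2048))   # behaviour of the original helper on a negative value
    print(human_delta(-2048))  # sign-aware variant
```

Formatting the magnitude and re-attaching the sign keeps the unit-scaling loop unchanged, which is why the sketch wraps the helper rather than altering it.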

requirements.txt Normal file

@ -0,0 +1,7 @@
anthropic
python-magic
tree-sitter
tree-sitter-python
tree-sitter-javascript
tree-sitter-rust
tree-sitter-go


@ -2,6 +2,7 @@
 set -euo pipefail
 VENV_DIR="$HOME/luminos-env"
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 if [ -d "$VENV_DIR" ]; then
     echo "venv already exists at $VENV_DIR"
@ -13,17 +14,19 @@ fi
 echo "Activating venv..."
 source "$VENV_DIR/bin/activate"
-echo "Installing packages..."
-pip install anthropic tree-sitter tree-sitter-python \
-            tree-sitter-javascript tree-sitter-rust \
-            tree-sitter-go python-magic
+echo "Installing packages from requirements.txt..."
+pip install -r "$SCRIPT_DIR/requirements.txt"
 echo ""
 echo "Done. To activate the venv in future sessions:"
 echo ""
 echo " source ~/luminos-env/bin/activate"
 echo ""
-echo "Then run luminos as usual:"
+echo "Set your Anthropic API key:"
 echo ""
-echo " python3 luminos.py --ai <target>"
+echo " export ANTHROPIC_API_KEY=your-key-here"
+echo ""
+echo "Then run luminos:"
+echo ""
+echo " python3 luminos.py <target>"
 echo ""


@ -1,37 +0,0 @@
"""Tests for luminos_lib/capabilities.py"""
import unittest
from unittest.mock import patch

from luminos_lib.capabilities import _check_package


class TestCheckPackage(unittest.TestCase):
    def test_importable_package(self):
        # json is always available in stdlib
        self.assertTrue(_check_package("json"))

    def test_missing_package(self):
        self.assertFalse(_check_package("_luminos_nonexistent_package_xyz"))

    def test_importable_returns_true(self):
        with patch("builtins.__import__", return_value=None):
            # patch doesn't work cleanly here; use a real stdlib module
            pass
        self.assertTrue(_check_package("os"))

    def test_import_error_returns_false(self):
        import builtins
        original_import = builtins.__import__

        def fake_import(name, *args, **kwargs):
            if name == "_fake_missing_module":
                raise ImportError("No module named '_fake_missing_module'")
            return original_import(name, *args, **kwargs)

        with patch("builtins.__import__", side_effect=fake_import):
            self.assertFalse(_check_package("_fake_missing_module"))


if __name__ == "__main__":
    unittest.main()
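
The body of `_check_package` does not appear in this part of the diff, so the following is only a guess at its shape, inferred from the tests above: it returns `True` for an importable name, `False` when the import raises `ImportError`, and it must go through `builtins.__import__` for the patch in `test_import_error_returns_false` to take effect. The name `check_package` is used here to make clear this is a sketch and not the removed function.

```python
def check_package(import_name):
    """Return True if import_name can be imported, False otherwise.

    Assumed behaviour only; the real _check_package body is not shown
    in this diff.
    """
    try:
        # __import__ resolves to builtins.__import__ at call time, which
        # is what lets a test swap it out with unittest.mock.patch.
        __import__(import_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print(check_package("json"))
    print(check_package("_luminos_nonexistent_package_xyz"))
```

A version built on `importlib.import_module` would behave the same for real packages, but patching `builtins.__import__` would no longer intercept it, so the last test would need a different mocking approach.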