init: Home, Architecture, DevelopmentGuide, Roadmap, SessionRetrospectives, Session-1

Jeff Smith 2026-04-06 16:11:05 -06:00
commit fbc73406a6
6 changed files with 509 additions and 0 deletions

Architecture.md

@@ -0,0 +1,123 @@
# Architecture
## Overview
Luminos is a zero-dependency Python CLI at its base. The `--ai` flag layers an
agentic investigation on top using the Claude API. The two layers are strictly
separated — the base scan never requires pip packages.
**Entry point:** `luminos.py` — argument parsing, scan orchestration, output routing.
---
## Module Map
| Module | Purpose | External commands |
|---|---|---|
| `luminos.py` | Entry point — arg parsing, scan(), main() | None |
| `luminos_lib/tree.py` | Recursive directory tree with file sizes | None (os) |
| `luminos_lib/filetypes.py` | Classifies files into 7 categories | `file --brief` |
| `luminos_lib/code.py` | Language detection, LOC counting, large file flagging | `wc -l` |
| `luminos_lib/recency.py` | Finds N most recently modified files | `find -printf` |
| `luminos_lib/disk.py` | Per-directory disk usage | `du -b` |
| `luminos_lib/report.py` | Formats report dict as terminal output | None |
| `luminos_lib/watch.py` | Continuous monitoring loop with snapshot diffing | None |
| `luminos_lib/capabilities.py` | Optional dependency detection, cache cleanup | None |
| `luminos_lib/cache.py` | AI investigation cache — read/write/clear/flush | None |
| `luminos_lib/ast_parser.py` | tree-sitter code structure parsing | tree-sitter |
| `luminos_lib/prompts.py` | System prompt templates for AI loops | None |
| `luminos_lib/ai.py` | Multi-pass agentic analysis via Claude API | anthropic, python-magic |
---
## Base Scan Data Flow
```
scan(target)
├── build_tree()         → report["tree"], report["tree_rendered"]
├── classify_files()     → report["file_categories"], report["classified_files"]
├── detect_languages()   → report["languages"], report["lines_of_code"]
├── find_large_files()   → report["large_files"]
├── find_recent_files()  → report["recent_files"]
├── get_disk_usage()     → report["disk_usage"], report["top_directories"]
└── returns report dict
```
---
## AI Pipeline (--ai flag)
```
analyze_directory(report, target)
├── _discover_directories()   find all dirs, sort leaves-first
├── per-directory loop        (each dir, up to max_turns=14)
│     _build_dir_context()    list files + sizes
│     _get_child_summaries()  read cached child summaries
│     _run_dir_loop()         agent loop: read files, parse structure,
│                             write cache entries, submit_report
│                             Tools: read_file, list_directory,
│                             run_command, parse_structure,
│                             write_cache, think, checkpoint,
│                             flag, submit_report
├── _run_synthesis()          one-shot aggregation of dir summaries
│                             reads all "dir" cache entries
│                             produces brief (2-4 sentences) + detailed (free-form)
│                             Tools: read_cache, list_cache, flag, submit_report
└── returns (brief, detailed, flags)
```
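The leaves-first ordering used by `_discover_directories()` can be sketched with `os.walk`. This is a minimal illustration, not the code in `ai.py`: the function name `discover_directories_leaves_first` and the pruning rule (skip any directory whose name matches an excluded name) are assumptions. Sorting deepest-first means every child is summarized before its parent, so `_get_child_summaries()` always has something to read.

```python
import os
from pathlib import Path


def discover_directories_leaves_first(target, exclude=()):
    """Return every directory under target, deepest first (hypothetical sketch)."""
    excluded = set(exclude)
    root = Path(target).resolve()
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Prune excluded names in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        found.append(Path(dirpath))
    # Depth = number of path components below root; deepest first.
    # Ties are broken by path string so the order is deterministic.
    found.sort(key=lambda p: (-len(p.relative_to(root).parts), str(p)))
    return found
```

The root always sorts last, which matches synthesis running only after every directory has a cached summary.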
---
## Cache
Location: `/tmp/luminos/<investigation_id>/`
Two cache entry types plus a flags log, all stored as JSONL:
**File entries** (`files.jsonl`):
```
{path, relative_path, size_bytes, category, summary, notable,
notable_reason, cached_at}
```
**Dir entries** (`dirs.jsonl`):
```
{path, relative_path, child_count, summary, dominant_category,
notable_files, cached_at}
```
**Flags** (`flags.jsonl`):
```
{path, finding, severity}    severity: info | concern | critical
```
Cache is reused across runs for the same target. `--fresh` ignores it.
`--clear-cache` deletes it.
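Because each file holds one JSON object per line, appending and reading back need only the standard `json` module. A minimal sketch, assuming hypothetical helpers `append_entry` and `read_entries` (the real read/write/clear/flush API lives in `cache.py` and may differ) and assuming `cached_at` is a Unix timestamp; this page does not specify its format.

```python
import json
import time
from pathlib import Path

SEVERITIES = {"info", "concern", "critical"}


def append_entry(cache_dir, filename, entry):
    """Append one entry as a single JSON line (hypothetical helper)."""
    if filename == "flags.jsonl" and entry.get("severity") not in SEVERITIES:
        raise ValueError(f"unknown severity: {entry.get('severity')!r}")
    path = Path(cache_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    record = dict(entry)
    # Flag records have no cached_at field on this page, so only stamp cache entries.
    if filename != "flags.jsonl":
        record.setdefault("cached_at", time.time())
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_entries(cache_dir, filename):
    """Read all entries back, skipping blank or truncated lines."""
    path = Path(cache_dir) / filename
    if not path.exists():
        return []
    entries = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError:
                # An interrupted write loses one entry, not the whole file.
                continue
    return entries
```

Append-only JSONL is what makes a partial flush on early exit cheap: whatever was written before the exit is still valid input for the next run.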
---
## Key Constraints
- **Base tool: no pip dependencies.** tree, filetypes, code, disk, recency,
report, and watch use only the standard library and standard Unix tools
(`wc` and `du` from GNU coreutils, `find` from GNU findutils, and `file`).
- **AI deps are lazy.** `anthropic`, `tree-sitter`, `python-magic` imported
only when `--ai` is used. Missing packages produce a clear install error.
- **Subprocess for OS tools.** LOC counting, file detection, disk usage, and
recency shell out to `wc`, `file`, `du`, and `find` respectively. Do not
reimplement them in pure Python.
- **Graceful degradation everywhere.** Permission denied, subprocess timeouts,
missing API key — all handled without crashing.
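The subprocess and graceful-degradation rules combine into one small pattern: call the tool with a timeout and turn every failure into "unknown" instead of an exception. A minimal sketch for LOC counting; the name `count_lines` and the 10-second default timeout are assumptions, not what `code.py` necessarily does.

```python
import subprocess


def count_lines(path, timeout=10.0):
    """Count lines with `wc -l`; return None instead of raising on failure."""
    try:
        result = subprocess.run(
            ["wc", "-l", "--", str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # wc missing, not executable, or too slow: report "unknown".
        return None
    if result.returncode != 0:
        # e.g. permission denied, or the file vanished mid-scan.
        return None
    # Output looks like "   42 path/to/file"; the count is the first field.
    fields = result.stdout.split(maxsplit=1)
    try:
        return int(fields[0])
    except (IndexError, ValueError):
        return None
```

Callers can then skip or report `None` entries rather than aborting the scan.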
---
## AI Model
`claude-sonnet-4-20250514`
Context budget: 70% of 180,000 tokens (126,000). If a run exceeds the budget,
it exits early and flushes the partial cache.
Cost is tracked and reported at the end of each run.
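The budget arithmetic is 180,000 × 0.70 = 126,000 tokens. A minimal tracker sketch follows; `TokenBudget` is hypothetical and is not the project's `_TokenTracker`. It assumes the check is made against the input-token count of the latest API call, since each call resends the conversation and that count approximates how full the context window is. Integer arithmetic avoids `int(180_000 * 0.70)` truncating a float that lands just below 126,000.

```python
from dataclasses import dataclass

MAX_CONTEXT = 180_000
CONTEXT_BUDGET = MAX_CONTEXT * 70 // 100  # 126,000 tokens


@dataclass
class TokenBudget:
    """Hypothetical per-loop tracker; not the project's _TokenTracker."""

    budget: int = CONTEXT_BUDGET
    total_input: int = 0
    total_output: int = 0
    last_input: int = 0

    def record(self, input_tokens, output_tokens):
        # The latest input count approximates current context size;
        # the totals feed the end-of-run cost report.
        self.last_input = input_tokens
        self.total_input += input_tokens
        self.total_output += output_tokens

    @property
    def exceeded(self):
        return self.last_input >= self.budget
```

A loop would call `record()` after each turn and, once `exceeded` is true, flush the partial cache and stop.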

DevelopmentGuide.md

@@ -0,0 +1,152 @@
# Development Guide
## Running Luminos
```bash
# Base scan
python3 luminos.py <target>
# With AI analysis (requires ANTHROPIC_API_KEY)
source ~/luminos-env/bin/activate
python3 luminos.py --ai <target>
# Common flags
python3 luminos.py --ai --fresh --clear-cache <target> # force clean run
python3 luminos.py -x .git -x node_modules <target> # exclude dirs
python3 luminos.py -d 8 -a <target> # depth 8, include hidden
python3 luminos.py --json -o report.json <target> # JSON output
# Watch mode
python3 luminos.py --watch <target>
# Check optional dep status
python3 luminos.py --install-extras
```
---
## Optional Dependencies Setup
```bash
# One-time setup
bash setup_env.sh
# Or manually
python3 -m venv ~/luminos-env
source ~/luminos-env/bin/activate
pip install anthropic tree-sitter tree-sitter-python \
tree-sitter-javascript tree-sitter-rust \
tree-sitter-go python-magic
```
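`capabilities.py` handles optional dependency detection. One way to check for the packages above without importing anything heavy is `importlib.util.find_spec`, which only locates a module. A sketch; the function names are hypothetical, and the pip-name to import-name mapping is assumed from the import names these packages publish.

```python
import importlib.util

# pip package name -> importable module name (assumed mapping).
OPTIONAL_DEPS = {
    "anthropic": "anthropic",
    "tree-sitter": "tree_sitter",
    "tree-sitter-python": "tree_sitter_python",
    "tree-sitter-javascript": "tree_sitter_javascript",
    "tree-sitter-rust": "tree_sitter_rust",
    "tree-sitter-go": "tree_sitter_go",
    "python-magic": "magic",
}


def missing_optional_deps(deps=OPTIONAL_DEPS):
    """Return pip names of packages that cannot be found (nothing is imported)."""
    return [pkg for pkg, module in deps.items() if importlib.util.find_spec(module) is None]


def install_hint(missing):
    """Build a human-readable install message for the missing packages."""
    if not missing:
        return "All optional AI dependencies are installed."
    return "Missing: " + ", ".join(missing) + "\nRun: pip install " + " ".join(missing)
```

Because nothing is imported, this check is safe to run in the base scan even when the venv is not active.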
---
## Git Workflow
Every change starts on a branch. Nothing goes directly to main.
### Branch naming
```
<type>/<short-description>
```
| Type | Use |
|---|---|
| `feat/` | New feature or capability |
| `fix/` | Bug fix |
| `refactor/` | Restructure without behavior change |
| `chore/` | Tooling, config, documentation |
| `test/` | Tests |
Examples: `feat/survey-pass`, `fix/cache-flush-on-error`, `refactor/synthesis-tiers`
### Commit messages
```
<type>: <short description>
```
Examples:
```
feat: add web_search tool to dir loop
fix: handle empty dir cache gracefully in synthesis
refactor: extract survey pass into _run_survey()
chore: update Architecture wiki page
```
One commit per logical unit of work, not one per file.
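Both conventions are regular enough to check mechanically, for example from a `commit-msg` hook. A hypothetical validator: it assumes lowercase kebab-case branch descriptions (the examples follow this, but the guide does not state it) and also accepts the `merge:` and `wiki:` prefixes used in this guide's merge and wiki-update commands.

```python
import re

BRANCH_TYPES = ("feat", "fix", "refactor", "chore", "test")
# merge: and wiki: also appear in this guide's merge and wiki commands.
COMMIT_TYPES = BRANCH_TYPES + ("merge", "wiki")

BRANCH_RE = re.compile(rf"(?:{'|'.join(BRANCH_TYPES)})/[a-z0-9]+(?:-[a-z0-9]+)*")
COMMIT_RE = re.compile(rf"(?:{'|'.join(COMMIT_TYPES)}): \S.*")


def valid_branch(name):
    """True if name looks like <type>/<short-description>."""
    return BRANCH_RE.fullmatch(name) is not None


def valid_commit_subject(subject):
    """True if subject looks like <type>: <short description>."""
    return COMMIT_RE.fullmatch(subject) is not None
```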
### Merge procedure
```bash
git checkout main
git merge --no-ff <branch> -m "merge: <description>"
git branch -d <branch>
```
`--no-ff` always creates a merge commit, so the branch stays visible in history
even when a fast-forward would have been possible. Delete the branch after merging.
---
## Naming Conventions
| Context | Convention | Example |
|---|---|---|
| Functions / variables | snake_case | `classify_files`, `dir_path` |
| Classes | PascalCase | `_TokenTracker`, `_CacheManager` |
| Constants | UPPER_SNAKE_CASE | `MAX_CONTEXT`, `CACHE_ROOT` |
| Module files | snake_case | `ast_parser.py`, `filetypes.py` |
| CLI flags | kebab-case | `--clear-cache`, `--install-extras` |
| Private functions | leading underscore | `_run_synthesis`, `_build_dir_context` |
---
## Project Structure
```
luminos/
├── luminos.py              entry point
├── luminos_lib/
│   ├── ai.py               AI pipeline (heaviest module)
│   ├── ast_parser.py       tree-sitter parsing
│   ├── cache.py            investigation cache management
│   ├── capabilities.py     optional dep detection
│   ├── code.py             language + LOC detection
│   ├── disk.py             disk usage
│   ├── filetypes.py        file classification
│   ├── prompts.py          AI system prompt templates
│   ├── recency.py          recently modified files
│   ├── report.py           terminal report formatter
│   ├── tree.py             directory tree
│   └── watch.py            watch mode
├── docs/wiki/              local clone of Forgejo wiki (gitignored)
├── setup_env.sh            venv + AI dep setup script
├── CLAUDE.md               Claude Code context (thin, points to wiki)
└── PLAN.md                 evolution plan and design notes
```
---
## Wiki
Wiki lives at `docs/wiki/` (gitignored — separate git repo).
```bash
# First time
git clone ssh://git@forgejo-claude/archeious/luminos.wiki.git docs/wiki/
# Returning
git -C docs/wiki pull
```
Wiki URL: https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/wiki
When updating wiki pages:
```bash
cd docs/wiki
# edit pages
git add -A
git commit -m "wiki: <description>"
git push
```

Home.md

@@ -0,0 +1,44 @@
# Luminos
Luminos is a file system intelligence tool — a zero-dependency Python CLI that
scans a directory and produces a reconnaissance report. With `--ai` it runs a
multi-pass agentic investigation via the Claude API, producing a deep analysis
of what the directory contains and why.
---
## Current State
- **Phase:** Active development — core pipeline stable, scaling and domain intelligence planned
- **Last worked on:** 2026-04-06
- **Last commit:** merge: add -x/--exclude flag for directory exclusion
- **Blocking:** None
---
## Quick Links
| Page | Contents |
|---|---|
| [Architecture](Architecture) | Module breakdown, data flow, AI pipeline |
| [DevelopmentGuide](DevelopmentGuide) | Git workflow, naming conventions, commands |
| [Roadmap](Roadmap) | Planned phases and open design questions |
| [SessionRetrospectives](SessionRetrospectives) | Full session history |
---
## At a Glance
```bash
python3 luminos.py <target> # base scan
python3 luminos.py --ai <target> # AI analysis
python3 luminos.py --ai --refine <target> # AI + refinement pass (planned)
python3 luminos.py -x .git -x node_modules <target> # exclude dirs
python3 luminos.py --watch <target> # continuous monitoring
```
---
## Repository
https://forgejo.labbity.unbiasedgeek.com/archeious/luminos

Roadmap.md

@@ -0,0 +1,125 @@
# Roadmap
Full design notes and open questions live in `PLAN.md` in the repo root.
This page tracks phase status.
---
## Core Philosophy
Move from a **pipeline with AI steps** to **investigation driven by curiosity**.
The agent should decide what it needs to know and how to find it out — not
execute a predetermined checklist.
---
## Phases
### Phase 1 — Confidence Tracking
Add `confidence` + `confidence_reason` to file and dir cache entries.
Agent sets this when writing cache. Enables later phases to prioritize
re-investigation of uncertain entries.
**Status:** Not started
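Numeric versus categorical confidence is still an open question (see Open Design Questions), so this sketch arbitrarily picks a 0.0-1.0 float. The trimmed `DirEntry` shape, the `needs_reinvestigation` helper, and the 0.5 threshold are all hypothetical; they only show how later phases could use the new fields to prioritize uncertain entries.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    """Trimmed dir cache entry; other fields from dirs.jsonl omitted."""

    relative_path: str
    summary: str
    confidence: float  # assumed scale: 0.0 (guess) to 1.0 (verified)
    confidence_reason: str


def needs_reinvestigation(entries, threshold=0.5):
    """Return entries below threshold, least confident first."""
    uncertain = [e for e in entries if e.confidence < threshold]
    return sorted(uncertain, key=lambda e: e.confidence)
```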
---
### Phase 2 — Survey Pass
Lightweight pre-investigation pass. Agent looks at file type distribution
and tree structure, then answers: what is this, how should I investigate it,
which tools are relevant?
Replaces hardcoded domain detection with AI-driven characterization.
Survey output injected into dir loop system prompts as context.
**Status:** Not started
---
### Phase 3 — Investigation Planning
After survey, a planning pass allocates investigation depth per directory.
Replaces the fixed per-directory `max_turns` with a global turn budget the agent manages.
Priority dirs get more turns; trivial dirs get fewer; generated/vendored dirs
get skipped.
**Status:** Not started
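The allocation step can be pictured as splitting one global turn budget by priority weight. A sketch under assumptions: the planning pass is taken to emit a weight per directory (0 meaning skip, for generated or vendored dirs), every active directory gets a minimum of one turn, and largest-remainder rounding keeps the total exact. `allocate_turns` is a hypothetical name.

```python
def allocate_turns(priorities, total_turns, min_turns=1):
    """Split total_turns across dirs in proportion to their weights.

    priorities maps dir -> weight; weight 0 means skip (e.g. vendored code).
    """
    active = {d: w for d, w in priorities.items() if w > 0}
    allocation = {d: 0 for d in priorities}
    if not active:
        return allocation
    base = min_turns * len(active)
    if total_turns < base:
        raise ValueError("budget too small for the per-directory minimum")
    spare = total_turns - base
    weight_sum = sum(active.values())
    shares = {d: spare * w / weight_sum for d, w in active.items()}
    for d in active:
        allocation[d] = min_turns + int(shares[d])
    # Hand the turns lost to rounding down to the largest fractional remainders.
    leftover = total_turns - sum(allocation.values())
    by_remainder = sorted(active, key=lambda d: shares[d] - int(shares[d]), reverse=True)
    for d in by_remainder[:leftover]:
        allocation[d] += 1
    return allocation
```

For example, weights `{"src": 3, "tests": 1, "vendor": 0}` with 10 turns give `src` 7, `tests` 3, and `vendor` 0.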
---
### Phase 4 — External Knowledge Tools
Resolution strategies for uncertainty beyond local files:
- `web_search` — unfamiliar library, format, API
- `package_lookup` — PyPI / npm / crates.io metadata
- `fetch_url` — follow URLs referenced in local files
- `ask_user` — interactive mode, last resort
All can be switched off with the `--no-external` flag. Usage is budget-limited per session.
**Status:** Not started
---
### Phase 5 — Scale-Tiered Synthesis
Calibrate synthesis input and depth to target size:
| Tier | Size | Approach |
|---|---|---|
| small | <5 dirs / <30 files | Per-file cache entries as synthesis input |
| medium | 5-30 dirs | Dir summaries (current) |
| large | 31-150 dirs | Multi-level synthesis |
| xlarge | >150 dirs | Multi-level + subsystem grouping |
**Status:** Not started
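Tier selection is a simple threshold function over the table. This sketch assumes "small" requires both fewer than 5 dirs and fewer than 30 files, so a shallow target with many files falls through to "medium"; the table's "/" leaves that reading open. `synthesis_tier` is a hypothetical name.

```python
def synthesis_tier(dir_count, file_count):
    """Map target size to a synthesis tier using the thresholds in the table."""
    if dir_count < 5 and file_count < 30:
        return "small"
    if dir_count <= 30:
        return "medium"
    if dir_count <= 150:
        return "large"
    return "xlarge"
```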
---
### Phase 6 — Multi-Level Synthesis
For large/xlarge: grouping pass identifies logical subsystems from dir
summaries (not directory structure). Final synthesis receives 3-10 subsystem
summaries rather than hundreds of dir summaries.
**Status:** Not started
---
### Phase 7 — Hypothesis-Driven Synthesis
Synthesis reframed from aggregation to conclusion-with-evidence. Agent
forms a hypothesis, looks for confirming/refuting evidence, considers
alternatives, then submits.
Produces analytical output rather than descriptive output.
**Status:** Not started
---
### Phase 8 — Refinement Pass
Post-synthesis targeted re-investigation. Agent receives current synthesis,
identifies gaps and contradictions, goes back to actual files (or external
sources), submits improved report.
Triggered by the `--refine` flag; `--refine-depth N` runs N passes.
**Status:** Not started
---
### Phase 9 — Dynamic Report Structure
Synthesis produces a superset of possible output fields; report formatter
renders only populated ones. Output naturally scales from minimal (small
simple targets) to comprehensive (large complex targets).
**Status:** Not started
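The formatter side of this phase amounts to "walk a canonical field order and skip anything empty". A sketch; the section names, the uppercase headers, and the `render_report` signature are hypothetical and not taken from `report.py`.

```python
def render_report(fields, order):
    """Render only populated fields, in a fixed canonical order.

    fields: section name -> text or list of items (hypothetical shape).
    order:  canonical section order, so output stays stable across runs.
    """
    lines = []
    for name in order:
        value = fields.get(name)
        if not value:  # None, "", [] and {} all count as "not populated"
            continue
        lines.append(name.replace("_", " ").upper())
        if isinstance(value, (list, tuple)):
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(str(value))
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip("\n")
```

A small target that only fills `brief` yields a one-section report; a large one fills more fields and the same code renders all of them.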
---
## Open Design Questions
See `PLAN.md` — Known Unknowns and Concerns sections.
Key unresolved items:
- Which search API to use for web_search
- Whether external tools should be opt-in or opt-out by default
- How to handle confidence calibration (numeric vs categorical)
- Config file format and location for tunable thresholds
- Progressive output / interactive mode UX design

Session-1.md

@@ -0,0 +1,56 @@
# Session 1 — 2026-04-06
## What Was Shipped
### Scan progress output
Added `[scan]` step reporting to stderr for all base scan steps. Previously
the tool was silent until the report appeared.
### In-place per-file progress display
File-iterating steps (classify, count lines, check large files) now update
a single line in-place using `\r` + ANSI clear-to-EOL rather than scrolling.
Modules gained optional `on_file` callbacks; `luminos.py` wires them up via
a `_progress(label)` helper.
### `--exclude` / `-x` flag
Exclude directories by name from all scan steps and AI analysis. Repeatable:
`-x .git -x node_modules`. Propagated through tree, filetypes, recency, disk,
and `ai._discover_directories()`.
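A repeatable flag like `-x` maps directly onto argparse's `action="append"`. This sketch covers only the parsing side and assumes (this page does not confirm it) that `luminos.py` uses argparse; `build_parser` and `parse_args` are hypothetical names, and passing the resulting set on to tree, filetypes, recency, disk, and the AI directory discovery is not shown.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="luminos")
    parser.add_argument("target")
    parser.add_argument(
        "-x", "--exclude",
        action="append",
        metavar="NAME",
        help="directory name to exclude (repeatable)",
    )
    return parser


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    # action="append" leaves None when -x is never given; normalize to a set.
    args.exclude = set(args.exclude or ())
    return args
```

Normalizing to a set gives every scan step an O(1) name lookup and a single shape to handle.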
### Forgejo project
Created repo at https://forgejo.labbity.unbiasedgeek.com/archeious/luminos.
Pushed full history.
### PLAN.md
Detailed evolution plan covering:
- AI-driven domain detection (replaces hardcoded taxonomy)
- Scale-tiered synthesis (small/medium/large/xlarge)
- Multi-level synthesis for large repos
- Uncertainty as first-class concept with resolution strategies
- External knowledge tools (web search, package lookup, URL fetch, ask_user)
- Investigation planning pass
- Hypothesis-driven synthesis
- Refinement pass (`--refine`)
- Known unknowns, concerns, raw thoughts
### Wiki + development practices
Initialized Forgejo wiki. Created: Home, Architecture, DevelopmentGuide,
Roadmap, SessionRetrospectives. Rewrote CLAUDE.md to follow harbormind's
thin-CLAUDE.md + wiki pattern.
---
## Commits
- `feat: add progress output to base scan steps`
- `feat: in-place per-file progress for classify, count, and large-file steps`
- `feat: add -x/--exclude flag to exclude directories from scan and AI analysis`
---
## State at End of Session
- main is clean, all features merged
- PLAN.md written, no implementation started
- Wiki initialized, all pages current
- No blocking issues

SessionRetrospectives.md

@@ -0,0 +1,9 @@
# Session Retrospectives
| Session | Date | Summary |
|---|---|---|
| [Session 1](Session-1) | 2026-04-06 | Project setup, scan improvements, Forgejo repo, wiki, development practices |
---
Full session notes linked above.