init: Home, Architecture, DevelopmentGuide, Roadmap, SessionRetrospectives, Session-1
Commit `fbc73406a6`: 6 changed files with 509 additions and 0 deletions.

Architecture.md (new file, 123 lines)

# Architecture

## Overview

Luminos is a zero-dependency Python CLI at its base. The `--ai` flag layers an
agentic investigation on top using the Claude API. The two layers are strictly
separated — the base scan never requires pip packages.

**Entry point:** `luminos.py` — argument parsing, scan orchestration, output routing.

---

## Module Map

| Module | Purpose | External commands / packages |
|---|---|---|
| `luminos.py` | Entry point — arg parsing, `scan()`, `main()` | None |
| `luminos_lib/tree.py` | Recursive directory tree with file sizes | None (os) |
| `luminos_lib/filetypes.py` | Classifies files into 7 categories | `file --brief` |
| `luminos_lib/code.py` | Language detection, LOC counting, large file flagging | `wc -l` |
| `luminos_lib/recency.py` | Finds N most recently modified files | `find -printf` |
| `luminos_lib/disk.py` | Per-directory disk usage | `du -b` |
| `luminos_lib/report.py` | Formats report dict as terminal output | None |
| `luminos_lib/watch.py` | Continuous monitoring loop with snapshot diffing | None |
| `luminos_lib/capabilities.py` | Optional dependency detection, cache cleanup | None |
| `luminos_lib/cache.py` | AI investigation cache — read/write/clear/flush | None |
| `luminos_lib/ast_parser.py` | tree-sitter code structure parsing | tree-sitter |
| `luminos_lib/prompts.py` | System prompt templates for AI loops | None |
| `luminos_lib/ai.py` | Multi-pass agentic analysis via Claude API | anthropic, python-magic |
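
The `watch.py` row describes a monitoring loop built on snapshot diffing. Below is a minimal sketch of the diffing step. It assumes a snapshot maps each relative file path to its `(size, mtime)` pair. `take_snapshot` and `diff_snapshots` are hypothetical names, not the module's actual API.

```python
import os

# Sketch only: function names and the snapshot format are hypothetical.


def take_snapshot(root):
    """Map each file's path (relative to root) to its (size, mtime) pair."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                # File vanished or is unreadable between listing and stat; skip it.
                continue
            snapshot[os.path.relpath(full, root)] = (st.st_size, st.st_mtime)
    return snapshot


def diff_snapshots(old, new):
    """Return sorted lists of added, removed, and modified paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified
```

A watch loop built on this would take a snapshot on an interval and diff it against the previous one. It would re-render only when one of the three lists is non-empty.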

---

## Base Scan Data Flow

```
scan(target)
  build_tree()          → report["tree"], report["tree_rendered"]
  classify_files()      → report["file_categories"], report["classified_files"]
  detect_languages()    → report["languages"], report["lines_of_code"]
  find_large_files()    → report["large_files"]
  find_recent_files()   → report["recent_files"]
  get_disk_usage()      → report["disk_usage"], report["top_directories"]
  └── returns report dict
```

---

## AI Pipeline (`--ai` flag)

```
analyze_directory(report, target)
  │
  ├── _discover_directories()     find all dirs, sort leaves-first
  │
  ├── per-directory loop (each dir, up to max_turns=14)
  │     _build_dir_context()      list files + sizes
  │     _get_child_summaries()    read cached child summaries
  │     _run_dir_loop()           agent loop: read files, parse structure,
  │                               write cache entries, submit_report
  │                               Tools: read_file, list_directory,
  │                               run_command, parse_structure,
  │                               write_cache, think, checkpoint,
  │                               flag, submit_report
  │
  ├── _run_synthesis()            one-shot aggregation of dir summaries
  │                               reads all "dir" cache entries
  │                               produces brief (2-4 sentences) + detailed (free-form)
  │                               Tools: read_cache, list_cache, flag, submit_report
  │
  └── returns (brief, detailed, flags)
```
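
Sorting leaves-first means every child directory is summarized before its parent. That ordering is what lets `_get_child_summaries()` find child summaries in the cache. Below is a minimal sketch of the ordering. It assumes depth (number of path components below the target) is the sort key, with ties broken alphabetically. `discover_directories_leaves_first` is a hypothetical name, and the real function may order ties differently.

```python
import os

# Sketch only: a hypothetical stand-in for _discover_directories() ordering.


def _depth(path, root):
    """Number of path components below root (root itself is depth 0)."""
    rel = os.path.relpath(path, root)
    return 0 if rel == os.curdir else rel.count(os.sep) + 1


def discover_directories_leaves_first(root):
    """Return every directory under root (root included), deepest first.

    Descending depth guarantees each child precedes its parent, so child
    summaries are already cached when the parent is analyzed.
    """
    dirs = [dirpath for dirpath, _dirnames, _filenames in os.walk(root)]
    # Ties at the same depth break alphabetically for a deterministic order.
    return sorted(dirs, key=lambda p: (-_depth(p, root), p))
```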

---

## Cache

Location: `/tmp/luminos/<investigation_id>/`

Each investigation stores two entry types plus a flag log, all as JSONL (one
JSON object per line):

**File entries** (`files.jsonl`):

```
{path, relative_path, size_bytes, category, summary, notable,
 notable_reason, cached_at}
```

**Dir entries** (`dirs.jsonl`):

```
{path, relative_path, child_count, summary, dominant_category,
 notable_files, cached_at}
```

**Flags** (`flags.jsonl`):

```
{path, finding, severity}    severity: info | concern | critical
```

The cache is reused across runs for the same target. `--fresh` ignores it;
`--clear-cache` deletes it.
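
JSONL stores one object per line. Writers can therefore append without rewriting the file, and readers can skip a malformed line left by an interrupted write. Below is a minimal sketch of that append/read pattern over the layout above. `append_entry` and `read_entries` are hypothetical helpers, not `cache.py`'s actual functions, and the ISO-8601 `cached_at` format is an assumption.

```python
import json
import os
from datetime import datetime, timezone

# Sketch only: helper names and the cached_at format are hypothetical.


def append_entry(cache_dir, filename, entry):
    """Append one entry as a single JSON line, stamping it with cached_at."""
    os.makedirs(cache_dir, exist_ok=True)
    record = dict(entry, cached_at=datetime.now(timezone.utc).isoformat())
    with open(os.path.join(cache_dir, filename), "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_entries(cache_dir, filename):
    """Read all valid entries, skipping blank or malformed lines."""
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        return []
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError:
                # Likely a partial line from an interrupted write; ignore it.
                continue
    return entries
```

For example, `append_entry(cache_dir, "flags.jsonl", {"path": "src/auth.py", "finding": "...", "severity": "concern"})` records one flag. `read_entries(cache_dir, "dirs.jsonl")` then returns the dir summaries that synthesis consumes.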

---

## Key Constraints

- **Base tool: no pip dependencies.** tree, filetypes, code, disk, recency,
  report, and watch use only the standard library plus standard Unix tools
  (GNU coreutils `wc` and `du`, GNU findutils `find`, and `file`).
- **AI deps are lazy.** `anthropic`, `tree-sitter`, and `python-magic` are
  imported only when `--ai` is used. Missing packages produce a clear install error.
- **Subprocess for OS tools.** LOC counting, file detection, disk usage, and
  recency shell out to these tools. Do not reimplement them in pure Python.
- **Graceful degradation everywhere.** Permission denied, subprocess timeouts,
  and a missing API key are all handled without crashing.
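
The subprocess and graceful-degradation constraints combine into one pattern:
1. Shell out to the tool.
2. Bound the call with a timeout.
3. Return `None` instead of raising when the tool is missing, times out, or fails.

Below is a minimal sketch for LOC counting, assuming `wc` is on `PATH`. `count_lines` and its 10-second default timeout are hypothetical, not what `code.py` actually uses.

```python
import subprocess

# Sketch only: count_lines and its default timeout are hypothetical.


def count_lines(path, timeout=10):
    """Return the line count reported by `wc -l`, or None if unavailable."""
    try:
        result = subprocess.run(
            ["wc", "-l", "--", path],  # "--" so paths starting with "-" are not options
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # OSError: wc missing or not executable.
        # SubprocessError covers TimeoutExpired and CalledProcessError
        # (e.g. permission denied on the file). Degrade instead of crashing.
        return None
    # wc prints "<count> <path>"; the count is the first field.
    try:
        return int(result.stdout.split()[0])
    except (IndexError, ValueError):
        return None
```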

---

## AI Model

`claude-sonnet-4-20250514`

Context budget: 70% of 180,000 tokens (126,000). On a budget breach, the run
exits early and flushes the partial cache.

API cost is tracked and reported at the end of each run.
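
The budget works out to 180,000 × 70 / 100 = 126,000 tokens. Below is a minimal sketch of the breach check. It assumes the latest API response's input plus output token counts approximate the current context size, since each agent-loop request resends the conversation. `ContextBudget` is hypothetical and is not the `_TokenTracker` class in `ai.py`.

```python
from dataclasses import dataclass

# Sketch only: ContextBudget is hypothetical, not the class used in ai.py.
MAX_CONTEXT = 180_000   # tokens
BUDGET_PERCENT = 70     # stop at 70% of the window


@dataclass
class ContextBudget:
    max_context: int = MAX_CONTEXT
    percent: int = BUDGET_PERCENT
    context_tokens: int = 0

    @property
    def limit(self) -> int:
        # Integer math avoids float rounding surprises: 180_000 * 70 // 100 == 126_000.
        return self.max_context * self.percent // 100

    def update(self, input_tokens: int, output_tokens: int) -> None:
        # Each request resends the conversation, so the latest call's input plus
        # its output approximates the size of the next request's context.
        self.context_tokens = input_tokens + output_tokens

    def breached(self) -> bool:
        return self.context_tokens >= self.limit
```

When `breached()` returns true, the loop would flush the cache entries written so far and exit early, as described above.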

DevelopmentGuide.md (new file, 152 lines)

# Development Guide

## Running Luminos

```bash
# Base scan
python3 luminos.py <target>

# With AI analysis (requires ANTHROPIC_API_KEY)
source ~/luminos-env/bin/activate
python3 luminos.py --ai <target>

# Common flags
python3 luminos.py --ai --fresh --clear-cache <target>   # force clean run
python3 luminos.py -x .git -x node_modules <target>      # exclude dirs
python3 luminos.py -d 8 -a <target>                      # depth 8, include hidden
python3 luminos.py --json -o report.json <target>        # JSON output

# Watch mode
python3 luminos.py --watch <target>

# Check optional dep status
python3 luminos.py --install-extras
```

---

## Optional Dependencies Setup

```bash
# One-time setup
bash setup_env.sh

# Or manually
python3 -m venv ~/luminos-env
source ~/luminos-env/bin/activate
pip install anthropic tree-sitter tree-sitter-python \
    tree-sitter-javascript tree-sitter-rust \
    tree-sitter-go python-magic
```
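
`importlib.util.find_spec()` locates a module without importing it. This makes it a cheap way to check which of these packages the active interpreter can see, so run the check inside the venv. Below is a minimal sketch. The package-to-module mapping reflects each package's usual import name and is an assumption. `optional_dep_status` is hypothetical, not `capabilities.py`'s actual code.

```python
import importlib.util

# Sketch only: pip package name -> module it installs (assumed import names).
OPTIONAL_DEPS = {
    "anthropic": "anthropic",
    "tree-sitter": "tree_sitter",
    "tree-sitter-python": "tree_sitter_python",
    "tree-sitter-javascript": "tree_sitter_javascript",
    "tree-sitter-rust": "tree_sitter_rust",
    "tree-sitter-go": "tree_sitter_go",
    "python-magic": "magic",
}


def optional_dep_status(deps=OPTIONAL_DEPS):
    """Return {package: True/False} for each dep, without importing any of them."""
    return {pkg: importlib.util.find_spec(mod) is not None for pkg, mod in deps.items()}
```

Printing `optional_dep_status()` before and after `bash setup_env.sh` is a quick way to confirm the install landed in the right environment.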

---

## Git Workflow

Every change starts on a branch. Nothing goes directly to main.

### Branch naming

```
<type>/<short-description>
```

| Type | Use |
|---|---|
| `feat/` | New feature or capability |
| `fix/` | Bug fix |
| `refactor/` | Restructure without behavior change |
| `chore/` | Tooling, config, documentation |
| `test/` | Tests |

Examples: `feat/survey-pass`, `fix/cache-flush-on-error`, `refactor/synthesis-tiers`

### Commit messages

```
<type>: <short description>
```

Examples:

```
feat: add web_search tool to dir loop
fix: handle empty dir cache gracefully in synthesis
refactor: extract survey pass into _run_survey()
chore: update Architecture wiki page
```

One commit per logical unit of work, not one per file.

### Merge procedure

```bash
git checkout main
git merge --no-ff <branch> -m "merge: <description>"
git branch -d <branch>
```

`--no-ff` preserves branch history. Delete branch after merging.

---

## Naming Conventions

| Context | Convention | Example |
|---|---|---|
| Functions / variables | snake_case | `classify_files`, `dir_path` |
| Classes | PascalCase | `_TokenTracker`, `_CacheManager` |
| Constants | UPPER_SNAKE_CASE | `MAX_CONTEXT`, `CACHE_ROOT` |
| Module files | snake_case | `ast_parser.py`, `filetypes.py` |
| CLI flags | kebab-case | `--clear-cache`, `--install-extras` |
| Private functions | leading underscore | `_run_synthesis`, `_build_dir_context` |

---

## Project Structure

```
luminos/
├── luminos.py              entry point
├── luminos_lib/
│   ├── ai.py               AI pipeline (heaviest module)
│   ├── ast_parser.py       tree-sitter parsing
│   ├── cache.py            investigation cache management
│   ├── capabilities.py     optional dep detection
│   ├── code.py             language + LOC detection
│   ├── disk.py             disk usage
│   ├── filetypes.py        file classification
│   ├── prompts.py          AI system prompt templates
│   ├── recency.py          recently modified files
│   ├── report.py           terminal report formatter
│   ├── tree.py             directory tree
│   └── watch.py            watch mode
├── docs/wiki/              local clone of Forgejo wiki (gitignored)
├── setup_env.sh            venv + AI dep setup script
├── CLAUDE.md               Claude Code context (thin — points to wiki)
└── PLAN.md                 evolution plan and design notes
```

---

## Wiki

Wiki lives at `docs/wiki/` (gitignored — separate git repo).

```bash
# First time
git clone ssh://git@forgejo-claude/archeious/luminos.wiki.git docs/wiki/

# Returning
git -C docs/wiki pull
```

Wiki URL: https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/wiki

When updating wiki pages:

```bash
cd docs/wiki
# edit pages
git add -A
git commit -m "wiki: <description>"
git push
```

Home.md (new file, 44 lines)

# Luminos

Luminos is a file system intelligence tool — a zero-dependency Python CLI that
scans a directory and produces a reconnaissance report. With `--ai` it runs a
multi-pass agentic investigation via the Claude API, producing a deep analysis
of what the directory contains and why.

---

## Current State

- **Phase:** Active development — core pipeline stable, scaling and domain intelligence planned
- **Last worked on:** 2026-04-06
- **Last commit:** merge: add -x/--exclude flag for directory exclusion
- **Blocking:** None

---

## Quick Links

| Page | Contents |
|---|---|
| [Architecture](Architecture) | Module breakdown, data flow, AI pipeline |
| [DevelopmentGuide](DevelopmentGuide) | Git workflow, naming conventions, commands |
| [Roadmap](Roadmap) | Planned phases and open design questions |
| [SessionRetrospectives](SessionRetrospectives) | Full session history |

---

## At a Glance

```bash
python3 luminos.py <target>                            # base scan
python3 luminos.py --ai <target>                       # AI analysis
python3 luminos.py --ai --refine <target>              # AI + refinement pass (planned)
python3 luminos.py -x .git -x node_modules <target>    # exclude dirs
python3 luminos.py --watch <target>                    # continuous monitoring
```

---

## Repository

https://forgejo.labbity.unbiasedgeek.com/archeious/luminos

Roadmap.md (new file, 125 lines)

# Roadmap

Full design notes and open questions live in `PLAN.md` in the repo root.
This page tracks phase status.

---

## Core Philosophy

Move from a **pipeline with AI steps** to **investigation driven by curiosity**.
The agent should decide what it needs to know and how to find it out — not
execute a predetermined checklist.

---

## Phases

### Phase 1 — Confidence Tracking

Add `confidence` + `confidence_reason` to file and dir cache entries.
Agent sets this when writing cache. Enables later phases to prioritize
re-investigation of uncertain entries.

**Status:** Not started

---

### Phase 2 — Survey Pass

Lightweight pre-investigation pass. Agent looks at file type distribution
and tree structure, then answers: what is this, how should I investigate it,
which tools are relevant?

Replaces hardcoded domain detection with AI-driven characterization.
Survey output injected into dir loop system prompts as context.

**Status:** Not started

---

### Phase 3 — Investigation Planning

After survey, a planning pass allocates investigation depth per directory.
Replaces fixed max_turns-per-dir with a global turn budget the agent manages.
Priority dirs get more turns; trivial dirs get fewer; generated/vendored dirs
get skipped.

**Status:** Not started
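
One way to turn per-directory priorities into a global turn budget is proportional allocation with largest-remainder rounding, which makes the shares sum exactly to the budget. Below is a minimal sketch. It assumes the planning pass emits a non-negative integer weight per directory, with 0 meaning "skip". The weights, `allocate_turns`, and the rounding scheme are all hypothetical, since the planning design is still open.

```python
# Sketch only: allocate_turns and the weight scheme are hypothetical.


def allocate_turns(weights, total_turns):
    """Split total_turns across directories in proportion to their weights.

    weights: {dir_path: non-negative int}; a weight of 0 means "skip".
    Largest-remainder rounding makes the shares sum to total_turns whenever
    at least one weight is positive; zero-weight dirs never receive a turn.
    """
    total_weight = sum(weights.values())
    if total_weight == 0 or total_turns <= 0:
        return {d: 0 for d in weights}
    # Integer floor shares and remainders, computed without floats.
    shares = {d: total_turns * w // total_weight for d, w in weights.items()}
    remainders = {d: total_turns * w % total_weight for d, w in weights.items()}
    leftover = total_turns - sum(shares.values())
    # Give leftover turns to the largest remainders first (ties by name).
    for d in sorted(weights, key=lambda d: (-remainders[d], d))[:leftover]:
        shares[d] += 1
    return shares
```

For example, weights `{"src": 5, "tests": 2, "vendor": 0}` with a 10-turn budget give `src` 7 turns, `tests` 3, and `vendor` none.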

---

### Phase 4 — External Knowledge Tools

Resolution strategies for uncertainty beyond local files:

- `web_search` — unfamiliar library, format, API
- `package_lookup` — PyPI / npm / crates.io metadata
- `fetch_url` — follow URLs referenced in local files
- `ask_user` — interactive mode, last resort

All can be disabled with the `--no-external` flag. Budget-limited per session.

**Status:** Not started

---

### Phase 5 — Scale-Tiered Synthesis

Calibrate synthesis input and depth to target size:

| Tier | Size | Approach |
|---|---|---|
| small | <5 dirs / <30 files | Per-file cache entries as synthesis input |
| medium | 5–30 dirs | Dir summaries (current) |
| large | 31–150 dirs | Multi-level synthesis |
| xlarge | >150 dirs | Multi-level + subsystem grouping |

**Status:** Not started
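
The tier table maps directly onto a classification function. Below is a minimal sketch using the table's thresholds. The table does not say how the small tier's two limits combine, so this sketch assumes both must hold. Under that reading, a target with few directories but many files falls through to `medium`. `synthesis_tier` is a hypothetical name.

```python
# Sketch only: synthesis_tier is hypothetical; thresholds come from the table.


def synthesis_tier(dir_count, file_count):
    """Map target size to a synthesis tier.

    Assumption: "small" requires both limits (<5 dirs and <30 files).
    """
    if dir_count < 5 and file_count < 30:
        return "small"
    if dir_count <= 30:
        return "medium"
    if dir_count <= 150:
        return "large"
    return "xlarge"
```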

---

### Phase 6 — Multi-Level Synthesis

For large/xlarge: grouping pass identifies logical subsystems from dir
summaries (not directory structure). Final synthesis receives 3–10 subsystem
summaries rather than hundreds of dir summaries.

**Status:** Not started

---

### Phase 7 — Hypothesis-Driven Synthesis

Synthesis reframed from aggregation to conclusion-with-evidence. Agent
forms a hypothesis, looks for confirming/refuting evidence, considers
alternatives, then submits.

Produces analytical output rather than descriptive output.

**Status:** Not started

---

### Phase 8 — Refinement Pass

Post-synthesis targeted re-investigation. Agent receives current synthesis,
identifies gaps and contradictions, goes back to actual files (or external
sources), submits improved report.

Triggered by `--refine` flag. `--refine-depth N` for multiple passes.

**Status:** Not started

---

### Phase 9 — Dynamic Report Structure

Synthesis produces a superset of possible output fields; report formatter
renders only populated ones. Output naturally scales from minimal (small
simple targets) to comprehensive (large complex targets).

**Status:** Not started
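
Rendering only populated fields amounts to iterating over a fixed, ordered superset and skipping anything empty. Below is a minimal sketch. It assumes the synthesis result is a dict. Apart from `brief` and `detailed`, the field names in `REPORT_FIELDS` are hypothetical placeholders, not the planned schema.

```python
# Sketch only: ordered superset of sections as (key, heading); names are placeholders.
REPORT_FIELDS = [
    ("brief", "Summary"),
    ("detailed", "Details"),
    ("subsystems", "Subsystems"),
    ("open_questions", "Open Questions"),
]


def render_report(synthesis):
    """Render only the sections whose values are present and non-empty."""
    lines = []
    for key, heading in REPORT_FIELDS:
        value = synthesis.get(key)
        if not value:  # None, "", [] and {} all count as unpopulated
            continue
        lines.append(f"== {heading} ==")
        if isinstance(value, (list, tuple)):
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(str(value))
    return "\n".join(lines)
```

A small target whose synthesis fills only `brief` then renders a single section. A large target that fills every field gets the full report without any formatter changes.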

---

## Open Design Questions

See `PLAN.md` — Known Unknowns and Concerns sections.

Key unresolved items:

- Which search API to use for web_search
- Whether external tools should be opt-in or opt-out by default
- How to handle confidence calibration (numeric vs categorical)
- Config file format and location for tunable thresholds
- Progressive output / interactive mode UX design

Session-1.md (new file, 56 lines)

# Session 1 — 2026-04-06

## What Was Shipped

### Scan progress output

Added `[scan]` step reporting to stderr for all base scan steps. Previously
the tool was silent until the report appeared.

### In-place per-file progress display

File-iterating steps (classify, count lines, check large files) now update
a single line in-place using `\r` + ANSI clear-to-EOL rather than scrolling.
Modules gained optional `on_file` callbacks; `luminos.py` wires them up via
a `_progress(label)` helper.
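
The in-place technique relies on two control codes. `\r` returns the cursor to column 0, and the ANSI sequence `ESC [ K` erases to the end of the line, which wipes leftovers from a longer previous path. Below is a minimal sketch of a factory in the spirit of `_progress(label)`. It assumes the callback receives the current file path. `progress_callback`, `finish_progress`, and the output format are hypothetical, not the code in `luminos.py`.

```python
import sys

# Sketch only: names, signature, and output format are hypothetical.
CLEAR_EOL = "\x1b[K"  # ANSI: erase from cursor to end of line


def progress_callback(label, stream=None):
    """Return an on_file callback that rewrites one status line in place."""
    out = stream if stream is not None else sys.stderr
    count = 0

    def on_file(path):
        nonlocal count
        count += 1
        # \r returns to column 0; CLEAR_EOL removes leftovers from a longer line.
        out.write(f"\r[scan] {label}: {count} {path}{CLEAR_EOL}")
        out.flush()

    return on_file


def finish_progress(stream=None):
    """End the in-place line so later output starts on a fresh line."""
    out = stream if stream is not None else sys.stderr
    out.write("\n")
    out.flush()
```

A module would call `on_file(path)` once per file and call `finish_progress()` when the step completes. The module itself never needs to know how the line is drawn.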

### `--exclude` / `-x` flag

Exclude directories by name from all scan steps and AI analysis. Repeatable:
`-x .git -x node_modules`. Propagated through tree, filetypes, recency, disk,
and `ai._discover_directories`.
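
Two standard-library mechanics are enough for a repeatable exclude flag. First, argparse's `action="append"` collects repeated flags into a list. Second, pruning `dirnames` in place during a top-down `os.walk()` stops the walk from descending into excluded directories. Below is a minimal sketch under those assumptions. `build_parser` and `walk_excluding` are hypothetical names, and the real flag wiring may differ.

```python
import argparse
import os

# Sketch only: build_parser and walk_excluding are hypothetical.


def build_parser():
    parser = argparse.ArgumentParser(prog="luminos")
    parser.add_argument("target")
    # action="append" turns `-x .git -x node_modules` into [".git", "node_modules"].
    parser.add_argument("-x", "--exclude", action="append", default=[],
                        metavar="DIR", help="directory name to exclude (repeatable)")
    return parser


def walk_excluding(root, exclude):
    """Yield (dirpath, filenames), skipping any directory whose name is excluded."""
    excluded = set(exclude)
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place (top-down walk) stops descent into them.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        yield dirpath, filenames
```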

### Forgejo project

Created repo at https://forgejo.labbity.unbiasedgeek.com/archeious/luminos.
Pushed full history.

### PLAN.md

Detailed evolution plan covering:

- AI-driven domain detection (replaces hardcoded taxonomy)
- Scale-tiered synthesis (small/medium/large/xlarge)
- Multi-level synthesis for large repos
- Uncertainty as first-class concept with resolution strategies
- External knowledge tools (web search, package lookup, URL fetch, ask_user)
- Investigation planning pass
- Hypothesis-driven synthesis
- Refinement pass (`--refine`)
- Known unknowns, concerns, raw thoughts

### Wiki + development practices

Initialized Forgejo wiki. Created: Home, Architecture, DevelopmentGuide,
Roadmap, SessionRetrospectives. Rewrote CLAUDE.md to follow harbormind's
thin-CLAUDE.md + wiki pattern.

---

## Commits

- `feat: add progress output to base scan steps`
- `feat: in-place per-file progress for classify, count, and large-file steps`
- `feat: add -x/--exclude flag to exclude directories from scan and AI analysis`

---

## State at End of Session

- main is clean, all features merged
- PLAN.md written, no implementation started
- Wiki initialized, all pages current
- No blocking issues

SessionRetrospectives.md (new file, 9 lines)

# Session Retrospectives

| Session | Date | Summary |
|---|---|---|
| [Session 1](Session-1) | 2026-04-06 | Project setup, scan improvements, Forgejo repo, wiki, development practices |

---

Full session notes linked above.