From fbc73406a622f068a92e7b255360ec9e7836c985 Mon Sep 17 00:00:00 2001 From: Jeff Smith Date: Mon, 6 Apr 2026 16:11:05 -0600 Subject: [PATCH] init: Home, Architecture, DevelopmentGuide, Roadmap, SessionRetrospectives, Session-1 --- Architecture.md | 123 +++++++++++++++++++++++++++++++ DevelopmentGuide.md | 152 +++++++++++++++++++++++++++++++++++++++ Home.md | 44 ++++++++++++ Roadmap.md | 125 ++++++++++++++++++++++++++++++++ Session-1.md | 56 +++++++++++++++ SessionRetrospectives.md | 9 +++ 6 files changed, 509 insertions(+) create mode 100644 Architecture.md create mode 100644 DevelopmentGuide.md create mode 100644 Home.md create mode 100644 Roadmap.md create mode 100644 Session-1.md create mode 100644 SessionRetrospectives.md diff --git a/Architecture.md b/Architecture.md new file mode 100644 index 0000000..8041758 --- /dev/null +++ b/Architecture.md @@ -0,0 +1,123 @@ +# Architecture + +## Overview + +Luminos is a zero-dependency Python CLI at its base. The `--ai` flag layers an +agentic investigation on top using the Claude API. The two layers are strictly +separated — the base scan never requires pip packages. + +**Entry point:** `luminos.py` — argument parsing, scan orchestration, output routing. 
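One way to keep the two layers separate is to defer the optional imports until `--ai` is actually requested, so the base scan never touches them. The sketch below is illustrative only and is not taken from `luminos.py`; the helper name `load_ai_deps` and the error wording are assumptions.

```python
import importlib


def load_ai_deps(names=("anthropic",)):
    """Import optional packages on demand (hypothetical helper, not Luminos code)."""
    loaded, missing = {}, []
    for name in names:
        try:
            # Nothing is imported until this function is called.
            loaded[name] = importlib.import_module(name)
        except ImportError:
            missing.append(name)
    if missing:
        # Assumed wording; the point is one clear message instead of a traceback.
        raise RuntimeError(
            "--ai needs extra packages; missing: " + ", ".join(missing)
        )
    return loaded
```

Calling a helper like this only inside the `--ai` code path would leave a plain scan importable on a machine with no pip packages installed.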
+ +--- + +## Module Map + +| Module | Purpose | External commands | +|---|---|---| +| `luminos.py` | Entry point — arg parsing, scan(), main() | None | +| `luminos_lib/tree.py` | Recursive directory tree with file sizes | None (os) | +| `luminos_lib/filetypes.py` | Classifies files into 7 categories | `file --brief` | +| `luminos_lib/code.py` | Language detection, LOC counting, large file flagging | `wc -l` | +| `luminos_lib/recency.py` | Finds N most recently modified files | `find -printf` | +| `luminos_lib/disk.py` | Per-directory disk usage | `du -b` | +| `luminos_lib/report.py` | Formats report dict as terminal output | None | +| `luminos_lib/watch.py` | Continuous monitoring loop with snapshot diffing | None | +| `luminos_lib/capabilities.py` | Optional dependency detection, cache cleanup | None | +| `luminos_lib/cache.py` | AI investigation cache — read/write/clear/flush | None | +| `luminos_lib/ast_parser.py` | tree-sitter code structure parsing | tree-sitter | +| `luminos_lib/prompts.py` | System prompt templates for AI loops | None | +| `luminos_lib/ai.py` | Multi-pass agentic analysis via Claude API | anthropic, python-magic | + +--- + +## Base Scan Data Flow + +``` +scan(target) + build_tree() → report["tree"], report["tree_rendered"] + classify_files() → report["file_categories"], report["classified_files"] + detect_languages() → report["languages"], report["lines_of_code"] + find_large_files() → report["large_files"] + find_recent_files() → report["recent_files"] + get_disk_usage() → report["disk_usage"], report["top_directories"] + └── returns report dict +``` + +--- + +## AI Pipeline (--ai flag) + +``` +analyze_directory(report, target) + │ + ├── _discover_directories() find all dirs, sort leaves-first + │ + ├── per-directory loop (each dir, up to max_turns=14) + │ _build_dir_context() list files + sizes + │ _get_child_summaries() read cached child summaries + │ _run_dir_loop() agent loop: read files, parse structure, + │ write cache entries, 
submit_report + │ Tools: read_file, list_directory, + │ run_command, parse_structure, + │ write_cache, think, checkpoint, + │ flag, submit_report + │ + ├── _run_synthesis() one-shot aggregation of dir summaries + │ reads all "dir" cache entries + │ produces brief (2-4 sentences) + detailed (free-form) + │ Tools: read_cache, list_cache, flag, submit_report + │ + └── returns (brief, detailed, flags) +``` + +--- + +## Cache + +Location: `/tmp/luminos//` + +Two entry types, both stored as JSONL: + +**File entries** (`files.jsonl`): +``` +{path, relative_path, size_bytes, category, summary, notable, + notable_reason, cached_at} +``` + +**Dir entries** (`dirs.jsonl`): +``` +{path, relative_path, child_count, summary, dominant_category, + notable_files, cached_at} +``` + +**Flags** (`flags.jsonl`): +``` +{path, finding, severity} severity: info | concern | critical +``` + +Cache is reused across runs for the same target. `--fresh` ignores it. +`--clear-cache` deletes it. + +--- + +## Key Constraints + +- **Base tool: no pip dependencies.** tree, filetypes, code, disk, recency, + report, watch use only stdlib and GNU coreutils. +- **AI deps are lazy.** `anthropic`, `tree-sitter`, `python-magic` imported + only when `--ai` is used. Missing packages produce a clear install error. +- **Subprocess for OS tools.** LOC counting, file detection, disk usage, and + recency shell out to GNU coreutils. Do not reimplement in pure Python. +- **Graceful degradation everywhere.** Permission denied, subprocess timeouts, + missing API key — all handled without crashing. + +--- + +## AI Model + +`claude-sonnet-4-20250514` + +Context budget: 70% of 180,000 tokens (126,000). Early exit flushes partial +cache on budget breach. + +Pricing tracked and reported at end of each run. 
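The budget arithmetic and the early exit can be sketched as follows. This is a minimal illustration under stated assumptions, not the tracker in `luminos_lib/ai.py`: the class `BudgetTracker`, the function `run_turns`, and the `flush` callback are hypothetical names, and only the 180,000 and 70% figures come from the text above.

```python
MAX_CONTEXT = 180_000   # token figure quoted above
BUDGET_PERCENT = 70     # 70% of MAX_CONTEXT


class BudgetTracker:
    """Hypothetical stand-in for a token tracker."""

    def __init__(self, max_context=MAX_CONTEXT, percent=BUDGET_PERCENT):
        # Integer arithmetic avoids float rounding: 180_000 * 70 // 100 == 126_000.
        self.limit = max_context * percent // 100
        self.used = 0

    def add(self, tokens):
        self.used += tokens

    def exceeded(self):
        return self.used > self.limit


def run_turns(turn_costs, tracker, flush):
    """Run turns until the budget is breached; flush partial results on breach."""
    completed = 0
    for cost in turn_costs:
        tracker.add(cost)
        if tracker.exceeded():
            flush()   # assumed hook: persist whatever was cached so far
            break
        completed += 1
    return completed
```

With turn costs of 60,000, 60,000 and 10,000 tokens, the third turn pushes usage to 130,000, which is past the 126,000 limit, so the loop flushes and stops after two completed turns.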
diff --git a/DevelopmentGuide.md b/DevelopmentGuide.md new file mode 100644 index 0000000..5ab340e --- /dev/null +++ b/DevelopmentGuide.md @@ -0,0 +1,152 @@ +# Development Guide + +## Running Luminos + +```bash +# Base scan +python3 luminos.py + +# With AI analysis (requires ANTHROPIC_API_KEY) +source ~/luminos-env/bin/activate +python3 luminos.py --ai + +# Common flags +python3 luminos.py --ai --fresh --clear-cache # force clean run +python3 luminos.py -x .git -x node_modules # exclude dirs +python3 luminos.py -d 8 -a # depth 8, include hidden +python3 luminos.py --json -o report.json # JSON output + +# Watch mode +python3 luminos.py --watch + +# Check optional dep status +python3 luminos.py --install-extras +``` + +--- + +## Optional Dependencies Setup + +```bash +# One-time setup +bash setup_env.sh + +# Or manually +python3 -m venv ~/luminos-env +source ~/luminos-env/bin/activate +pip install anthropic tree-sitter tree-sitter-python \ + tree-sitter-javascript tree-sitter-rust \ + tree-sitter-go python-magic +``` + +--- + +## Git Workflow + +Every change starts on a branch. Nothing goes directly to main. + +### Branch naming + +``` +/ +``` + +| Type | Use | +|---|---| +| `feat/` | New feature or capability | +| `fix/` | Bug fix | +| `refactor/` | Restructure without behavior change | +| `chore/` | Tooling, config, documentation | +| `test/` | Tests | + +Examples: `feat/survey-pass`, `fix/cache-flush-on-error`, `refactor/synthesis-tiers` + +### Commit messages + +``` +: +``` + +Examples: +``` +feat: add web_search tool to dir loop +fix: handle empty dir cache gracefully in synthesis +refactor: extract survey pass into _run_survey() +chore: update Architecture wiki page +``` + +One commit per logical unit of work, not one per file. + +### Merge procedure + +```bash +git checkout main +git merge --no-ff -m "merge: " +git branch -d +``` + +`--no-ff` preserves branch history. Delete branch after merging. 
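The branch types in the table above could be checked mechanically, for example in a local hook. This is only a sketch: it assumes a branch name is a known type, a slash, then a non-empty description, as the examples suggest, and the names `BRANCH_TYPES` and `is_valid_branch` are hypothetical.

```python
BRANCH_TYPES = ("feat", "fix", "refactor", "chore", "test")


def is_valid_branch(name):
    """Return True if name is a known type, a slash, and a non-empty description."""
    prefix, sep, rest = name.partition("/")
    return sep == "/" and prefix in BRANCH_TYPES and bool(rest)
```

Under these assumptions `feat/survey-pass` passes, while `main`, `feat/`, and `docs/readme` are rejected.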
+ +--- + +## Naming Conventions + +| Context | Convention | Example | +|---|---|---| +| Functions / variables | snake_case | `classify_files`, `dir_path` | +| Classes | PascalCase | `_TokenTracker`, `_CacheManager` | +| Constants | UPPER_SNAKE_CASE | `MAX_CONTEXT`, `CACHE_ROOT` | +| Module files | snake_case | `ast_parser.py`, `filetypes.py` | +| CLI flags | kebab-case | `--clear-cache`, `--install-extras` | +| Private functions | leading underscore | `_run_synthesis`, `_build_dir_context` | + +--- + +## Project Structure + +``` +luminos/ +├── luminos.py entry point +├── luminos_lib/ +│ ├── ai.py AI pipeline (heaviest module) +│ ├── ast_parser.py tree-sitter parsing +│ ├── cache.py investigation cache management +│ ├── capabilities.py optional dep detection +│ ├── code.py language + LOC detection +│ ├── disk.py disk usage +│ ├── filetypes.py file classification +│ ├── prompts.py AI system prompt templates +│ ├── recency.py recently modified files +│ ├── report.py terminal report formatter +│ ├── tree.py directory tree +│ └── watch.py watch mode +├── docs/wiki/ local clone of Forgejo wiki (gitignored) +├── setup_env.sh venv + AI dep setup script +├── CLAUDE.md Claude Code context (thin — points to wiki) +└── PLAN.md evolution plan and design notes +``` + +--- + +## Wiki + +Wiki lives at `docs/wiki/` (gitignored — separate git repo). + +```bash +# First time +git clone ssh://git@forgejo-claude/archeious/luminos.wiki.git docs/wiki/ + +# Returning +git -C docs/wiki pull +``` + +Wiki URL: https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/wiki + +When updating wiki pages: +```bash +cd docs/wiki +# edit pages +git add -A +git commit -m "wiki: " +git push +``` diff --git a/Home.md b/Home.md new file mode 100644 index 0000000..d154646 --- /dev/null +++ b/Home.md @@ -0,0 +1,44 @@ +# Luminos + +Luminos is a file system intelligence tool — a zero-dependency Python CLI that +scans a directory and produces a reconnaissance report. 
With `--ai` it runs a +multi-pass agentic investigation via the Claude API, producing a deep analysis +of what the directory contains and why. + +--- + +## Current State + +- **Phase:** Active development — core pipeline stable, scaling and domain intelligence planned +- **Last worked on:** 2026-04-06 +- **Last commit:** merge: add -x/--exclude flag for directory exclusion +- **Blocking:** None + +--- + +## Quick Links + +| Page | Contents | +|---|---| +| [Architecture](Architecture) | Module breakdown, data flow, AI pipeline | +| [DevelopmentGuide](DevelopmentGuide) | Git workflow, naming conventions, commands | +| [Roadmap](Roadmap) | Planned phases and open design questions | +| [SessionRetrospectives](SessionRetrospectives) | Full session history | + +--- + +## At a Glance + +```bash +python3 luminos.py # base scan +python3 luminos.py --ai # AI analysis +python3 luminos.py --ai --refine # AI + refinement pass (planned) +python3 luminos.py -x .git -x node_modules # exclude dirs +python3 luminos.py --watch # continuous monitoring +``` + +--- + +## Repository + +https://forgejo.labbity.unbiasedgeek.com/archeious/luminos diff --git a/Roadmap.md b/Roadmap.md new file mode 100644 index 0000000..4456cbc --- /dev/null +++ b/Roadmap.md @@ -0,0 +1,125 @@ +# Roadmap + +Full design notes and open questions live in `PLAN.md` in the repo root. +This page tracks phase status. + +--- + +## Core Philosophy + +Move from a **pipeline with AI steps** to **investigation driven by curiosity**. +The agent should decide what it needs to know and how to find it out — not +execute a predetermined checklist. + +--- + +## Phases + +### Phase 1 — Confidence Tracking +Add `confidence` + `confidence_reason` to file and dir cache entries. +Agent sets this when writing cache. Enables later phases to prioritize +re-investigation of uncertain entries. + +**Status:** Not started + +--- + +### Phase 2 — Survey Pass +Lightweight pre-investigation pass. 
Agent looks at file type distribution +and tree structure, then answers: what is this, how should I investigate it, +which tools are relevant? + +Replaces hardcoded domain detection with AI-driven characterization. +Survey output injected into dir loop system prompts as context. + +**Status:** Not started + +--- + +### Phase 3 — Investigation Planning +After survey, a planning pass allocates investigation depth per directory. +Replaces fixed max_turns-per-dir with a global turn budget the agent manages. +Priority dirs get more turns; trivial dirs get fewer; generated/vendored dirs +get skipped. + +**Status:** Not started + +--- + +### Phase 4 — External Knowledge Tools +Resolution strategies for uncertainty beyond local files: +- `web_search` — unfamiliar library, format, API +- `package_lookup` — PyPI / npm / crates.io metadata +- `fetch_url` — follow URLs referenced in local files +- `ask_user` — interactive mode, last resort + +All gated behind `--no-external` flag. Budget-limited per session. + +**Status:** Not started + +--- + +### Phase 5 — Scale-Tiered Synthesis +Calibrate synthesis input and depth to target size: + +| Tier | Size | Approach | +|---|---|---| +| small | <5 dirs / <30 files | Per-file cache entries as synthesis input | +| medium | 5–30 dirs | Dir summaries (current) | +| large | 31–150 dirs | Multi-level synthesis | +| xlarge | >150 dirs | Multi-level + subsystem grouping | + +**Status:** Not started + +--- + +### Phase 6 — Multi-Level Synthesis +For large/xlarge: grouping pass identifies logical subsystems from dir +summaries (not directory structure). Final synthesis receives 3–10 subsystem +summaries rather than hundreds of dir summaries. + +**Status:** Not started + +--- + +### Phase 7 — Hypothesis-Driven Synthesis +Synthesis reframed from aggregation to conclusion-with-evidence. Agent +forms a hypothesis, looks for confirming/refuting evidence, considers +alternatives, then submits. 
+ +Produces analytical output rather than descriptive output. + +**Status:** Not started + +--- + +### Phase 8 — Refinement Pass +Post-synthesis targeted re-investigation. Agent receives current synthesis, +identifies gaps and contradictions, goes back to actual files (or external +sources), submits improved report. + +Triggered by `--refine` flag. `--refine-depth N` for multiple passes. + +**Status:** Not started + +--- + +### Phase 9 — Dynamic Report Structure +Synthesis produces a superset of possible output fields; report formatter +renders only populated ones. Output naturally scales from minimal (small +simple targets) to comprehensive (large complex targets). + +**Status:** Not started + +--- + +## Open Design Questions + +See `PLAN.md` — Known Unknowns and Concerns sections. + +Key unresolved items: +- Which search API to use for web_search +- Whether external tools should be opt-in or opt-out by default +- How to handle confidence calibration (numeric vs categorical) +- Config file format and location for tunable thresholds +- Progressive output / interactive mode UX design diff --git a/Session-1.md b/Session-1.md new file mode 100644 index 0000000..d43b9c6 --- /dev/null +++ b/Session-1.md @@ -0,0 +1,56 @@ +# Session 1 — 2026-04-06 + +## What Was Shipped + +### Scan progress output +Added `[scan]` step reporting to stderr for all base scan steps. Previously +the tool was silent until the report appeared. + +### In-place per-file progress display +File-iterating steps (classify, count lines, check large files) now update +a single line in-place using `\r` + ANSI clear-to-EOL rather than scrolling. +Modules gained optional `on_file` callbacks; `luminos.py` wires them up via +a `_progress(label)` helper. + +### `--exclude` / `-x` flag +Exclude directories by name from all scan steps and AI analysis. Repeatable: +`-x .git -x node_modules`. Propagated through tree, filetypes, recency, disk, +and ai._discover_directories. 
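A repeatable exclude flag of this kind can be sketched with the standard library alone. The code below is an illustration, not the Luminos implementation: `parse_args` and `walk_dirs` are hypothetical names, and matching on the bare directory name is an assumption based on the description above.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="luminos")
    # action="append" collects every -x occurrence into one list.
    parser.add_argument("-x", "--exclude", action="append", default=[],
                        metavar="NAME", help="directory name to skip (repeatable)")
    return parser.parse_args(argv)


def walk_dirs(root, exclude):
    """Yield directories under root, pruning any whose name is in exclude."""
    skip = set(exclude)
    for dirpath, dirnames, _filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        yield dirpath
```

Pruning during the walk, rather than filtering results afterwards, means excluded trees such as `node_modules` are never traversed at all.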
+ +### Forgejo project +Created repo at https://forgejo.labbity.unbiasedgeek.com/archeious/luminos. +Pushed full history. + +### PLAN.md +Detailed evolution plan covering: +- AI-driven domain detection (replaces hardcoded taxonomy) +- Scale-tiered synthesis (small/medium/large/xlarge) +- Multi-level synthesis for large repos +- Uncertainty as first-class concept with resolution strategies +- External knowledge tools (web search, package lookup, URL fetch, ask_user) +- Investigation planning pass +- Hypothesis-driven synthesis +- Refinement pass (`--refine`) +- Known unknowns, concerns, raw thoughts + +### Wiki + development practices +Initialized Forgejo wiki. Created: Home, Architecture, DevelopmentGuide, +Roadmap, SessionRetrospectives. Rewrote CLAUDE.md to follow harbormind's +thin-CLAUDE.md + wiki pattern. + +--- + +## Commits + +- `feat: add progress output to base scan steps` +- `feat: in-place per-file progress for classify, count, and large-file steps` +- `feat: add -x/--exclude flag to exclude directories from scan and AI analysis` + +--- + +## State at End of Session + +- main is clean, all features merged +- PLAN.md written, no implementation started +- Wiki initialized, all pages current +- No blocking issues diff --git a/SessionRetrospectives.md b/SessionRetrospectives.md new file mode 100644 index 0000000..4ab37cd --- /dev/null +++ b/SessionRetrospectives.md @@ -0,0 +1,9 @@ +# Session Retrospectives + +| Session | Date | Summary | +|---|---|---| +| [Session 1](Session-1) | 2026-04-06 | Project setup, scan improvements, Forgejo repo, wiki, development practices | + +--- + +Full session notes linked above.