init: Home, Architecture, DevelopmentGuide, Roadmap, SessionRetrospectives, Session-1

2026-04-06 16:11:05 -06:00 · 2026-04-06 16:11:05 -06:00 · fbc73406a6
commit fbc73406a6
6 changed files with 509 additions and 0 deletions
--- a/Architecture.md
+++ b/Architecture.md
@ -0,0 +1,123 @@
+# Architecture
+
+## Overview
+
+Luminos is a zero-dependency Python CLI at its base. The `--ai` flag layers an
+agentic investigation on top using the Claude API. The two layers are strictly
+separated — the base scan never requires pip packages.
+
+**Entry point:** `luminos.py` — argument parsing, scan orchestration, output routing.
+
+---
+
+## Module Map
+
+| Module | Purpose | External commands / packages |
+|---|---|---|
+| `luminos.py` | Entry point — arg parsing, scan(), main() | None |
+| `luminos_lib/tree.py` | Recursive directory tree with file sizes | None (os) |
+| `luminos_lib/filetypes.py` | Classifies files into 7 categories | `file --brief` |
+| `luminos_lib/code.py` | Language detection, LOC counting, large file flagging | `wc -l` |
+| `luminos_lib/recency.py` | Finds N most recently modified files | `find -printf` |
+| `luminos_lib/disk.py` | Per-directory disk usage | `du -b` |
+| `luminos_lib/report.py` | Formats report dict as terminal output | None |
+| `luminos_lib/watch.py` | Continuous monitoring loop with snapshot diffing | None |
+| `luminos_lib/capabilities.py` | Optional dependency detection, cache cleanup | None |
+| `luminos_lib/cache.py` | AI investigation cache — read/write/clear/flush | None |
+| `luminos_lib/ast_parser.py` | tree-sitter code structure parsing | tree-sitter |
+| `luminos_lib/prompts.py` | System prompt templates for AI loops | None |
+| `luminos_lib/ai.py` | Multi-pass agentic analysis via Claude API | anthropic, python-magic |
+
+---
+
+## Base Scan Data Flow
+
+```
+scan(target)
+    build_tree()           → report["tree"], report["tree_rendered"]
+    classify_files()       → report["file_categories"], report["classified_files"]
+    detect_languages()     → report["languages"], report["lines_of_code"]
+    find_large_files()     → report["large_files"]
+    find_recent_files()    → report["recent_files"]
+    get_disk_usage()       → report["disk_usage"], report["top_directories"]
+    └── returns report dict
+```
+
+---
+
+## AI Pipeline (--ai flag)
+
+```
+analyze_directory(report, target)
+    │
+    ├── _discover_directories()     find all dirs, sort leaves-first
+    │
+    ├── per-directory loop (each dir, up to max_turns=14)
+    │       _build_dir_context()    list files + sizes
+    │       _get_child_summaries()  read cached child summaries
+    │       _run_dir_loop()         agent loop: read files, parse structure,
+    │                               write cache entries, submit_report
+    │                               Tools: read_file, list_directory,
+    │                               run_command, parse_structure,
+    │                               write_cache, think, checkpoint,
+    │                               flag, submit_report
+    │
+    ├── _run_synthesis()            one-shot aggregation of dir summaries
+    │       reads all "dir" cache entries
+    │       produces brief (2-4 sentences) + detailed (free-form)
+    │       Tools: read_cache, list_cache, flag, submit_report
+    │
+    └── returns (brief, detailed, flags)
+```
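+
+The leaves-first ordering matters because each directory's loop reads the
+cached summaries of its children. A minimal sketch of one way to get that
+ordering is below. The function name, signature, and `exclude` parameter are
+assumptions for illustration, not the actual `_discover_directories()` code:
+
+```python
+import os
+
+
+def discover_directories(root, exclude=()):
+    """Return every directory under root, deepest paths first (hypothetical helper)."""
+    root = os.path.normpath(root)  # drop any trailing separator so depth counts are comparable
+    found = []
+    for current, subdirs, _files in os.walk(root):
+        # Prune excluded names in place so os.walk never descends into them.
+        subdirs[:] = [d for d in subdirs if d not in exclude]
+        found.append(current)
+    # Deeper paths sort first, so a parent always comes after all of its children.
+    found.sort(key=lambda p: (-p.count(os.sep), p))
+    return found
+```
+
+Sorting by depth is only one interpretation of "leaves-first"; any order that
+visits children before their parent would serve the same purpose.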
+
+---
+
+## Cache
+
+Location: `/tmp/luminos/<investigation_id>/`
+
+Two entry types plus a flags log, each stored as JSONL:
+
+**File entries** (`files.jsonl`):
+```
+{path, relative_path, size_bytes, category, summary, notable,
+ notable_reason, cached_at}
+```
+
+**Dir entries** (`dirs.jsonl`):
+```
+{path, relative_path, child_count, summary, dominant_category,
+ notable_files, cached_at}
+```
+
+**Flags** (`flags.jsonl`):
+```
+{path, finding, severity}   severity: info | concern | critical
+```
+
+Cache is reused across runs for the same target. `--fresh` ignores it.
+`--clear-cache` deletes it.
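+
+A rough sketch of how an append-only JSONL cache like this can be written and
+read back. The helper names, the ISO-8601 `cached_at` format, and the
+"latest record per path wins" rule are assumptions, not taken from `cache.py`:
+
+```python
+import json
+import os
+from datetime import datetime, timezone
+
+
+def append_entry(cache_dir, filename, entry):
+    """Append one record to a JSONL file, stamping it with cached_at (assumed format)."""
+    os.makedirs(cache_dir, exist_ok=True)
+    record = dict(entry, cached_at=datetime.now(timezone.utc).isoformat())
+    with open(os.path.join(cache_dir, filename), "a", encoding="utf-8") as fh:
+        fh.write(json.dumps(record) + "\n")
+
+
+def load_entries(cache_dir, filename):
+    """Return {path: record}; a later record for the same path replaces an earlier one."""
+    latest = {}
+    try:
+        with open(os.path.join(cache_dir, filename), encoding="utf-8") as fh:
+            for line in fh:
+                try:
+                    record = json.loads(line)
+                except json.JSONDecodeError:
+                    continue  # skip blank or partially written lines
+                if isinstance(record, dict) and "path" in record:
+                    latest[record["path"]] = record
+    except FileNotFoundError:
+        pass  # no cache yet: behave like an empty cache
+    return latest
+```
+
+Appending one line per record keeps a partial cache readable even if a run
+stops midway, at the cost of possible duplicate paths that the reader has to
+collapse.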
+
+---
+
+## Key Constraints
+
+- **Base tool: no pip dependencies.** tree, filetypes, code, disk, recency,
+  report, watch use only the stdlib and standard Unix command-line tools.
+- **AI deps are lazy.** `anthropic`, `tree-sitter`, `python-magic` are imported
+  only when `--ai` is used. Missing packages produce a clear install error.
+- **Subprocess for OS tools.** LOC counting, file detection, disk usage, and
+  recency shell out to `wc`, `file`, `du`, and `find`; `du -b` and
+  `find -printf` require the GNU versions. Do not reimplement in pure Python.
+- **Graceful degradation everywhere.** Permission denied, subprocess timeouts,
+  missing API key — all handled without crashing.
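+
+To illustrate the subprocess and graceful-degradation constraints together,
+here is a minimal sketch of a `wc -l` wrapper. The function name, the timeout
+value, and the choice to return `None` on failure are assumptions, not the
+behavior of `code.py`:
+
+```python
+import subprocess
+
+
+def count_lines(path, timeout=5.0):
+    """Return the line count reported by `wc -l`, or None if it cannot be obtained."""
+    try:
+        result = subprocess.run(
+            ["wc", "-l", "--", path],
+            capture_output=True,
+            text=True,
+            timeout=timeout,
+        )
+    except (OSError, subprocess.TimeoutExpired):
+        return None  # wc missing, not executable, or too slow
+    if result.returncode != 0:
+        return None  # e.g. permission denied or file not found
+    try:
+        return int(result.stdout.split()[0])
+    except (IndexError, ValueError):
+        return None  # unexpected output format
+```
+
+Returning a sentinel instead of raising lets the caller skip the file and
+keep scanning.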
+
+---
+
+## AI Model
+
+`claude-sonnet-4-20250514`
+
+Context budget: 70% of 180,000 tokens (126,000). If the budget is exceeded,
+the run exits early and flushes the partial cache.
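+
+The budget arithmetic, as a small sketch. The class name and the assumption
+that usage is a running total of input plus output tokens are hypothetical;
+the real tracker may measure context differently:
+
+```python
+MAX_CONTEXT = 180_000
+BUDGET_PERCENT = 70
+CONTEXT_BUDGET = MAX_CONTEXT * BUDGET_PERCENT // 100  # 126_000 tokens
+
+
+class BudgetTracker:
+    """Hypothetical tracker: sums reported token usage and checks it against the budget."""
+
+    def __init__(self, budget=CONTEXT_BUDGET):
+        self.budget = budget
+        self.used = 0
+
+    def record(self, input_tokens, output_tokens):
+        self.used += input_tokens + output_tokens
+
+    def exceeded(self):
+        # The caller would flush the partial cache and exit early when this is True.
+        return self.used >= self.budget
+```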
+
+Pricing tracked and reported at end of each run.
--- a/DevelopmentGuide.md
+++ b/DevelopmentGuide.md
@ -0,0 +1,152 @@
+# Development Guide
+
+## Running Luminos
+
+```bash
+# Base scan
+python3 luminos.py <target>
+
+# With AI analysis (requires ANTHROPIC_API_KEY)
+source ~/luminos-env/bin/activate
+python3 luminos.py --ai <target>
+
+# Common flags
+python3 luminos.py --ai --fresh --clear-cache <target>   # force clean run
+python3 luminos.py -x .git -x node_modules <target>      # exclude dirs
+python3 luminos.py -d 8 -a <target>                       # depth 8, include hidden
+python3 luminos.py --json -o report.json <target>         # JSON output
+
+# Watch mode
+python3 luminos.py --watch <target>
+
+# Check optional dep status
+python3 luminos.py --install-extras
+```
+
+---
+
+## Optional Dependencies Setup
+
+```bash
+# One-time setup
+bash setup_env.sh
+
+# Or manually
+python3 -m venv ~/luminos-env
+source ~/luminos-env/bin/activate
+pip install anthropic tree-sitter tree-sitter-python \
+            tree-sitter-javascript tree-sitter-rust \
+            tree-sitter-go python-magic
+```
+
+---
+
+## Git Workflow
+
+Every change starts on a branch. Nothing goes directly to main.
+
+### Branch naming
+
+```
+<type>/<short-description>
+```
+
+| Type | Use |
+|---|---|
+| `feat/` | New feature or capability |
+| `fix/` | Bug fix |
+| `refactor/` | Restructure without behavior change |
+| `chore/` | Tooling, config, documentation |
+| `test/` | Tests |
+
+Examples: `feat/survey-pass`, `fix/cache-flush-on-error`, `refactor/synthesis-tiers`
+
+### Commit messages
+
+```
+<type>: <short description>
+```
+
+Examples:
+```
+feat: add web_search tool to dir loop
+fix: handle empty dir cache gracefully in synthesis
+refactor: extract survey pass into _run_survey()
+chore: update Architecture wiki page
+```
+
+One commit per logical unit of work, not one per file.
+
+### Merge procedure
+
+```bash
+git checkout main
+git merge --no-ff <branch> -m "merge: <description>"
+git branch -d <branch>
+```
+
+`--no-ff` preserves branch history. Delete branch after merging.
+
+---
+
+## Naming Conventions
+
+| Context | Convention | Example |
+|---|---|---|
+| Functions / variables | snake_case | `classify_files`, `dir_path` |
+| Classes | PascalCase | `_TokenTracker`, `_CacheManager` |
+| Constants | UPPER_SNAKE_CASE | `MAX_CONTEXT`, `CACHE_ROOT` |
+| Module files | snake_case | `ast_parser.py`, `filetypes.py` |
+| CLI flags | kebab-case | `--clear-cache`, `--install-extras` |
+| Private functions | leading underscore | `_run_synthesis`, `_build_dir_context` |
+
+---
+
+## Project Structure
+
+```
+luminos/
+├── luminos.py              entry point
+├── luminos_lib/
+│   ├── ai.py               AI pipeline (heaviest module)
+│   ├── ast_parser.py       tree-sitter parsing
+│   ├── cache.py            investigation cache management
+│   ├── capabilities.py     optional dep detection
+│   ├── code.py             language + LOC detection
+│   ├── disk.py             disk usage
+│   ├── filetypes.py        file classification
+│   ├── prompts.py          AI system prompt templates
+│   ├── recency.py          recently modified files
+│   ├── report.py           terminal report formatter
+│   ├── tree.py             directory tree
+│   └── watch.py            watch mode
+├── docs/wiki/              local clone of Forgejo wiki (gitignored)
+├── setup_env.sh            venv + AI dep setup script
+├── CLAUDE.md               Claude Code context (thin — points to wiki)
+└── PLAN.md                 evolution plan and design notes
+```
+
+---
+
+## Wiki
+
+Wiki lives at `docs/wiki/` (gitignored — separate git repo).
+
+```bash
+# First time
+git clone ssh://git@forgejo-claude/archeious/luminos.wiki.git docs/wiki/
+
+# Returning
+git -C docs/wiki pull
+```
+
+Wiki URL: https://forgejo.labbity.unbiasedgeek.com/archeious/luminos/wiki
+
+When updating wiki pages:
+```bash
+cd docs/wiki
+# edit pages
+git add -A
+git commit -m "wiki: <description>"
+git push
+```
--- a/Home.md
+++ b/Home.md
@ -0,0 +1,44 @@
+# Luminos
+
+Luminos is a file system intelligence tool — a zero-dependency Python CLI that
+scans a directory and produces a reconnaissance report. With `--ai` it runs a
+multi-pass agentic investigation via the Claude API, producing a deep analysis
+of what the directory contains and why.
+
+---
+
+## Current State
+
+- **Phase:** Active development — core pipeline stable, scaling and domain intelligence planned
+- **Last worked on:** 2026-04-06
+- **Last commit:** merge: add -x/--exclude flag for directory exclusion
+- **Blocking:** None
+
+---
+
+## Quick Links
+
+| Page | Contents |
+|---|---|
+| [Architecture](Architecture) | Module breakdown, data flow, AI pipeline |
+| [DevelopmentGuide](DevelopmentGuide) | Git workflow, naming conventions, commands |
+| [Roadmap](Roadmap) | Planned phases and open design questions |
+| [SessionRetrospectives](SessionRetrospectives) | Full session history |
+
+---
+
+## At a Glance
+
+```bash
+python3 luminos.py <target>                  # base scan
+python3 luminos.py --ai <target>             # AI analysis
+python3 luminos.py --ai --refine <target>    # AI + refinement pass (planned)
+python3 luminos.py -x .git -x node_modules <target>   # exclude dirs
+python3 luminos.py --watch <target>          # continuous monitoring
+```
+
+---
+
+## Repository
+
+https://forgejo.labbity.unbiasedgeek.com/archeious/luminos
--- a/Roadmap.md
+++ b/Roadmap.md
@ -0,0 +1,125 @@
+# Roadmap
+
+Full design notes and open questions live in `PLAN.md` in the repo root.
+This page tracks phase status.
+
+---
+
+## Core Philosophy
+
+Move from a **pipeline with AI steps** to **investigation driven by curiosity**.
+The agent should decide what it needs to know and how to find it out — not
+execute a predetermined checklist.
+
+---
+
+## Phases
+
+### Phase 1 — Confidence Tracking
+Add `confidence` + `confidence_reason` to file and dir cache entries.
+Agent sets this when writing cache. Enables later phases to prioritize
+re-investigation of uncertain entries.
+
+**Status:** Not started
+
+---
+
+### Phase 2 — Survey Pass
+Lightweight pre-investigation pass. Agent looks at file type distribution
+and tree structure, then answers: what is this, how should I investigate it,
+which tools are relevant?
+
+Replaces hardcoded domain detection with AI-driven characterization.
+Survey output injected into dir loop system prompts as context.
+
+**Status:** Not started
+
+---
+
+### Phase 3 — Investigation Planning
+After survey, a planning pass allocates investigation depth per directory.
+Replaces fixed max_turns-per-dir with a global turn budget the agent manages.
+Priority dirs get more turns; trivial dirs get fewer; generated/vendored dirs
+get skipped.
+
+**Status:** Not started
+
+---
+
+### Phase 4 — External Knowledge Tools
+Resolution strategies for uncertainty beyond local files:
+- `web_search` — unfamiliar library, format, API
+- `package_lookup` — PyPI / npm / crates.io metadata
+- `fetch_url` — follow URLs referenced in local files
+- `ask_user` — interactive mode, last resort
+
+All are switched off by the `--no-external` flag and budget-limited per session.
+
+**Status:** Not started
+
+---
+
+### Phase 5 — Scale-Tiered Synthesis
+Calibrate synthesis input and depth to target size:
+
+| Tier | Size | Approach |
+|---|---|---|
+| small | <5 dirs / <30 files | Per-file cache entries as synthesis input |
+| medium | 5–30 dirs | Dir summaries (current) |
+| large | 31–150 dirs | Multi-level synthesis |
+| xlarge | >150 dirs | Multi-level + subsystem grouping |
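+
+The table can be read as a simple threshold function. This is a sketch only:
+the function name is hypothetical, and it assumes "small" requires both
+conditions (fewer than 5 dirs and fewer than 30 files), which the table
+leaves ambiguous:
+
+```python
+def synthesis_tier(dir_count, file_count):
+    """Map target size to a synthesis tier using the thresholds in the table above."""
+    # Assumption: a target with few dirs but many files is treated as medium.
+    if dir_count < 5 and file_count < 30:
+        return "small"
+    if dir_count <= 30:
+        return "medium"
+    if dir_count <= 150:
+        return "large"
+    return "xlarge"
+```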
+
+**Status:** Not started
+
+---
+
+### Phase 6 — Multi-Level Synthesis
+For large/xlarge: grouping pass identifies logical subsystems from dir
+summaries (not directory structure). Final synthesis receives 3–10 subsystem
+summaries rather than hundreds of dir summaries.
+
+**Status:** Not started
+
+---
+
+### Phase 7 — Hypothesis-Driven Synthesis
+Synthesis reframed from aggregation to conclusion-with-evidence. Agent
+forms a hypothesis, looks for confirming/refuting evidence, considers
+alternatives, then submits.
+
+Produces analytical output rather than descriptive output.
+
+**Status:** Not started
+
+---
+
+### Phase 8 — Refinement Pass
+Post-synthesis targeted re-investigation. Agent receives current synthesis,
+identifies gaps and contradictions, goes back to actual files (or external
+sources), submits improved report.
+
+Triggered by `--refine` flag. `--refine-depth N` for multiple passes.
+
+**Status:** Not started
+
+---
+
+### Phase 9 — Dynamic Report Structure
+Synthesis produces a superset of possible output fields; report formatter
+renders only populated ones. Output naturally scales from minimal (small
+simple targets) to comprehensive (large complex targets).
+
+**Status:** Not started
+
+---
+
+## Open Design Questions
+
+See `PLAN.md` — Known Unknowns and Concerns sections.
+
+Key unresolved items:
+- Which search API to use for web_search
+- Whether external tools should be opt-in or opt-out by default
+- How to handle confidence calibration (numeric vs categorical)
+- Config file format and location for tunable thresholds
+- Progressive output / interactive mode UX design
--- a/Session-1.md
+++ b/Session-1.md
@ -0,0 +1,56 @@
+# Session 1 — 2026-04-06
+
+## What Was Shipped
+
+### Scan progress output
+Added `[scan]` step reporting to stderr for all base scan steps. Previously
+the tool was silent until the report appeared.
+
+### In-place per-file progress display
+File-iterating steps (classify, count lines, check large files) now update
+a single line in-place using `\r` + ANSI clear-to-EOL rather than scrolling.
+Modules gained optional `on_file` callbacks; `luminos.py` wires them up via
+a `_progress(label)` helper.
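+
+A minimal sketch of the in-place update technique, assuming a callback
+factory similar in spirit to `_progress(label)`. The names, the message
+format, and the `stream` parameter here are illustrative, not the shipped
+code:
+
+```python
+import sys
+
+CLEAR_TO_EOL = "\x1b[K"  # ANSI CSI K: erase from the cursor to the end of the line
+
+
+def make_progress(label, stream=sys.stderr):
+    """Return an on_file callback that rewrites a single status line in place."""
+    def on_file(path):
+        # \r returns to column 0; the clear removes leftovers from a longer previous path.
+        stream.write(f"\r{CLEAR_TO_EOL}[scan] {label}: {path}")
+        stream.flush()
+    return on_file
+
+
+def finish_progress(stream=sys.stderr):
+    """Erase the status line once the step is done."""
+    stream.write(f"\r{CLEAR_TO_EOL}")
+    stream.flush()
+```
+
+Writing to stderr keeps the progress line out of the report on stdout, so
+redirected or `--json` output stays clean.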
+
+### `--exclude` / `-x` flag
+Exclude directories by name from all scan steps and AI analysis. Repeatable:
+`-x .git -x node_modules`. Propagated through tree, filetypes, recency, disk,
+and ai._discover_directories.
+
+### Forgejo project
+Created repo at https://forgejo.labbity.unbiasedgeek.com/archeious/luminos.
+Pushed full history.
+
+### PLAN.md
+Detailed evolution plan covering:
+- AI-driven domain detection (replaces hardcoded taxonomy)
+- Scale-tiered synthesis (small/medium/large/xlarge)
+- Multi-level synthesis for large repos
+- Uncertainty as first-class concept with resolution strategies
+- External knowledge tools (web search, package lookup, URL fetch, ask_user)
+- Investigation planning pass
+- Hypothesis-driven synthesis
+- Refinement pass (`--refine`)
+- Known unknowns, concerns, raw thoughts
+
+### Wiki + development practices
+Initialized Forgejo wiki. Created: Home, Architecture, DevelopmentGuide,
+Roadmap, SessionRetrospectives. Rewrote CLAUDE.md to follow harbormind's
+thin-CLAUDE.md + wiki pattern.
+
+---
+
+## Commits
+
+- `feat: add progress output to base scan steps`
+- `feat: in-place per-file progress for classify, count, and large-file steps`
+- `feat: add -x/--exclude flag to exclude directories from scan and AI analysis`
+
+---
+
+## State at End of Session
+
+- main is clean, all features merged
+- PLAN.md written, no implementation started
+- Wiki initialized, all pages current
+- No blocking issues
--- a/SessionRetrospectives.md
+++ b/SessionRetrospectives.md
@ -0,0 +1,9 @@
+# Session Retrospectives
+
+| Session | Date | Summary |
+|---|---|---|
+| [Session 1](Session-1) | 2026-04-06 | Project setup, scan improvements, Forgejo repo, wiki, development practices |
+
+---
+
+Full session notes linked above.