luminos

Author	SHA1	Message	Date
Jeff Smith	2e3d21f774	feat(ai): wire survey output into dir loop (#6) The survey pass now actually steers dir loop behavior, in two ways: 1. Prompt injection: a new {survey_context} placeholder in _DIR_SYSTEM_PROMPT receives the survey description, approach, domain_notes, relevant_tools, and skip_tools so the dir-loop agent has investigation context before its first turn. 2. Tool schema filtering: _filter_dir_tools() removes any tool listed in skip_tools from the schema passed to the API, gated on survey confidence >= 0.5. Control-flow tools (submit_report) are always preserved. This is hard enforcement: the agent literally cannot call a filtered tool, which the smoke test for #5 showed was necessary (prompt-only guidance was ignored). Smoke test on luminos_lib: zero run_command invocations (vs 2 before), context budget no longer exhausted (87k vs 133k), cost ~$0.34 (vs $0.46), investigation completes instead of early-exiting. Adds tests/test_ai_filter.py with 14 tests covering _filter_dir_tools and _format_survey_block (both pure helpers, no live API needed).	2026-04-06 22:07:12 -06:00
Jeff Smith	fecb24d6e1	feat(ai): add _run_survey() and submit_survey tool (#5) Adds the reconnaissance survey pass: a fast, ≤3-turn LLM call that characterizes the target before any directory investigation begins. The survey receives the file-type distribution (from the base scan), a top-2-level tree preview, and the list of available dir-loop tools, and returns description / approach / relevant_tools / skip_tools / domain_notes / confidence via a single submit_survey tool call. Wired into _run_investigation() before the directory loop. Output is logged but not yet consumed; that wiring is #6. Survey failure is non-fatal: if the call errors or runs out of turns, the investigation proceeds without survey context. Also adds a Band-Aid to _SURVEY_SYSTEM_PROMPT warning the LLM that the file-type histogram is biased toward source code (the underlying classifier has no concept of mail, notebooks, ledgers, etc.) and to trust the tree preview when they conflict. The proper fix is #42.	2026-04-06 21:49:59 -06:00
Jeff Smith	78f9a396dd	feat: add -x/--exclude flag to exclude directories from scan and AI analysis	2026-04-06 14:32:12 -06:00
Jeff Smith	ea8c07a692	refactor: extract system prompts into luminos_lib/prompts.py Moves _DIR_SYSTEM_PROMPT and _SYNTHESIS_SYSTEM_PROMPT from ai.py into a dedicated prompts module. Both are pure template strings with .format() placeholders — no runtime imports needed in prompts.py. Prompt content is byte-for-byte identical to the original. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 14:44:45 -06:00
Jeff Smith	0c49da23ab	refactor: extract AST parsing into luminos_lib/ast_parser.py Moves all tree-sitter parsing logic from ai.py into a dedicated module. Replaces the if/elif language chain with a _LANGUAGE_HANDLERS registry mapping language names to handler functions. Extracted: _tool_parse_structure body, _get_ts_parser, _child_by_type, _text, and all per-language helpers (_py_func_sig, _py_class, etc.). ai.py retains a thin wrapper for path validation. Public API: parse_structure(path) -> JSON string Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 14:34:02 -06:00
Jeff Smith	dceff144b6	chore: remove dead clear_cache from ai.py, deduplicate CACHE_ROOT - Delete unused clear_cache() from ai.py (luminos.py imports it from capabilities.py) - Remove CACHE_ROOT import from ai.py (was only used by dead function) - Replace local CACHE_ROOT constant in capabilities.py with import from cache.py (single source of truth) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 13:52:38 -06:00
Jeff Smith	bbd04f41a7	refactor: extract cache management into luminos_lib/cache.py Moves investigation ID persistence and _CacheManager class from ai.py into a dedicated cache module. No behavior changes. Moved: _load_investigations, _save_investigations, _get_investigation_id, _CacheManager (all methods), _sha256_path, CACHE_ROOT, INVESTIGATIONS_PATH. Also added a local _now_iso() in cache.py to avoid a circular import (ai.py imports from cache.py). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 13:12:37 -06:00
Jeff Smith	f324648c10	feat: add chain-of-thought observability tools Adds think, checkpoint, and flag tools for agent reasoning visibility: - think: records observation/hypothesis/next_action before investigation - checkpoint: summarizes learned/unknown/next_phase after file clusters - flag: marks notable findings to flags.jsonl with severity levels Additional changes: - Step numbering in investigation system prompt - Text blocks from agent now printed to stderr (step labels visible) - flag tool available in both investigation and synthesis passes - analyze_directory() returns (brief, detailed, flags) three-tuple - format_flags() in report.py renders flags sorted by severity - Per-directory max_turns increased from 10 to 14 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 13:02:19 -06:00
Jeff Smith	2e2c64386f	fix: flush partial directory cache on context budget early exit When the 70% context budget is hit mid-directory, the early exit now writes a partial directory cache entry from whatever file summaries the agent cached in prior turns, instead of discarding the work. If file entries exist: concatenates their summaries into a directory entry marked partial=true. If no files were cached: writes a minimal entry noting the budget was reached before processing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 12:17:28 -06:00
Jeff Smith	907dcf0a37	refactor: replace single-shot API with multi-pass agentic investigation Rewrites ai.py from a single Claude API call into a multi-pass, cache-driven agent architecture: - Per-directory isolated agent loops (max 10 turns each) with context discarded between directories - Leaves-first processing order so child summaries inform parents - Disk cache (/tmp/luminos/{uuid}/) persists across runs for resumability - Investigation ID persistence keyed by target realpath - Separate synthesis pass reads only directory-level cache entries - Replaces urllib with Anthropic SDK (streaming, automatic retries) - Token counting with 70% context budget threshold for early exit - parse_structure tool via tree-sitter (Python, JS, Rust, Go) - python-magic integration for MIME-aware directory listings - Cost tracking printed at end of investigation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 12:13:55 -06:00
Jeff Smith	bcf7d12b4a	feat: add AI-powered directory analysis via Claude API Adds --ai flag that sends the directory tree, file categories, and sampled file contents to Claude for analysis. Produces a brief summary at the top of the report and a detailed breakdown at the end. Requires ANTHROPIC_API_KEY env var; degrades gracefully without it. Uses only stdlib (urllib) to keep the zero-dependency constraint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 10:03:48 -06:00

11 commits
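
Illustrative sketch (not the repository's code): the tool schema filtering described in commit 2e3d21f774 could look roughly like the function below. Only the names _filter_dir_tools, skip_tools, submit_report, run_command, and the 0.5 confidence gate come from the commit message. The dict shapes, the constant names, the read_file tool, and the pass-through behavior when no survey is available are assumptions made for this example.

```python
from typing import Any, Optional

# Assumed constants; the commit message only states the 0.5 gate and that
# control-flow tools such as submit_report are always preserved.
CONTROL_FLOW_TOOLS = frozenset({"submit_report"})
MIN_SURVEY_CONFIDENCE = 0.5


def filter_dir_tools(
    tools: list[dict[str, Any]],
    survey: Optional[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return the tool schemas to send to the API for the dir loop.

    Assumed shapes: each tool is a dict with a "name" key, and the survey
    is a dict with "confidence" and "skip_tools" keys.
    """
    # No survey (for example, the survey call failed): leave tools untouched.
    if not survey:
        return list(tools)
    # Low-confidence surveys do not get to remove anything.
    if survey.get("confidence", 0.0) < MIN_SURVEY_CONFIDENCE:
        return list(tools)
    # Never drop control-flow tools, even if the survey lists them.
    skip = set(survey.get("skip_tools", [])) - CONTROL_FLOW_TOOLS
    return [tool for tool in tools if tool["name"] not in skip]


# Usage with hypothetical tool schemas (only the names matter here).
tools = [{"name": "read_file"}, {"name": "run_command"}, {"name": "submit_report"}]
confident = {"confidence": 0.8, "skip_tools": ["run_command", "submit_report"]}
unsure = {"confidence": 0.3, "skip_tools": ["run_command"]}

kept_confident = [t["name"] for t in filter_dir_tools(tools, confident)]
kept_unsure = [t["name"] for t in filter_dir_tools(tools, unsure)]
kept_no_survey = [t["name"] for t in filter_dir_tools(tools, None)]
```

Filtering the schema rather than only editing the prompt matches the rationale given in the commit: a tool that is absent from the schema cannot be called, whereas prompt-only guidance was reportedly ignored in the smoke test for #5.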