Adds the reconnaissance survey pass: a fast, ≤3-turn LLM call that
characterizes the target before any directory investigation begins.
The survey receives the file-type distribution (from the base scan),
a top-2-level tree preview, and the list of available dir-loop tools,
and returns description / approach / relevant_tools / skip_tools /
domain_notes / confidence via a single submit_survey tool call.
Wired into _run_investigation() before the directory loop. Output is
logged but not yet consumed — that wiring is #6. Survey failure is
non-fatal: if the call errors or runs out of turns, the investigation
proceeds without survey context.
Also adds a Band-Aid to _SURVEY_SYSTEM_PROMPT warning the LLM that
the file-type histogram is biased toward source code (the underlying
classifier has no concept of mail, notebooks, ledgers, etc.) and to
trust the tree preview when they conflict. The proper fix is #42.
Adds the system prompt for the survey reconnaissance pass. The survey
agent answers three questions (what is this, what approach, which tools
matter) from cheap signals — file type distribution and a top-2-level
tree — without reading files. Tool triage is tri-state: relevant, skip,
or unlisted (default), so skip is reserved for tools whose use would be
actively wrong rather than merely unnecessary.
Wiring of _run_survey() and the submit_survey tool follows in #5.
Returns all file and dir cache entries with confidence below a given
threshold (default 0.7). Entries missing a confidence field are
included as unrated/untrusted. Results sorted ascending by confidence
so least-confident entries come first.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add confidence and confidence_reason to both cache schemas in the dir
loop prompt. Add a Confidence section with categorical guidance
(high ≥ 0.8, medium 0.5–0.8, low < 0.5) and the rule to include
confidence_reason when confidence is below 0.7.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add optional confidence (float 0.0–1.0) and confidence_reason (str) fields
to both file and dir cache entries. Validation rejects out-of-range values
and wrong types. Fields are not yet required — pure schema instrumentation
for Phase 1.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Moves _DIR_SYSTEM_PROMPT and _SYNTHESIS_SYSTEM_PROMPT from ai.py into
a dedicated prompts module. Both are pure template strings with .format()
placeholders — no runtime imports needed in prompts.py. Prompt content
is byte-for-byte identical to the original.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Moves all tree-sitter parsing logic from ai.py into a dedicated module.
Replaces the if/elif language chain with a _LANGUAGE_HANDLERS registry
mapping language names to handler functions.
Extracted: _tool_parse_structure body, _get_ts_parser, _child_by_type,
_text, and all per-language helpers (_py_func_sig, _py_class, etc.).
ai.py retains a thin wrapper for path validation.
Public API: parse_structure(path) -> JSON string
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Delete unused clear_cache() from ai.py (luminos.py imports it from
capabilities.py)
- Remove CACHE_ROOT import from ai.py (was only used by dead function)
- Replace local CACHE_ROOT constant in capabilities.py with import
from cache.py (single source of truth)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Moves investigation ID persistence and _CacheManager class from ai.py
into a dedicated cache module. No behavior changes.
Moved: _load_investigations, _save_investigations, _get_investigation_id,
_CacheManager (all methods), _sha256_path, CACHE_ROOT, INVESTIGATIONS_PATH.
Also added a local _now_iso() in cache.py to avoid a circular import
(ai.py imports from cache.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds think, checkpoint, and flag tools for agent reasoning visibility:
- think: records observation/hypothesis/next_action before investigation
- checkpoint: summarizes learned/unknown/next_phase after file clusters
- flag: marks notable findings to flags.jsonl with severity levels
Additional changes:
- Step numbering in investigation system prompt
- Text blocks from agent now printed to stderr (step labels visible)
- flag tool available in both investigation and synthesis passes
- analyze_directory() returns (brief, detailed, flags) three-tuple
- format_flags() in report.py renders flags sorted by severity
- Per-directory max_turns increased from 10 to 14
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the 70% context budget is hit mid-directory, the early exit now
writes a partial directory cache entry from whatever file summaries
the agent cached in prior turns, instead of discarding the work.
If file entries exist: concatenates their summaries into a directory
entry marked partial=true. If no files were cached: writes a minimal
entry noting the budget was reached before processing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrites ai.py from a single Claude API call into a multi-pass,
cache-driven agent architecture:
- Per-directory isolated agent loops (max 10 turns each) with context
discarded between directories
- Leaves-first processing order so child summaries inform parents
- Disk cache (/tmp/luminos/{uuid}/) persists across runs for resumability
- Investigation ID persistence keyed by target realpath
- Separate synthesis pass reads only directory-level cache entries
- Replaces urllib with Anthropic SDK (streaming, automatic retries)
- Token counting with 70% context budget threshold for early exit
- parse_structure tool via tree-sitter (Python, JS, Rust, Go)
- python-magic integration for MIME-aware directory listings
- Cost tracking printed at end of investigation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces luminos_lib/capabilities.py as the single source of truth for
optional package availability. Detects anthropic, tree-sitter, python-magic
and their grammar packages. Provides check_ai_dependencies() for gating
--ai mode and print_status() for --install-extras. Also hosts clear_cache()
to avoid pulling heavy AI imports for cache cleanup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds --ai flag that sends the directory tree, file categories, and
sampled file contents to Claude for analysis. Produces a brief
summary at the top of the report and a detailed breakdown at the
end. Requires ANTHROPIC_API_KEY env var; degrades gracefully without it.
Uses only stdlib (urllib) to keep the zero-dependency constraint.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-scans every 30 seconds and shows new files, deleted files, and
size changes between scans.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Human-readable terminal report with clear sections, plus JSON output
mode and file output support.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Uses du to show per-directory disk usage and highlights the top 5
largest directories.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Finds the 10 most recently modified files using find with printf
and shows human-readable timestamps.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detects programming languages, counts lines of code per language via
wc -l, and flags unusually large files (>1000 lines or >10MB).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Classifies files by category (source, config, data, media, document,
archive, unknown) using extension mapping and the `file` command.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renders a visual tree with file sizes, configurable depth, and
hidden file filtering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>