Commit graph

56 commits

Author SHA1 Message Date
Jeff Smith
6cda1cc521 docs(plan): defer #46 to end-of-project tuning section 2026-04-06 22:20:54 -06:00
Jeff Smith
896dac686d merge: feat/issue-7-survey-min-size (#7) 2026-04-06 22:19:35 -06:00
Jeff Smith
8fb2f90678 feat(ai): skip survey pass for tiny targets (#7)
Adds a gate in _run_investigation() that skips the survey API call when
a target has both fewer than _SURVEY_MIN_FILES (5) files AND fewer
than _SURVEY_MIN_DIRS (2) directories. AND semantics handle the
deep-narrow edge case correctly: a target with 4 files spread across
50 directories still gets a survey, because the survey's cost is
amortized across 50 dir loops.

When skipped, _default_survey() supplies a synthetic dict with
confidence=0.0 — chosen specifically so _filter_dir_tools() never
enforces skip_tools from a synthetic value. The dir loop receives
a generic "small target, read everything" framing in its prompt and
keeps its full toolbox.

Reorders _discover_directories() to run before the survey gate so
total_dirs is available without a second walk.

#46 tracks revisiting the threshold values with empirical data after
Phase 2 ships and we've run --ai on a variety of real targets.

Smoke tested on a 2-file target: gate triggers, default survey
substituted, dir loop completes normally. Adds 4 unit tests for
_default_survey() covering schema, confidence guard, filter
interaction, and empty skip_tools.
2026-04-06 22:19:25 -06:00
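A minimal sketch of the gate described in 8fb2f90678, assuming the file and directory counts are already available as integers. The thresholds and the survey fields come from the commit messages; the helper names should_skip_survey() and default_survey() are hypothetical stand-ins, not the actual code in ai.py.

```python
# Sketch only: thresholds come from the commit; helper names are hypothetical.
_SURVEY_MIN_FILES = 5
_SURVEY_MIN_DIRS = 2


def should_skip_survey(total_files: int, total_dirs: int) -> bool:
    # AND semantics: skip only when the target is small on both axes.
    return total_files < _SURVEY_MIN_FILES and total_dirs < _SURVEY_MIN_DIRS


def default_survey() -> dict:
    # Synthetic survey; confidence=0.0 keeps skip_tools from ever being enforced.
    return {
        "description": "Small target.",
        "approach": "Small target, read everything.",
        "relevant_tools": [],
        "skip_tools": [],
        "domain_notes": "",
        "confidence": 0.0,
    }


print(should_skip_survey(2, 1))   # True: the 2-file smoke-test shape
print(should_skip_survey(4, 50))  # False: deep-narrow target still surveyed
```

Using OR instead of AND would skip the survey for the 4-files-in-50-dirs case, which is exactly the target where one survey call pays for itself across many dir loops.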
Jeff Smith
b2d00dd301 merge: feat/issue-6-wire-survey (#6) 2026-04-06 22:07:22 -06:00
Jeff Smith
2e3d21f774 feat(ai): wire survey output into dir loop (#6)
The survey pass now actually steers dir loop behavior, in two ways:

1. Prompt injection: a new {survey_context} placeholder in
   _DIR_SYSTEM_PROMPT receives the survey description, approach,
   domain_notes, relevant_tools, and skip_tools so the dir-loop agent
   has investigation context before its first turn.

2. Tool schema filtering: _filter_dir_tools() removes any tool listed
   in skip_tools from the schema passed to the API, gated on
   survey confidence >= 0.5. Control-flow tools (submit_report) are
   always preserved. This is hard enforcement — the agent literally
   cannot call a filtered tool, which the smoke test for #5 showed
   was necessary (prompt-only guidance was ignored).

Smoke test on luminos_lib: zero run_command invocations (vs 2 before),
context budget no longer exhausted (87k vs 133k), cost ~$0.34 (vs
$0.46), investigation completes instead of early-exiting.

Adds tests/test_ai_filter.py with 14 tests covering _filter_dir_tools
and _format_survey_block — both pure helpers, no live API needed.
2026-04-06 22:07:12 -06:00
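The hard enforcement in 2e3d21f774 can be sketched as a pure filter over tool schemas, assuming each schema is a dict with a "name" key (as in Anthropic tool definitions). The 0.5 threshold and the always-kept submit_report come from the commit; the function name filter_dir_tools and the tool name read_file are illustrative assumptions.

```python
from __future__ import annotations

_SKIP_TOOLS_MIN_CONFIDENCE = 0.5      # gate from the commit message
_CONTROL_FLOW_TOOLS = {"submit_report"}  # never filtered out


def filter_dir_tools(tools: list[dict], survey: dict | None) -> list[dict]:
    # No survey, or a low-confidence one: hand back the full toolbox.
    if not survey or survey.get("confidence", 0.0) < _SKIP_TOOLS_MIN_CONFIDENCE:
        return list(tools)
    # Drop skipped tools, but never a control-flow tool.
    skip = set(survey.get("skip_tools", [])) - _CONTROL_FLOW_TOOLS
    return [tool for tool in tools if tool["name"] not in skip]
```

Because the filtered list is what gets sent to the API, the agent cannot call a skipped tool at all, which is the point the smoke test for #5 made about prompt-only guidance.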
Jeff Smith
e942ecc34a docs(plan): add Phase 2.5 context budget reliability (#44)
#5 smoke test showed the dir loop exhausts the 126k context budget on
a 13-file Python lib. Sequencing #44 between Phase 2 and Phase 3 so
the foundation is solid before planning + external tools add more
prompt and tool weight.
2026-04-06 21:59:01 -06:00
Jeff Smith
ffd9d9e929 merge: feat/issue-5-run-survey (#5) 2026-04-06 21:50:08 -06:00
Jeff Smith
fecb24d6e1 feat(ai): add _run_survey() and submit_survey tool (#5)
Adds the reconnaissance survey pass: a fast, ≤3-turn LLM call that
characterizes the target before any directory investigation begins.
The survey receives the file-type distribution (from the base scan),
a top-2-level tree preview, and the list of available dir-loop tools,
and returns description / approach / relevant_tools / skip_tools /
domain_notes / confidence via a single submit_survey tool call.

Wired into _run_investigation() before the directory loop. Output is
logged but not yet consumed — that wiring is #6. Survey failure is
non-fatal: if the call errors or runs out of turns, the investigation
proceeds without survey context.

Also adds a Band-Aid to _SURVEY_SYSTEM_PROMPT warning the LLM that
the file-type histogram is biased toward source code (the underlying
classifier has no concept of mail, notebooks, ledgers, etc.) and to
trust the tree preview when they conflict. The proper fix is #42.
2026-04-06 21:49:59 -06:00
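For reference, a submit_survey tool in the Anthropic Messages API shape (name, description, input_schema as JSON Schema) might look like the sketch below. The six field names come from fecb24d6e1; the JSON types, the description text, and the 0.0 to 1.0 bounds on confidence are assumptions, not the schema actually committed.

```python
# Sketch of a submit_survey tool definition; field types are assumed.
SUBMIT_SURVEY_TOOL = {
    "name": "submit_survey",
    "description": "Submit the reconnaissance survey of the target.",
    "input_schema": {
        "type": "object",
        "properties": {
            "description": {"type": "string"},
            "approach": {"type": "string"},
            "relevant_tools": {"type": "array", "items": {"type": "string"}},
            "skip_tools": {"type": "array", "items": {"type": "string"}},
            "domain_notes": {"type": "string"},
            "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
        },
        "required": [
            "description", "approach", "relevant_tools",
            "skip_tools", "domain_notes", "confidence",
        ],
    },
}
```

Forcing the answer through a single tool call gives the caller structured fields to log now and to feed into the dir loop in #6, rather than free text to parse.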
Jeff Smith
05fcaac755 docs(plan): note classifier rebuild (#42) in Phase 2
The filetype classifier is biased toward source code and would mislead
the survey pass on non-code targets (mail, notebooks, ledgers). #5
ships with a prompt-level Band-Aid; #42 captures the real fix and is
sequenced after the survey pass is observable end-to-end and before
Phase 3 depends on survey output.
2026-04-06 21:47:49 -06:00
Jeff Smith
2afef76a67 merge: feat/issue-4-survey-prompt (#4) 2026-04-06 21:35:29 -06:00
Jeff Smith
987f41ec2e feat(prompts): add _SURVEY_SYSTEM_PROMPT for survey pass (#4)
Adds the system prompt for the survey reconnaissance pass. The survey
agent answers three questions (what is this, what approach, which tools
matter) from cheap signals — file type distribution and a top-2-level
tree — without reading files. Tool triage is tri-state: relevant, skip,
or unlisted (default), so skip is reserved for tools whose use would be
actively wrong rather than merely unnecessary.

Wiring of _run_survey() and the submit_survey tool follows in #5.
2026-04-06 21:35:17 -06:00
Jeff Smith
0a9afc96c9 chore: update CLAUDE.md for session 3 2026-04-06 21:15:27 -06:00
Jeff Smith
09e5686bea merge: feat/issue-3-low-confidence-entries (#3) 2026-04-06 21:13:58 -06:00
Jeff Smith
1d681c8bc1 feat(cache): add low_confidence_entries() query to CacheManager (#3)
Returns all file and dir cache entries with confidence below a given
threshold (default 0.7). Entries missing a confidence field are
included as unrated/untrusted. Results sorted ascending by confidence
so least-confident entries come first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:13:58 -06:00
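The query in 1d681c8bc1 can be sketched as a standalone function over a list of entry dicts; the real method lives on CacheManager and reads its own cache files. The 0.7 default, inclusion of unrated entries, and ascending sort come from the commit; sorting unrated entries as if they were 0.0 is an assumption.

```python
from __future__ import annotations


def low_confidence_entries(entries: list[dict], threshold: float = 0.7) -> list[dict]:
    # Unrated entries (no confidence field) are included as untrusted.
    selected = [
        e for e in entries
        if e.get("confidence") is None or e["confidence"] < threshold
    ]
    # Least confident first; assumption: unrated entries sort as 0.0.
    return sorted(selected, key=lambda e: e.get("confidence") or 0.0)
```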
Jeff Smith
a67e4789b2 merge: feat/issue-2-confidence-prompt (#2) 2026-04-06 20:46:08 -06:00
Jeff Smith
80f8f883c1 feat(prompts): instruct agent to set confidence on cache writes (#2)
Add confidence and confidence_reason to both cache schemas in the dir
loop prompt. Add a Confidence section with categorical guidance
(high ≥ 0.8, medium 0.5–0.8, low < 0.5) and the rule to include
confidence_reason when confidence is below 0.7.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 20:46:05 -06:00
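The bands and the reason rule from 80f8f883c1, written out as a small check. The function names are hypothetical; the prompt only instructs the agent, so a helper like this would be a post-hoc audit, not something the commit adds.

```python
def confidence_category(confidence: float) -> str:
    # high >= 0.8, medium 0.5 to 0.8, low < 0.5
    if confidence >= 0.8:
        return "high"
    if confidence >= 0.5:
        return "medium"
    return "low"


def missing_reason(entry: dict) -> bool:
    # confidence_reason is expected whenever confidence is below 0.7.
    confidence = entry.get("confidence")
    return (
        confidence is not None
        and confidence < 0.7
        and not entry.get("confidence_reason")
    )
```

Note that the 0.7 reason threshold falls inside the medium band, so a medium entry at 0.6 needs a reason while one at 0.75 does not.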
Jeff Smith
4338587360 merge: feat/issue-37-unit-tests (#37) 2026-04-06 16:57:30 -06:00
Jeff Smith
6875cf5ed1 feat(tests): add unit test coverage for all testable modules (#37)
129 tests across cache, filetypes, code, disk, recency, tree, report,
and capabilities. Uses stdlib unittest only — no new dependencies.
Also updates CLAUDE.md development workflow to require test coverage
for all future changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 16:57:26 -06:00
Jeff Smith
0d9812490a merge: feat/issue-1-confidence-fields (#1) 2026-04-06 16:51:58 -06:00
Jeff Smith
b158809c19 feat(cache): add confidence fields to file and dir cache schemas (#1)
Add optional confidence (float 0.0–1.0) and confidence_reason (str) fields
to both file and dir cache entries. Validation rejects out-of-range values
and wrong types. Fields are not yet required — pure schema instrumentation
for Phase 1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 16:50:48 -06:00
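A sketch of the validation described in b158809c19, assuming a validator that returns a list of error strings. Accepting plain ints such as 1 as valid numbers is an assumption; the commit only says "float 0.0–1.0".

```python
def validate_confidence_fields(entry: dict) -> list:
    errors = []
    if "confidence" in entry:
        value = entry["confidence"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append("confidence must be a number")
        elif not 0.0 <= value <= 1.0:
            errors.append("confidence must be between 0.0 and 1.0")
    if "confidence_reason" in entry and not isinstance(entry["confidence_reason"], str):
        errors.append("confidence_reason must be a string")
    return errors
```

Both fields stay optional: an entry with neither passes, matching the "not yet required" note.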
Jeff Smith
c7bacb674f chore: update CLAUDE.md for session 2 2026-04-06 16:43:25 -06:00
Jeff Smith
309e3464bc chore: add dev workflow, branching discipline, ADHD protocols, session docs 2026-04-06 16:20:51 -06:00
Jeff Smith
a2b015a965 chore: ignore docs/wiki/ — separate git repo 2026-04-06 16:13:31 -06:00
Jeff Smith
77d4be0221 chore: rewrite CLAUDE.md — thin format, wiki references, dev practices 2026-04-06 16:13:20 -06:00
Jeff Smith
d323190866 merge: add -x/--exclude flag for directory exclusion 2026-04-06 14:32:17 -06:00
Jeff Smith
78f9a396dd feat: add -x/--exclude flag to exclude directories from scan and AI analysis 2026-04-06 14:32:12 -06:00
Jeff Smith
78f80c31ed merge: in-place per-file progress for scan steps 2026-04-06 14:26:40 -06:00
Jeff Smith
206d2d34f6 feat: in-place per-file progress for classify, count, and large-file steps 2026-04-06 14:26:37 -06:00
Jeff Smith
bbaf387cb7 merge: add progress output to base scan steps 2026-04-06 14:21:19 -06:00
Jeff Smith
ebc6b852f1 feat: add progress output to base scan steps 2026-04-06 14:21:17 -06:00
Jeff Smith
33df555a8c merge: extract system prompts module 2026-03-30 14:44:57 -06:00
Jeff Smith
ea8c07a692 refactor: extract system prompts into luminos_lib/prompts.py
Moves _DIR_SYSTEM_PROMPT and _SYNTHESIS_SYSTEM_PROMPT from ai.py into
a dedicated prompts module. Both are pure template strings with .format()
placeholders — no runtime imports needed in prompts.py. Prompt content
is byte-for-byte identical to the original.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:44:45 -06:00
Jeff Smith
5c6124a715 merge: extract AST parser module 2026-03-30 14:34:06 -06:00
Jeff Smith
0c49da23ab refactor: extract AST parsing into luminos_lib/ast_parser.py
Moves all tree-sitter parsing logic from ai.py into a dedicated module.
Replaces the if/elif language chain with a _LANGUAGE_HANDLERS registry
mapping language names to handler functions.

Extracted: _tool_parse_structure body, _get_ts_parser, _child_by_type,
_text, and all per-language helpers (_py_func_sig, _py_class, etc.).
ai.py retains a thin wrapper for path validation.

Public API: parse_structure(path) -> JSON string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:34:02 -06:00
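The registry pattern from 0c49da23ab, replacing an if/elif chain with a dict lookup, can be illustrated without tree-sitter. In this sketch the per-language handlers are naive regex stand-ins (the real ones walk a tree-sitter AST), and parse_source(language, source) is a hypothetical entry point; the real public API is parse_structure(path).

```python
import json
import re
from typing import Callable, Dict


def _regex_handler(pattern: str) -> Callable[[str], dict]:
    # Stand-in handler: collects function names matched by one regex.
    regex = re.compile(pattern, re.MULTILINE)

    def handler(source: str) -> dict:
        return {"functions": regex.findall(source)}

    return handler


# Registry: language name -> handler, instead of an if/elif chain.
_LANGUAGE_HANDLERS: Dict[str, Callable[[str], dict]] = {
    "python": _regex_handler(r"^\s*(?:async\s+)?def\s+(\w+)"),
    "javascript": _regex_handler(r"^\s*(?:async\s+)?function\s+(\w+)"),
    "rust": _regex_handler(r"^\s*(?:pub\s+)?fn\s+(\w+)"),
    "go": _regex_handler(r"^func\s+(?:\([^)]*\)\s*)?(\w+)"),
}


def parse_source(language: str, source: str) -> str:
    handler = _LANGUAGE_HANDLERS.get(language)
    if handler is None:
        return json.dumps({"error": f"unsupported language: {language}"})
    return json.dumps({"language": language, **handler(source)})
```

Adding a language then means adding one dict entry rather than another branch.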
Jeff Smith
8aa6c713db merge: post-cache-extraction cleanup 2026-03-30 13:52:43 -06:00
Jeff Smith
dceff144b6 chore: remove dead clear_cache from ai.py, deduplicate CACHE_ROOT
- Delete unused clear_cache() from ai.py (luminos.py imports it from
  capabilities.py)
- Remove CACHE_ROOT import from ai.py (was only used by dead function)
- Replace local CACHE_ROOT constant in capabilities.py with import
  from cache.py (single source of truth)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:52:38 -06:00
Jeff Smith
811fe3514c merge: extract cache management module 2026-03-30 13:12:42 -06:00
Jeff Smith
bbd04f41a7 refactor: extract cache management into luminos_lib/cache.py
Moves investigation ID persistence and _CacheManager class from ai.py
into a dedicated cache module. No behavior changes.

Moved: _load_investigations, _save_investigations, _get_investigation_id,
_CacheManager (all methods), _sha256_path, CACHE_ROOT, INVESTIGATIONS_PATH.

Also added a local _now_iso() in cache.py to avoid a circular import
(ai.py imports from cache.py).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:12:37 -06:00
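Investigation ID persistence keyed by target realpath (moved here from ai.py, introduced in 907dcf0a37) might look roughly like this. The store location is passed in, and the flat {realpath: id} JSON layout is an assumption; the fresh flag mirrors what --fresh is described as doing.

```python
import json
import os
import uuid


def get_investigation_id(target: str, store_path: str, fresh: bool = False) -> str:
    # Key by realpath so symlinks and relative paths map to one investigation.
    key = os.path.realpath(target)
    try:
        with open(store_path) as f:
            investigations = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        investigations = {}
    if not fresh and key in investigations:
        return investigations[key]
    # New target, or --fresh: mint a new ID and persist it.
    new_id = str(uuid.uuid4())
    investigations[key] = new_id
    with open(store_path, "w") as f:
        json.dump(investigations, f, indent=2)
    return new_id
```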
Jeff Smith
a7546fa1e8 merge: chain-of-thought observability tools 2026-03-30 13:02:25 -06:00
Jeff Smith
f324648c10 feat: add chain-of-thought observability tools
Adds think, checkpoint, and flag tools for agent reasoning visibility:
- think: records observation/hypothesis/next_action before investigation
- checkpoint: summarizes learned/unknown/next_phase after file clusters
- flag: marks notable findings to flags.jsonl with severity levels

Additional changes:
- Step numbering in investigation system prompt
- Text blocks from agent now printed to stderr (step labels visible)
- flag tool available in both investigation and synthesis passes
- analyze_directory() returns (brief, detailed, flags) three-tuple
- format_flags() in report.py renders flags sorted by severity
- Per-directory max_turns increased from 10 to 14

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:02:19 -06:00
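A sketch of reading flags.jsonl and rendering it sorted by severity, as format_flags() is described in f324648c10. The commit does not name the severity levels or the flag fields, so "critical", "concern", "info", "path", and "finding" are all hypothetical.

```python
import json

# Assumed severity names, most severe first; unknown levels sort last.
_SEVERITY_ORDER = {"critical": 0, "concern": 1, "info": 2}


def load_flags(jsonl_text: str) -> list:
    # One JSON object per non-empty line.
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def format_flags(flags: list) -> str:
    ordered = sorted(
        flags,
        key=lambda f: _SEVERITY_ORDER.get(f.get("severity"), len(_SEVERITY_ORDER)),
    )
    return "\n".join(
        f"[{f.get('severity', 'unknown').upper()}] {f.get('path', '?')}: {f.get('finding', '')}"
        for f in ordered
    )
```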
Jeff Smith
dd58a4fd3a merge: fix token budget early exit cache flush 2026-03-30 12:17:33 -06:00
Jeff Smith
2e2c64386f fix: flush partial directory cache on context budget early exit
When the 70% context budget is hit mid-directory, the early exit now
writes a partial directory cache entry from whatever file summaries
the agent cached in prior turns, instead of discarding the work.

If file entries exist: concatenates their summaries into a directory
entry marked partial=true. If no files were cached: writes a minimal
entry noting the budget was reached before processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:17:28 -06:00
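The early-exit flush from 2e2c64386f, sketched as two pure helpers. The 70% fraction and partial=true come from the commit; the entry field names, the newline separator, and marking the minimal entry as partial are assumptions.

```python
from __future__ import annotations


def budget_exceeded(used_tokens: int, context_window: int, fraction: float = 0.7) -> bool:
    # Early exit once usage reaches the configured share of the context window.
    return used_tokens >= fraction * context_window


def build_partial_dir_entry(dir_path: str, file_entries: list[dict]) -> dict:
    # Reuse whatever file summaries were cached before the budget ran out.
    summaries = [e["summary"] for e in file_entries if e.get("summary")]
    if summaries:
        summary = "\n".join(summaries)
    else:
        summary = "Context budget reached before any files were processed."
    return {"path": dir_path, "summary": summary, "partial": True}
```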
Jeff Smith
faccf28dd8 chore: document git workflow conventions in CLAUDE.md
Adds branch naming, commit message format, and merge procedure.
All future changes must start on a branch and merge to main with --no-ff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:14:34 -06:00
Jeff Smith
da387289f3 chore: add venv setup script and update CLAUDE.md for optional deps
- setup_env.sh creates ~/luminos-env venv and installs all AI packages
- CLAUDE.md updated to reflect the new dependency model: base tool is
  zero-dep, --ai requires packages installed via venv
- Documents the capabilities module and updated ai.py architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:14:13 -06:00
Jeff Smith
0412a8c0cb feat: add --fresh, --clear-cache, and --install-extras CLI flags
- --install-extras: prints status of all optional AI packages
- --clear-cache: wipes /tmp/luminos/ investigation cache
- --fresh: forces a new investigation ID, ignoring cached results
- AI import is now lazy (only when --ai is used) so the base tool
  never touches optional dependencies
- target argument is optional when using --install-extras

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:14:05 -06:00
Jeff Smith
907dcf0a37 refactor: replace single-shot API with multi-pass agentic investigation
Rewrites ai.py from a single Claude API call into a multi-pass,
cache-driven agent architecture:

- Per-directory isolated agent loops (max 10 turns each) with context
  discarded between directories
- Leaves-first processing order so child summaries inform parents
- Disk cache (/tmp/luminos/{uuid}/) persists across runs for resumability
- Investigation ID persistence keyed by target realpath
- Separate synthesis pass reads only directory-level cache entries
- Replaces urllib with Anthropic SDK (streaming, automatic retries)
- Token counting with 70% context budget threshold for early exit
- parse_structure tool via tree-sitter (Python, JS, Rust, Go)
- python-magic integration for MIME-aware directory listings
- Cost tracking printed at end of investigation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:13:55 -06:00
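Leaves-first ordering, where each directory is visited only after all of its subdirectories, falls out of the standard library: os.walk(..., topdown=False) yields a directory's triple after those of its subdirectories. The function name is hypothetical, and whether ai.py orders directories this way is not stated in the commit.

```python
from __future__ import annotations

import os


def leaves_first(root: str) -> list[str]:
    # topdown=False yields children before their parents,
    # so child summaries exist by the time a parent is investigated.
    return [dirpath for dirpath, _dirnames, _filenames in os.walk(root, topdown=False)]
```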
Jeff Smith
2ba07f34a2 feat: add capabilities detection module for optional AI dependencies
Introduces luminos_lib/capabilities.py as the single source of truth for
optional package availability. Detects anthropic, tree-sitter, python-magic
and their grammar packages. Provides check_ai_dependencies() for gating
--ai mode and print_status() for --install-extras. Also hosts clear_cache()
to avoid pulling heavy AI imports for cache cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:13:40 -06:00
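One way to detect optional packages without importing them is importlib.util.find_spec, which returns None for a top-level module that cannot be found. The mapping below uses the real import names (tree_sitter, magic) but omits the grammar packages; missing_ai_dependencies() is a hypothetical helper, not the signature of check_ai_dependencies() in capabilities.py.

```python
from __future__ import annotations

import importlib.util

# Distribution name -> top-level import name (grammar packages omitted).
_OPTIONAL_PACKAGES = {
    "anthropic": "anthropic",
    "tree-sitter": "tree_sitter",
    "python-magic": "magic",
}


def detect_capabilities() -> dict[str, bool]:
    # find_spec locates a module without importing it.
    return {
        dist: importlib.util.find_spec(module) is not None
        for dist, module in _OPTIONAL_PACKAGES.items()
    }


def missing_ai_dependencies(caps: dict[str, bool] | None = None) -> list[str]:
    caps = detect_capabilities() if caps is None else caps
    return [name for name, available in caps.items() if not available]
```

Because nothing heavy is imported, a check like this stays cheap enough for --install-extras and cache cleanup paths.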
Jeff Smith
bcf7d12b4a feat: add AI-powered directory analysis via Claude API
Adds --ai flag that sends the directory tree, file categories, and
sampled file contents to Claude for analysis. Produces a brief
summary at the top of the report and a detailed breakdown at the
end. Requires ANTHROPIC_API_KEY env var; degrades gracefully without it.
Uses only stdlib (urllib) to keep the zero-dependency constraint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 10:03:48 -06:00
Jeff Smith
d6f36ecea5 feat: add --watch mode with change diffing
Re-scans every 30 seconds and shows new files, deleted files, and
size changes between scans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 09:57:39 -06:00
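The change diffing in d6f36ecea5 reduces to comparing two {path: size} snapshots. This is a sketch, not the committed code; the 30-second loop would simply take a new snapshot each cycle and diff it against the previous one.

```python
import os


def snapshot(root: str) -> dict:
    # Map every readable file under root to its size in bytes.
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes[path] = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
    return sizes


def diff_snapshots(old: dict, new: dict) -> dict:
    return {
        "new": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "resized": sorted(
            (p, old[p], new[p]) for p in old.keys() & new.keys() if old[p] != new[p]
        ),
    }
```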
Jeff Smith
2c0a9cf872 feat: add formatted report output with --json and --output flags
Human-readable terminal report with clear sections, plus JSON output
mode and file output support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 09:57:35 -06:00