luminos

Author	SHA1	Message	Date
Jeff Smith	79bb10b9dc	fix(ai): match target root dir by basename in _apply_plan() (#76) The planner sees basename(target) in the tree output (e.g. "luminos_lib") and uses that as the path in its plan. But _apply_plan() mapped the target root to "." via os.path.relpath(), so the planner's path never matched and the allocation was silently dropped. Fix: register both "." and basename(target) as aliases for the target root in the lookup table. Also log a warning when plan paths don't match any known directory, so future mismatches are visible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 20:38:55 -06:00
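The alias fix in 79bb10b9dc can be sketched roughly as below. This is an illustration only: `build_dir_lookup` and `resolve_plan_paths` are hypothetical names, and the log does not show the real `_apply_plan()` signature or its lookup-table layout. Letting a real child directory win over the root alias when the names collide is also an assumption.

```python
import logging
import os

log = logging.getLogger(__name__)


def build_dir_lookup(target, directories):
    """Map each path spelling the planner might use to a real directory.

    Hypothetical helper; the real lookup lives inside _apply_plan().
    """
    lookup = {}
    root_alias = os.path.basename(os.path.normpath(target))
    for d in directories:
        rel = os.path.relpath(d, target)
        lookup[rel] = d
        if rel == ".":
            # The tree output shows the root as basename(target), so register
            # that spelling too. setdefault keeps a real child directory with
            # the same relative name from being shadowed (an assumption).
            lookup.setdefault(root_alias, d)
    return lookup


def resolve_plan_paths(plan_paths, lookup):
    """Return (matched, unmatched) and warn about paths that match nothing."""
    matched, unmatched = [], []
    for path in plan_paths:
        if path in lookup:
            matched.append(lookup[path])
        else:
            unmatched.append(path)
            log.warning("plan path %r matches no known directory", path)
    return matched, unmatched
```

With this shape, a plan that names the root as `luminos_lib` resolves to the same directory as `.`, and a misspelled path produces a warning instead of a silently dropped allocation.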
Jeff Smith	2adbed9d28	feat(ai): implement Phase 3 investigation planning (#8, #9, #10, #11, #74) Add a planning pass that runs after survey and before dir loops. The planner classifies directories into priority/shallow/skip tiers and allocates turns accordingly, replacing the fixed max_turns=14 per directory with dynamic allocation from a global budget. Planning pass: - _PLANNING_SYSTEM_PROMPT in prompts.py with submit_plan tool - _run_planning() follows the same single-turn pattern as _run_survey() - submit_plan tool registered in new "planning" scope - _apply_plan() pure function: band-sorted ordering (leaf-first within bands), turn map, skip-dir removal - _default_plan() fallback when planning is skipped or fails - Plan cached as plan.json for resumed runs Dynamic turn allocation: - Priority dirs: 15-20 turns (capped at 25) - Shallow dirs: 5 turns - Default: 10 turns - Skip dirs: excluded entirely - Orchestrator passes per-dir max_turns to _run_dir_loop() Quality instrumentation: - _TokenTracker._loop_turns counts API calls per dir loop - completeness field (0.0-1.0) added to dir-scope submit_report - plan_evaluation.json emitted after dir loops comparing plan predictions to actual turn utilization, completeness, and confidence - Turn utilization logged per directory during investigation Also fixes _get_child_summaries() to distinguish actual leaf directories from parents whose children have not been investigated yet, replacing the misleading "this is a leaf directory" placeholder. 26 new tests (260 total, all passing). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 20:21:49 -06:00
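The tiered turn allocation listed in 2adbed9d28 might look something like this. The plan schema here (a `priority` mapping of path to requested turns, plus `shallow` and `skip` lists) is invented for illustration, as is the name `allocate_turns`. Only the numbers (cap of 25, 5 for shallow, 10 by default) come from the commit message.

```python
PRIORITY_CAP = 25    # "capped at 25"
SHALLOW_TURNS = 5    # "Shallow dirs: 5 turns"
DEFAULT_TURNS = 10   # "Default: 10 turns"


def allocate_turns(directories, plan):
    """Build a per-directory max_turns map from a (hypothetical) plan dict."""
    priority = plan.get("priority", {})
    shallow = set(plan.get("shallow", []))
    skip = set(plan.get("skip", []))

    turns = {}
    for d in directories:
        if d in skip:
            continue  # skip dirs are excluded entirely
        if d in priority:
            turns[d] = min(priority[d], PRIORITY_CAP)
        elif d in shallow:
            turns[d] = SHALLOW_TURNS
        else:
            turns[d] = DEFAULT_TURNS
    return turns
```

For example, `allocate_turns(["a", "b", "c", "d"], {"priority": {"a": 40}, "shallow": ["b"], "skip": ["c"]})` yields a map with `a` clamped to 25, `b` at 5, `d` at 10, and no entry for `c`.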
Jeff Smith	efaa2024d7	test(ai): cover _TokenTracker, _synthesize_from_cache, _discover_directories (#70) Second wave of pre-Phase-3 test coverage. The #55 round picked off the easy decision-logic helpers; this round covers the three highest-impact helpers that escaped the first sweep. Three new test classes appended to tests/test_ai_pure.py: - TestTokenTracker (11 tests) Pins the load-bearing #44 fix: budget_exceeded() must use last_input (the most recent call's context size) NOT cumulative input, because each turn's input_tokens already includes the full message history. Tests assert: cumulative-input far above budget does NOT trip the gate when last_input stays small; reset_loop() preserves grand totals; the boundary is strict > not >=. - TestSynthesizeFromCache (5 tests) The synthesis fallback fires only when _run_synthesis exhausts its max_turns, which almost never happens in normal runs — exactly the kind of code that silently rots. Tests assert: empty cache returns the incomplete-message brief and empty detailed; single dir entry produces a markdown line; multi-entry detailed contains all entries; empty-summary entries are skipped; file entries alone do not satisfy (the function reads dir entries only). - TestDiscoverDirectories (9 tests) The leaves-first walk drives the entire dir-loop iteration order and is the foundation of the cache reuse story. Tests assert: empty target returns target only; nested trees come back leaves-first; .git / __pycache__ / node_modules / *.egg-info excluded; custom --exclude honored; hidden dirs excluded by default; show_hidden=True includes them but does not override the skip list. PLAN.md: added Phase 2.7 (#56 ✅) and Phase 2.8 (#55 ✅, #70) entries to the implementation order, and removed the now-stale Phase 3.4 (#56) and Background chore (#55) sections that were displaced by the pre-Phase-3 cleanup pattern. Verification: 234 tests pass (209 prior + 25 new).	2026-04-11 10:41:16 -06:00
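The leaves-first walk that TestDiscoverDirectories pins down could be approximated as follows. This is a sketch under assumptions, not the real `_discover_directories()`: the function name, its parameters, and the sort-by-depth approach are all guesses. Only the skip list and the hidden-directory behaviour come from the commit message.

```python
import fnmatch
import os

# Skip list taken from the commit message.
SKIP_PATTERNS = (".git", "__pycache__", "node_modules", "*.egg-info")


def discover_leaves_first(target, show_hidden=False, exclude=()):
    """Return target and its subdirectories, deepest directories first."""
    patterns = SKIP_PATTERNS + tuple(exclude)
    found = []
    for root, dirs, _files in os.walk(target):
        # Prune in place so os.walk never descends into skipped directories.
        # show_hidden only lifts the dot-prefix rule, never the skip list.
        dirs[:] = [
            d for d in dirs
            if not any(fnmatch.fnmatch(d, p) for p in patterns)
            and (show_hidden or not d.startswith("."))
        ]
        found.append(root)
    # More path separators means deeper, so children precede their parents.
    found.sort(key=lambda p: p.count(os.sep), reverse=True)
    return found
```

Pruning requires a top-down walk, which is why the ordering is applied afterwards rather than relying on `os.walk(topdown=False)`.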
Jeff Smith	a6333858ee	test(ai): add unit coverage for pure helpers in ai.py (#55) ai.py was documented as fully exempt from unit testing because the dir loop and synthesis pass require a live Anthropic API. But several helpers in the module are pure functions with no API dependency, and they're the kind of thing that breaks silently. The #57 refactor added two more (_build_dir_loop_context, _flush_partial_dir_entry) that are also naturally testable. New tests/test_ai_pure.py — 45 tests across 8 helpers: - _should_skip_dir: exact-match, *.egg-info glob, no-match cases - _path_is_safe: inside, nested, equals, outside, traversal, sibling-with-target-prefix (the easy-to-miss security case) - _default_survey: shape, zero confidence guarantees no filtering, passes through _filter_dir_tools unchanged - _format_survey_block: None, empty, minimal, with relevant_tools, with skip_tools, with domain_notes, empty-list omission - _filter_dir_tools: None, empty, low confidence, high confidence filters, protected tools never removed, unknown skip silently ignored, garbage/None confidence treated as zero, threshold boundary inclusive - _format_survey_signals: None, empty, zero total_files, full, partial (only extensions) - _block_to_dict: text, tool_use, unknown type - _flush_partial_dir_entry (#57): idempotent when entry exists, no-file-entries stub path, with-file-entries summary synthesis, notable_files collection Uses the same _make_manager() pattern as test_cache.py to construct a _CacheManager rooted in a tempdir, sidestepping CACHE_ROOT entirely. Doc updates: - CLAUDE.md, README.md, docs/wiki/DevelopmentGuide.md: ai.py is no longer fully exempt — only the API-dependent loops are. Pure helpers are covered by test_ai_pure.py. Verification: 209 tests pass (164 prior + 45 new).	2026-04-11 10:24:47 -06:00
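The "sibling-with-target-prefix" case called out for `_path_is_safe` is the one a naive string-prefix check gets wrong. Below is a minimal sketch of a containment check that handles it, assuming a `(path, target)` signature that the log does not confirm. The real helper may be implemented differently.

```python
import os


def path_is_safe(path, target):
    """True if path resolves to target itself or to something inside it."""
    target_real = os.path.realpath(target)
    path_real = os.path.realpath(path)
    try:
        # commonpath compares whole path components, so "/srv/target-evil"
        # is not treated as being inside "/srv/target".
        return os.path.commonpath([path_real, target_real]) == target_real
    except ValueError:
        # Raised e.g. for paths on different drives on Windows.
        return False
```

By contrast, `"/srv/target-evil/x".startswith("/srv/target")` is true, which is why a plain prefix comparison is unsafe here.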
Jeff Smith	c93c748ea3	feat: AI investigation is the product, drop zero-dep constraint (#64) Two original design constraints are dropped: 1. Zero-dependency Python CLI is no longer a goal. Luminos installs from requirements.txt like a normal Python project. 2. AI investigation is the headline. The base scan becomes the agent's first input pass, not a standalone product. There is no --ai flag and no --no-ai mode. AI runs unconditionally on every invocation. Watch mode is deleted as part of the same change because a non-AI filesystem-churn monitor conflicts with the new philosophy. If a live update mode is wanted later, it gets rebuilt as incremental AI re-investigation. Code: - Delete luminos_lib/watch.py - Delete luminos_lib/capabilities.py and tests/test_capabilities.py - Move clear_cache() into luminos_lib/cache.py - luminos.py: remove --watch, --ai, --install-extras flags. AI runs unconditionally after the base scan. If ANTHROPIC_API_KEY is unset, exit 0 with a one-line hint before running the base scan. - ai.py: drop the check_ai_dependencies() call and import. - New requirements.txt: anthropic, tree-sitter + grammars, python-magic. - setup_env.sh installs from requirements.txt. Docs: - README.md rewritten to lead with AI investigation, drops the two-modes framing and the watch feature line. - CLAUDE.md (project): rewrites Key Constraints, updates module map and Running Luminos commands. - PLAN.md: strips zero-dep philosophy from the file map and reframes the watch+incremental note as a future live-mode feature. Tests: 164 pass (down from 168 with the 4 removed capabilities tests).	2026-04-11 09:43:47 -06:00
Jeff Smith	036c3a934a	fix(ai): correct context budget metric — track per-call, not sum (#44) The dir loop was exiting early on small targets (a 13-file Python lib hit the budget at 92k–139k cumulative tokens) because _TokenTracker compared the SUM of input_tokens across all turns to the context window size. input_tokens from each API response is the size of the full prompt sent on that turn (system + every prior message + new tool results), so summing across turns multi-counts everything. The real per-call context size never approached the limit. Verified empirically: on luminos_lib pre-fix, the loop bailed when the most recent call's input_tokens was 20,535 (~10% of Sonnet's 200k window) but the cumulative sum was 134,983. Changes: - _TokenTracker now tracks last_input (the most recent call's input_tokens), separate from the cumulative loop_input/total_input used for cost reporting. - budget_exceeded() returns last_input > CONTEXT_BUDGET, not the cumulative sum. - MAX_CONTEXT bumped from 180_000 to 200_000 (Sonnet 4's real context window). CONTEXT_BUDGET stays at 70% = 140,000. - Early-exit message now shows context size, threshold, AND cumulative spend separately so future debugging is unambiguous. Smoke test on luminos_lib: investigation completes without early exit (~$0.37). 6 unit tests added covering the new semantics, including the key regression: a sequence of small calls whose sum exceeds the budget must NOT trip the check. Wiki Architecture page updated. #51 filed for the separate message-history-growth issue.	2026-04-06 22:49:25 -06:00
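The per-call versus cumulative distinction fixed in 036c3a934a is easy to see in a stripped-down tracker. The class below is a sketch, not the real `_TokenTracker`: the `record()` method name is hypothetical, and resetting `last_input` in `reset_loop()` is an assumption. The attribute names, the constants, and the strict `>` comparison are taken from the commit messages.

```python
MAX_CONTEXT = 200_000
CONTEXT_BUDGET = MAX_CONTEXT * 70 // 100  # 70% of the window = 140_000


class TokenTracker:
    """Tracks cumulative spend separately from the latest call's context size."""

    def __init__(self):
        self.total_input = 0  # grand total, for cost reporting
        self.loop_input = 0   # cumulative within the current dir loop
        self.last_input = 0   # input_tokens of the most recent call only

    def record(self, input_tokens):
        # Each response's input_tokens already covers the whole prompt
        # (system + history + new tool results), so only the latest value
        # reflects how full the context window actually is.
        self.total_input += input_tokens
        self.loop_input += input_tokens
        self.last_input = input_tokens

    def reset_loop(self):
        # Grand totals survive; per-loop state starts over (assumed).
        self.loop_input = 0
        self.last_input = 0

    def budget_exceeded(self):
        return self.last_input > CONTEXT_BUDGET  # strict >, not >=
```

Eight calls of 20,000 tokens each sum to 160,000, above the budget, yet `budget_exceeded()` stays false because no single call came near 140,000. That is the regression the commit's unit tests guard against.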
Jeff Smith	f3abbce7d4	feat(filetypes): expose raw signals to survey, remove classifier bias (#42) The survey pass no longer receives the bucketed file_categories histogram, which was biased toward source-code targets and would mislabel mail, notebooks, ledgers, and other non-code domains as "source" via the file --brief "text" pattern fallback. Adds filetypes.survey_signals(), which assembles raw signals from the same `classified` data the bucketer already processes, with no new walks and no new dependencies: total_files: total count; extension_histogram: top 20 extensions, raw, no taxonomy; file_descriptions: top 20 `file --brief` outputs, by count; filename_samples: 20 names, evenly drawn (not first-20). `file --brief` descriptions are truncated at 80 chars before counting so prefixes group correctly without exploding key cardinality. The Band-Aid in _SURVEY_SYSTEM_PROMPT (warning the LLM that the histogram was biased toward source code) is removed and replaced with neutral guidance on how to read the raw signals together. The {file_type_distribution} placeholder is renamed to {survey_signals} to reflect the broader content. luminos.py base scan computes survey_signals once and stores it on report["survey_signals"]; AI consumers read from there. summarize_categories() and report["file_categories"] are unchanged: the terminal report still uses the bucketed view (#49 tracks fixing that follow-up). Smoke tested on two targets: - luminos_lib: identical-quality survey ("Python library package", confidence 0.85), unchanged behavior on code targets. - A synthetic Maildir of 8 messages with `:2,S` flag suffixes: survey now correctly identifies it as "A Maildir-format mailbox containing 8 email messages" with confidence 0.90, names the Maildir naming convention in domain_notes, and correctly marks parse_structure as a skip tool. Before #42 this would have been "8 source files." Adds 8 unit tests for survey_signals covering empty input, extension histogram, description aggregation/truncation, top-N cap, and even-stride filename sampling. #48 tracks the unit-of-analysis limitation (file is the wrong unit for mbox, SQLite, archives, notebooks), which is explicitly out of scope for #42 and documented in survey_signals' docstring.	2026-04-06 22:36:14 -06:00
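The raw-signal assembly described in f3abbce7d4 can be illustrated like this. The input shape (a list of dicts with `name` and `description` keys), the `(value, count)` list output, and the `"(none)"` extension label are assumptions made for the sketch. The four output keys, the top-20 cap, the 80-character truncation, and even-stride sampling come from the commit message.

```python
import os
from collections import Counter


def survey_signals(classified, top_n=20, desc_width=80):
    """Assemble raw, un-bucketed signals from already-classified files."""
    extensions = Counter()
    descriptions = Counter()
    names = []
    for item in classified:
        name = item["name"]
        names.append(name)
        extensions[os.path.splitext(name)[1].lower() or "(none)"] += 1
        # Truncate before counting so long descriptions that share a prefix
        # collapse into one key.
        descriptions[item.get("description", "")[:desc_width]] += 1

    if len(names) <= top_n:
        samples = list(names)
    else:
        # Even stride across the whole listing instead of the first top_n.
        stride = len(names) / top_n
        samples = [names[int(i * stride)] for i in range(top_n)]

    return {
        "total_files": len(names),
        "extension_histogram": extensions.most_common(top_n),
        "file_descriptions": descriptions.most_common(top_n),
        "filename_samples": samples,
    }
```

Sampling by stride matters for sorted listings, where the first 20 names often all come from one subdirectory or share one prefix.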
Jeff Smith	8fb2f90678	feat(ai): skip survey pass for tiny targets (#7) Adds a gate in _run_investigation that skips the survey API call when a target has both fewer than _SURVEY_MIN_FILES (5) files AND fewer than _SURVEY_MIN_DIRS (2) directories. AND semantics handle the deep-narrow edge case correctly: a target with 4 files spread across 50 directories still gets a survey because dir count amortizes the cost across 50 dir loops. When skipped, _default_survey() supplies a synthetic dict with confidence=0.0 — chosen specifically so _filter_dir_tools() never enforces skip_tools from a synthetic value. The dir loop receives a generic "small target, read everything" framing in its prompt and keeps its full toolbox. Reorders _discover_directories() to run before the survey gate so total_dirs is available without a second walk. #46 tracks revisiting the threshold values with empirical data after Phase 2 ships and we've run --ai on a variety of real targets. Smoke tested on a 2-file target: gate triggers, default survey substituted, dir loop completes normally. Adds 4 unit tests for _default_survey() covering schema, confidence guard, filter interaction, and empty skip_tools.	2026-04-06 22:19:25 -06:00
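The AND semantics of the survey gate in 8fb2f90678 fit in a few lines. The constant names and values come from the commit message. `should_skip_survey` and `default_survey` are stand-in names, and the exact field set of the synthetic survey is assumed from the fields mentioned elsewhere in this log.

```python
_SURVEY_MIN_FILES = 5
_SURVEY_MIN_DIRS = 2


def should_skip_survey(total_files, total_dirs):
    """Skip only when the target is small on BOTH axes."""
    return total_files < _SURVEY_MIN_FILES and total_dirs < _SURVEY_MIN_DIRS


def default_survey():
    """Synthetic survey used when the API call is skipped.

    confidence=0.0 keeps any confidence-gated tool filtering from acting
    on values the model never produced. Field names are an assumption.
    """
    return {
        "description": "",
        "approach": "",
        "domain_notes": "",
        "relevant_tools": [],
        "skip_tools": [],
        "confidence": 0.0,
    }
```

A 2-file, 1-directory target skips the survey. A target with 4 files across 50 directories does not, because `total_dirs` fails the second condition.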
Jeff Smith	2e3d21f774	feat(ai): wire survey output into dir loop (#6) The survey pass now actually steers dir loop behavior, in two ways: 1. Prompt injection: a new {survey_context} placeholder in _DIR_SYSTEM_PROMPT receives the survey description, approach, domain_notes, relevant_tools, and skip_tools so the dir-loop agent has investigation context before its first turn. 2. Tool schema filtering: _filter_dir_tools() removes any tool listed in skip_tools from the schema passed to the API, gated on survey confidence >= 0.5. Control-flow tools (submit_report) are always preserved. This is hard enforcement — the agent literally cannot call a filtered tool, which the smoke test for #5 showed was necessary (prompt-only guidance was ignored). Smoke test on luminos_lib: zero run_command invocations (vs 2 before), context budget no longer exhausted (87k vs 133k), cost ~$0.34 (vs $0.46), investigation completes instead of early-exiting. Adds tests/test_ai_filter.py with 14 tests covering _filter_dir_tools and _format_survey_block — both pure helpers, no live API needed.	2026-04-06 22:07:12 -06:00
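A confidence-gated schema filter of the kind 2e3d21f774 describes might look like the sketch below. It assumes each tool is a dict with a `name` key and that `submit_report` is the only protected tool. The 0.5 inclusive threshold and the treatment of garbage or missing confidence as zero follow the commit messages, but the code itself is illustrative.

```python
PROTECTED_TOOLS = frozenset({"submit_report"})  # control-flow tools
CONFIDENCE_THRESHOLD = 0.5                      # inclusive boundary


def filter_dir_tools(tools, survey):
    """Drop tools named in survey['skip_tools'] when the survey is trusted."""
    if not survey:
        return list(tools)
    try:
        confidence = float(survey.get("confidence") or 0.0)
    except (TypeError, ValueError):
        confidence = 0.0  # garbage confidence counts as zero
    if confidence < CONFIDENCE_THRESHOLD:
        return list(tools)
    # Protected tools are never removed; unknown names are simply ignored.
    skip = set(survey.get("skip_tools") or []) - PROTECTED_TOOLS
    return [tool for tool in tools if tool["name"] not in skip]
```

Because the filtered tool never reaches the API schema, the model cannot call it at all, which is stronger than asking it not to in the prompt.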
Jeff Smith	1d681c8bc1	feat(cache): add low_confidence_entries() query to CacheManager (#3) Returns all file and dir cache entries with confidence below a given threshold (default 0.7). Entries missing a confidence field are included as unrated/untrusted. Results sorted ascending by confidence so least-confident entries come first. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 21:13:58 -06:00
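The query added in 1d681c8bc1 can be sketched as a standalone function over a list of entry dicts. The real method lives on the cache manager and reads entries from disk. Placing unrated entries ahead of rated ones is an assumption, since the commit message only says they are included.

```python
def low_confidence_entries(entries, threshold=0.7):
    """Return entries below threshold, least-confident first.

    Entries without a numeric confidence are treated as unrated and are
    always included, sorted before every rated entry (an assumption).
    """
    def confidence_of(entry):
        value = entry.get("confidence")
        return float(value) if isinstance(value, (int, float)) else None

    selected = [
        e for e in entries
        if confidence_of(e) is None or confidence_of(e) < threshold
    ]
    # The sort key puts unrated entries (False) ahead of rated ones (True),
    # then orders rated entries by ascending confidence.
    return sorted(
        selected,
        key=lambda e: (confidence_of(e) is not None, confidence_of(e) or 0.0),
    )
```

An entry sitting exactly at the threshold is not returned, since the commit describes the filter as "below" the threshold.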
Jeff Smith	6875cf5ed1	feat(tests): add unit test coverage for all testable modules (#37) 129 tests across cache, filetypes, code, disk, recency, tree, report, and capabilities. Uses stdlib unittest only — no new dependencies. Also updates CLAUDE.md development workflow to require test coverage for all future changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 16:57:26 -06:00

11 commits