luminos

Author	SHA1	Message	Date
Jeff Smith	f3abbce7d4	feat(filetypes): expose raw signals to survey, remove classifier bias (#42) The survey pass no longer receives the bucketed file_categories histogram, which was biased toward source-code targets and would mislabel mail, notebooks, ledgers, and other non-code domains as "source" via the file --brief "text" pattern fallback. Adds filetypes.survey_signals(), which assembles raw signals from the same `classified` data the bucketer already processes (no new walks, no new dependencies): total_files: total count; extension_histogram: top 20 extensions, raw, no taxonomy; file_descriptions: top 20 `file --brief` outputs, by count; filename_samples: 20 names, evenly drawn (not first-20). `file --brief` descriptions are truncated at 80 chars before counting so prefixes group correctly without exploding key cardinality. The Band-Aid in _SURVEY_SYSTEM_PROMPT (warning the LLM that the histogram was biased toward source code) is removed and replaced with neutral guidance on how to read the raw signals together. The {file_type_distribution} placeholder is renamed to {survey_signals} to reflect the broader content. luminos.py base scan computes survey_signals once and stores it on report["survey_signals"]; AI consumers read from there. summarize_categories() and report["file_categories"] are unchanged; the terminal report still uses the bucketed view (#49 tracks that follow-up fix). Smoke tested on two targets: - luminos_lib: identical-quality survey ("Python library package", confidence 0.85), unchanged behavior on code targets. - A synthetic Maildir of 8 messages with `:2,S` flag suffixes: survey now correctly identifies it as "A Maildir-format mailbox containing 8 email messages" with confidence 0.90, names the Maildir naming convention in domain_notes, and correctly marks parse_structure as a skip tool. Before #42 this would have been "8 source files." 
Adds 8 unit tests for survey_signals covering empty input, extension histogram, description aggregation/truncation, top-N cap, and even-stride filename sampling. #48 tracks the unit-of-analysis limitation (file is the wrong unit for mbox, SQLite, archives, notebooks) — explicitly out of scope for #42 and documented in survey_signals' docstring.	2026-04-06 22:36:14 -06:00
Jeff Smith	78f9a396dd	feat: add -x/--exclude flag to exclude directories from scan and AI analysis	2026-04-06 14:32:12 -06:00
Jeff Smith	206d2d34f6	feat: in-place per-file progress for classify, count, and large-file steps	2026-04-06 14:26:37 -06:00
Jeff Smith	ebc6b852f1	feat: add progress output to base scan steps	2026-04-06 14:21:17 -06:00
Jeff Smith	f324648c10	feat: add chain-of-thought observability tools Adds think, checkpoint, and flag tools for agent reasoning visibility: - think: records observation/hypothesis/next_action before investigation - checkpoint: summarizes learned/unknown/next_phase after file clusters - flag: marks notable findings to flags.jsonl with severity levels Additional changes: - Step numbering in investigation system prompt - Text blocks from agent now printed to stderr (step labels visible) - flag tool available in both investigation and synthesis passes - analyze_directory() returns (brief, detailed, flags) three-tuple - format_flags() in report.py renders flags sorted by severity - Per-directory max_turns increased from 10 to 14 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 13:02:19 -06:00
Jeff Smith	0412a8c0cb	feat: add --fresh, --clear-cache, and --install-extras CLI flags - --install-extras: prints status of all optional AI packages - --clear-cache: wipes /tmp/luminos/ investigation cache - --fresh: forces a new investigation ID, ignoring cached results - AI import is now lazy (only when --ai is used) so the base tool never touches optional dependencies - target argument is optional when using --install-extras Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 12:14:05 -06:00
Jeff Smith	bcf7d12b4a	feat: add AI-powered directory analysis via Claude API Adds --ai flag that sends the directory tree, file categories, and sampled file contents to Claude for analysis. Produces a brief summary at the top of the report and a detailed breakdown at the end. Requires ANTHROPIC_API_KEY env var; degrades gracefully without it. Uses only stdlib (urllib) to keep the zero-dependency constraint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 10:03:48 -06:00
Jeff Smith	461bdc404e	chore: initial project scaffold Set up Python project structure with entry point and library package. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 09:57:11 -06:00

8 commits
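
The signal assembly described in commit f3abbce7d4 can be sketched roughly as follows. This is a minimal illustration, not the project's actual filetypes.survey_signals(): the function names, the record shape (dicts with "name" and "description" keys), and the "(none)" label for extensionless files are all assumptions made for the example. Only the four output keys, the top-20 cap, the 80-character truncation, and the even-stride sampling come from the commit message.

```python
from collections import Counter
import os

TOP_N = 20           # "top 20" cap, as stated in the commit message
DESC_MAX_CHARS = 80  # truncation width for `file --brief` output, per the commit message


def even_sample(items, n):
    """Return up to n items drawn at an even stride across the list."""
    if len(items) <= n:
        return list(items)
    step = len(items) / n
    return [items[int(i * step)] for i in range(n)]


def survey_signals_sketch(classified):
    """Assemble raw survey signals from already-classified file records.

    `classified` is assumed (hypothetically) to be a list of dicts with
    "name" and "description" keys; the real record shape may differ.
    """
    ext_counts = Counter()
    desc_counts = Counter()
    names = []
    for record in classified:
        name = record["name"]
        names.append(name)
        # Raw extension, no taxonomy; "(none)" is an illustrative placeholder.
        ext = os.path.splitext(name)[1].lower() or "(none)"
        ext_counts[ext] += 1
        desc = record.get("description", "")
        if desc:
            # Truncate before counting so long descriptions sharing a prefix group together.
            desc_counts[desc[:DESC_MAX_CHARS]] += 1
    return {
        "total_files": len(classified),
        "extension_histogram": dict(ext_counts.most_common(TOP_N)),
        "file_descriptions": dict(desc_counts.most_common(TOP_N)),
        "filename_samples": even_sample(names, TOP_N),
    }
```

Because the sketch only iterates over records the scan has already produced, it adds no extra directory walk, which matches the "no new walks, no new dependencies" constraint stated in the commit.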