luminos/luminos.py

163 lines
5.8 KiB
Python

#!/usr/bin/env python3
"""Luminos — file system intelligence tool."""
import argparse
import json
import os
import shutil
import sys
from luminos_lib.tree import build_tree, render_tree
feat(filetypes): expose raw signals to survey, remove classifier bias (#42)

The survey pass no longer receives the bucketed file_categories histogram, which was biased toward source-code targets and would mislabel mail, notebooks, ledgers, and other non-code domains as "source" via the file --brief "text" pattern fallback.

Adds filetypes.survey_signals(), which assembles raw signals from the same `classified` data the bucketer already processes (no new walks, no new dependencies):

  total_files: total count
  extension_histogram: top 20 extensions, raw, no taxonomy
  file_descriptions: top 20 `file --brief` outputs, by count
  filename_samples: 20 names, evenly drawn (not first-20)

`file --brief` descriptions are truncated at 80 chars before counting so prefixes group correctly without exploding key cardinality.

The Band-Aid in _SURVEY_SYSTEM_PROMPT (warning the LLM that the histogram was biased toward source code) is removed and replaced with neutral guidance on how to read the raw signals together. The {file_type_distribution} placeholder is renamed to {survey_signals} to reflect the broader content.

luminos.py base scan computes survey_signals once and stores it on report["survey_signals"]; AI consumers read from there. summarize_categories() and report["file_categories"] are unchanged: the terminal report still uses the bucketed view (#49 tracks fixing that follow-up).

Smoke tested on two targets:
- luminos_lib: identical-quality survey ("Python library package", confidence 0.85), unchanged behavior on code targets.
- A synthetic Maildir of 8 messages with `:2,S` flag suffixes: survey now correctly identifies it as "A Maildir-format mailbox containing 8 email messages" with confidence 0.90, names the Maildir naming convention in domain_notes, and correctly marks parse_structure as a skip tool. Before #42 this would have been "8 source files."

Adds 8 unit tests for survey_signals covering empty input, extension histogram, description aggregation/truncation, top-N cap, and even-stride filename sampling.

#48 tracks the unit-of-analysis limitation (file is the wrong unit for mbox, SQLite, archives, notebooks), which is explicitly out of scope for #42 and documented in survey_signals' docstring.
2026-04-06 22:36:14 -06:00
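The four signals listed in the #42 note can be illustrated with a rough sketch. This is not the real `filetypes.survey_signals()`: the function name `survey_signals_sketch`, its parameters, and the assumed shape of `classified` (a list of dicts with `name` and `description` keys) are all hypothetical, chosen only to show the top-N counting, the 80-character truncation, and the even-stride sampling the note describes.

```python
import os
from collections import Counter


def survey_signals_sketch(classified, top_n=20, sample_n=20, desc_len=80):
    """Illustrative only. Assumes each entry is a dict with 'name' and
    'description' keys; the real data shape is not shown in this file."""
    names = [entry["name"] for entry in classified]

    # Raw extension counts, no taxonomy; files without a suffix share one key.
    ext_counts = Counter(
        os.path.splitext(name)[1].lower() or "(none)" for name in names
    )

    # Truncate descriptions before counting so long variants group together.
    desc_counts = Counter(
        entry.get("description", "")[:desc_len] for entry in classified
    )

    # Even-stride sampling: spread picks across the list, not the first N.
    if len(names) <= sample_n:
        samples = list(names)
    else:
        stride = len(names) / sample_n
        samples = [names[int(i * stride)] for i in range(sample_n)]

    return {
        "total_files": len(names),
        "extension_histogram": dict(ext_counts.most_common(top_n)),
        "file_descriptions": dict(desc_counts.most_common(top_n)),
        "filename_samples": samples,
    }
```

Sampling by stride rather than taking the first 20 names keeps a single large leading directory from dominating what the survey pass sees.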
from luminos_lib.filetypes import (
    classify_files,
    summarize_categories,
    survey_signals,
)
from luminos_lib.code import detect_languages, find_large_files
from luminos_lib.recency import find_recent_files
from luminos_lib.disk import get_disk_usage, top_directories
from luminos_lib.report import format_report
def _progress(label):
feat: AI investigation is the product, drop zero-dep constraint (#64)

Two original design constraints are dropped:

1. Zero-dependency Python CLI is no longer a goal. Luminos installs from requirements.txt like a normal Python project.
2. AI investigation is the headline. The base scan becomes the agent's first input pass, not a standalone product. There is no --ai flag and no --no-ai mode. AI runs unconditionally on every invocation.

Watch mode is deleted as part of the same change because a non-AI filesystem-churn monitor conflicts with the new philosophy. If a live update mode is wanted later, it gets rebuilt as incremental AI re-investigation.

Code:
- Delete luminos_lib/watch.py
- Delete luminos_lib/capabilities.py and tests/test_capabilities.py
- Move clear_cache() into luminos_lib/cache.py
- luminos.py: remove --watch, --ai, --install-extras flags. AI runs unconditionally after the base scan. If ANTHROPIC_API_KEY is unset, exit 0 with a one-line hint before running the base scan.
- ai.py: drop the check_ai_dependencies() call and import.
- New requirements.txt: anthropic, tree-sitter + grammars, python-magic.
- setup_env.sh installs from requirements.txt.

Docs:
- README.md rewritten to lead with AI investigation, drops the two-modes framing and the watch feature line.
- CLAUDE.md (project): rewrites Key Constraints, updates module map and Running Luminos commands.
- PLAN.md: strips zero-dep philosophy from the file map and reframes the watch+incremental note as a future live-mode feature.

Tests: 164 pass (down from 168 with the 4 removed capabilities tests).
2026-04-11 09:43:47 -06:00
    """Return (on_file, finish) for in-place per-file progress on stderr."""
    cols = shutil.get_terminal_size((80, 20)).columns
    prefix = f" [scan] {label}... "
    available = max(cols - len(prefix), 10)

    def on_file(path):
        rel = os.path.relpath(path)
        if len(rel) > available:
            rel = "..." + rel[-(available - 3):]
        print(f"\r{prefix}{rel}\033[K", end="", file=sys.stderr, flush=True)

    def finish():
        print(f"\r{prefix}done\033[K", file=sys.stderr, flush=True)

    return on_file, finish


def scan(target, depth=3, show_hidden=False, exclude=None):
    """Run the base scan and return the report dict consumed by the AI pass."""
    report = {}
    exclude = exclude or []

    print(f" [scan] Building directory tree (depth={depth})...", file=sys.stderr)
    tree = build_tree(target, max_depth=depth, show_hidden=show_hidden,
                      exclude=exclude)
    report["tree"] = tree
    report["tree_rendered"] = render_tree(tree)

    on_file, finish = _progress("Classifying files")
    classified = classify_files(target, show_hidden=show_hidden,
                                exclude=exclude, on_file=on_file)
    finish()
    report["file_categories"] = summarize_categories(classified)
    report["classified_files"] = classified
    report["survey_signals"] = survey_signals(classified)

    on_file, finish = _progress("Counting lines")
    languages, loc = detect_languages(classified, on_file=on_file)
    finish()
    report["languages"] = languages
    report["lines_of_code"] = loc

    on_file, finish = _progress("Checking for large files")
    report["large_files"] = find_large_files(classified, on_file=on_file)
    finish()

    print(" [scan] Finding recently modified files...", file=sys.stderr)
    report["recent_files"] = find_recent_files(target, show_hidden=show_hidden,
                                               exclude=exclude)

    print(" [scan] Calculating disk usage...", file=sys.stderr)
    usage = get_disk_usage(target, show_hidden=show_hidden, exclude=exclude)
    report["disk_usage"] = usage
    report["top_directories"] = top_directories(usage, n=5)

    print(" [scan] Base scan complete.", file=sys.stderr)
    return report


def main():
    parser = argparse.ArgumentParser(
        prog="luminos",
        description="Luminos — file system intelligence tool. "
                    "Runs an agentic Claude investigation against a directory "
                    "and produces a reconnaissance report.",
    )
    parser.add_argument("target", nargs="?", help="Target directory to analyze")
    parser.add_argument("-d", "--depth", type=int, default=3,
                        help="Maximum tree depth (default: 3)")
    parser.add_argument("-a", "--all", action="store_true",
                        help="Include hidden files and directories")
    parser.add_argument("--json", action="store_true", dest="json_output",
                        help="Output report as JSON")
    parser.add_argument("-o", "--output", metavar="FILE",
                        help="Write report to a file")
    parser.add_argument("--clear-cache", action="store_true",
                        help="Clear the investigation cache (/tmp/luminos/)")
    parser.add_argument("--fresh", action="store_true",
                        help="Force a new investigation (ignore cached results)")
    parser.add_argument("-x", "--exclude", metavar="DIR", action="append",
                        default=[],
                        help="Exclude a directory name from scan and analysis "
                             "(repeatable, e.g. -x .git -x node_modules)")

    args = parser.parse_args()

    if args.clear_cache:
        from luminos_lib.cache import clear_cache
        clear_cache()
        if not args.target:
            return

    if not args.target:
        parser.error("the following arguments are required: target")

    target = os.path.abspath(args.target)
    if not os.path.isdir(target):
        print(f"Error: '{args.target}' is not a directory or does not exist.",
              file=sys.stderr)
        sys.exit(1)

    if not os.environ.get("ANTHROPIC_API_KEY"):
        print("luminos requires ANTHROPIC_API_KEY. "
              "Set it with: export ANTHROPIC_API_KEY=your-key-here",
              file=sys.stderr)
        sys.exit(0)

    if args.exclude:
        print(f" [scan] Excluding: {', '.join(args.exclude)}", file=sys.stderr)

    report = scan(target, depth=args.depth, show_hidden=args.all,
                  exclude=args.exclude)

    from luminos_lib.ai import analyze_directory
    brief, detailed, flags = analyze_directory(
        report, target, fresh=args.fresh, exclude=args.exclude)
    report["ai_brief"] = brief
    report["ai_detailed"] = detailed
    report["flags"] = flags

    if args.json_output:
        output = json.dumps(report, indent=2, default=str)
    else:
        output = format_report(report, target, flags=flags)

    if args.output:
        try:
            with open(args.output, "w") as f:
                f.write(output + "\n")
            print(f"Report written to {args.output}")
        except OSError as e:
            print(f"Error writing to '{args.output}': {e}", file=sys.stderr)
            sys.exit(1)
    else:
        print(output)


if __name__ == "__main__":
    main()
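
The in-place progress line produced by `_progress` relies on two terminal conventions: a carriage return (`\r`) to move back to the start of the line, and the ANSI sequence `\033[K` to erase whatever is left of the previous, longer text. A minimal standalone sketch of the same pattern follows. The name `make_progress` and the `width` and `stream` parameters are hypothetical additions so the behavior can be observed without a real terminal; they are not part of Luminos.

```python
import io
import sys


def make_progress(label, width=80, stream=sys.stderr):
    """Hypothetical variant of _progress with an injectable width and stream."""
    prefix = f" [scan] {label}... "
    available = max(width - len(prefix), 10)

    def on_file(path):
        shown = path
        # Keep the tail of long paths; the filename is usually the useful part.
        if len(shown) > available:
            shown = "..." + shown[-(available - 3):]
        # \r rewinds to column 0, \033[K clears leftovers from a longer line.
        print(f"\r{prefix}{shown}\033[K", end="", file=stream, flush=True)

    def finish():
        print(f"\r{prefix}done\033[K", file=stream, flush=True)

    return on_file, finish


# Capture the raw bytes a terminal would receive.
buf = io.StringIO()
on_file, finish = make_progress("Demo", width=40, stream=buf)
on_file("a/very/long/path/that/will/not/fit/in/the/line.txt")
finish()
```

Only `finish()` emits a newline, so every per-file update overwrites the same terminal row until the pass completes.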