Second wave of pre-Phase-3 test coverage. The #55 round picked off the
easy decision-logic helpers; this round covers the three highest-impact
helpers that escaped the first sweep.
Three new test classes appended to tests/test_ai_pure.py:
- TestTokenTracker (11 tests)
Pins the load-bearing #44 fix: budget_exceeded() must use last_input
(the most recent call's context size) NOT cumulative input, because
each turn's input_tokens already includes the full message history.
Tests assert: cumulative-input far above budget does NOT trip the
gate when last_input stays small; reset_loop() preserves grand
totals; the boundary is strict > not >=.
- TestSynthesizeFromCache (5 tests)
The synthesis fallback fires only when _run_synthesis exhausts its
max_turns, which almost never happens in normal runs — exactly the
kind of code that silently rots. Tests assert: empty cache returns
the incomplete-message brief and empty detailed; single dir entry
produces a markdown line; multi-entry detailed contains all entries;
empty-summary entries are skipped; file entries alone do not satisfy
(the function reads dir entries only).
- TestDiscoverDirectories (9 tests)
The leaves-first walk drives the entire dir-loop iteration order
and is the foundation of the cache reuse story. Tests assert:
empty target returns target only; nested trees come back leaves-
first; .git / __pycache__ / node_modules / *.egg-info excluded;
custom --exclude honored; hidden dirs excluded by default; show_
hidden=True includes them but does not override the skip list.
PLAN.md: added Phase 2.7 (#56✅) and Phase 2.8 (#55✅, #70) entries
to the implementation order, and removed the now-stale Phase 3.4 (#56)
and Background chore (#55) sections that were displaced by the
pre-Phase-3 cleanup pattern.
Verification: 234 tests pass (209 prior + 25 new).
Two original design constraints are dropped:
1. Zero-dependency Python CLI is no longer a goal. Luminos installs from
requirements.txt like a normal Python project.
2. AI investigation is the headline. The base scan becomes the agent's
first input pass, not a standalone product. There is no --ai flag and
no --no-ai mode. AI runs unconditionally on every invocation.
Watch mode is deleted as part of the same change because a non-AI
filesystem-churn monitor conflicts with the new philosophy. If a live
update mode is wanted later, it gets rebuilt as incremental AI
re-investigation.
Code:
- Delete luminos_lib/watch.py
- Delete luminos_lib/capabilities.py and tests/test_capabilities.py
- Move clear_cache() into luminos_lib/cache.py
- luminos.py: remove --watch, --ai, --install-extras flags. AI runs
unconditionally after the base scan. If ANTHROPIC_API_KEY is unset,
exit 0 with a one-line hint before running the base scan.
- ai.py: drop the check_ai_dependencies() call and import.
- New requirements.txt: anthropic, tree-sitter + grammars, python-magic.
- setup_env.sh installs from requirements.txt.
Docs:
- README.md rewritten to lead with AI investigation, drops the two-modes
framing and the watch feature line.
- CLAUDE.md (project): rewrites Key Constraints, updates module map and
Running Luminos commands.
- PLAN.md: strips zero-dep philosophy from the file map and reframes the
watch+incremental note as a future live-mode feature.
Tests: 164 pass (down from 168 with the 4 removed capabilities tests).
#48 captures the unit-of-analysis problem: "file" is the wrong unit
for containers (mbox, SQLite, zip, notebooks) and dense directories
(Maildir, .git, node_modules). Sequenced after Phase 4 as its own
phase since it requires format detection and container handlers.
#49 captures the smaller follow-up that the terminal report still
shows the biased bucketed view. Deferred to end-of-project tuning.
#5 smoke test showed the dir loop exhausts the 126k context budget on
a 13-file Python lib. Sequencing #44 between Phase 2 and Phase 3 so
the foundation is solid before planning + external tools add more
prompt and tool weight.
The filetype classifier is biased toward source code and would mislead
the survey pass on non-code targets (mail, notebooks, ledgers). #5
ships with a prompt-level Band-Aid; #42 captures the real fix and is
sequenced after the survey pass is observable end-to-end and before
Phase 3 depends on survey output.