Commit graph

5 commits

Author SHA1 Message Date
Jeff Smith
f3abbce7d4 feat(filetypes): expose raw signals to survey, remove classifier bias (#42)
The survey pass no longer receives the bucketed file_categories
histogram, which was biased toward source-code targets and would
mislabel mail, notebooks, ledgers, and other non-code domains as
"source" via the file --brief "text" pattern fallback.

Adds filetypes.survey_signals(), which assembles raw signals from
the same `classified` data the bucketer already processes — no new
walks, no new dependencies:
  total_files       — total count
  extension_histogram — top 20 extensions, raw, no taxonomy
  file_descriptions   — top 20 `file --brief` outputs, by count
  filename_samples    — 20 names, evenly drawn (not first-20)

`survey --brief` descriptions are truncated at 80 chars before
counting so prefixes group correctly without exploding key cardinality.

The Band-Aid in _SURVEY_SYSTEM_PROMPT (warning the LLM that the
histogram was biased toward source code) is removed and replaced
with neutral guidance on how to read the raw signals together.
The {file_type_distribution} placeholder is renamed to
{survey_signals} to reflect the broader content.

luminos.py base scan computes survey_signals once and stores it on
report["survey_signals"]; AI consumers read from there.

summarize_categories() and report["file_categories"] are unchanged
— the terminal report still uses the bucketed view (#49 tracks
fixing that follow-up).

Smoke tested on two targets:
- luminos_lib: identical-quality survey ("Python library package",
  confidence 0.85), unchanged behavior on code targets.
- A synthetic Maildir of 8 messages with `:2,S` flag suffixes:
  survey now correctly identifies it as "A Maildir-format mailbox
  containing 8 email messages" with confidence 0.90, names the
  Maildir naming convention in domain_notes, and correctly marks
  parse_structure as a skip tool. Before #42 this would have been
  "8 source files."

Adds 8 unit tests for survey_signals covering empty input, extension
histogram, description aggregation/truncation, top-N cap, and
even-stride filename sampling.

#48 tracks the unit-of-analysis limitation (file is the wrong unit
for mbox, SQLite, archives, notebooks) — explicitly out of scope
for #42 and documented in survey_signals' docstring.
2026-04-06 22:36:14 -06:00
Jeff Smith
8fb2f90678 feat(ai): skip survey pass for tiny targets (#7)
Adds a gate in _run_investigation that skips the survey API call when
a target has both fewer than _SURVEY_MIN_FILES (5) files AND fewer
than _SURVEY_MIN_DIRS (2) directories. AND semantics handle the
deep-narrow edge case correctly: a target with 4 files spread across
50 directories still gets a survey because dir count amortizes the
cost across 50 dir loops.

When skipped, _default_survey() supplies a synthetic dict with
confidence=0.0 — chosen specifically so _filter_dir_tools() never
enforces skip_tools from a synthetic value. The dir loop receives
a generic "small target, read everything" framing in its prompt and
keeps its full toolbox.

Reorders _discover_directories() to run before the survey gate so
total_dirs is available without a second walk.

#46 tracks revisiting the threshold values with empirical data after
Phase 2 ships and we've run --ai on a variety of real targets.

Smoke tested on a 2-file target: gate triggers, default survey
substituted, dir loop completes normally. Adds 4 unit tests for
_default_survey() covering schema, confidence guard, filter
interaction, and empty skip_tools.
2026-04-06 22:19:25 -06:00
Jeff Smith
2e3d21f774 feat(ai): wire survey output into dir loop (#6)
The survey pass now actually steers dir loop behavior, in two ways:

1. Prompt injection: a new {survey_context} placeholder in
   _DIR_SYSTEM_PROMPT receives the survey description, approach,
   domain_notes, relevant_tools, and skip_tools so the dir-loop agent
   has investigation context before its first turn.

2. Tool schema filtering: _filter_dir_tools() removes any tool listed
   in skip_tools from the schema passed to the API, gated on
   survey confidence >= 0.5. Control-flow tools (submit_report) are
   always preserved. This is hard enforcement — the agent literally
   cannot call a filtered tool, which the smoke test for #5 showed
   was necessary (prompt-only guidance was ignored).

Smoke test on luminos_lib: zero run_command invocations (vs 2 before),
context budget no longer exhausted (87k vs 133k), cost ~$0.34 (vs
$0.46), investigation completes instead of early-exiting.

Adds tests/test_ai_filter.py with 14 tests covering _filter_dir_tools
and _format_survey_block — both pure helpers, no live API needed.
2026-04-06 22:07:12 -06:00
Jeff Smith
1d681c8bc1 feat(cache): add low_confidence_entries() query to CacheManager (#3)
Returns all file and dir cache entries with confidence below a given
threshold (default 0.7). Entries missing a confidence field are
included as unrated/untrusted. Results sorted ascending by confidence
so least-confident entries come first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:13:58 -06:00
Jeff Smith
6875cf5ed1 feat(tests): add unit test coverage for all testable modules (#37)
129 tests across cache, filetypes, code, disk, recency, tree, report,
and capabilities. Uses stdlib unittest only — no new dependencies.
Also updates CLAUDE.md development workflow to require test coverage
for all future changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 16:57:26 -06:00