Revisit survey-skip thresholds with empirical data #46

Open
opened 2026-04-07 04:18:03 +00:00 by archeious · 0 comments

## Background

#7 introduced a gate that skips the survey pass when a target has both `< _SURVEY_MIN_FILES` files AND `< _SURVEY_MIN_DIRS` directories. The thresholds shipped as `_SURVEY_MIN_FILES = 5` and `_SURVEY_MIN_DIRS = 2`, picked from the example in #7's body without any empirical basis. The AND semantics correctly handle the deep-narrow edge case (few files, many dirs — survey still runs because dir count amortizes the cost across many dir loops).
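For reference, the gate as described can be sketched roughly as follows. Only the two constants come from #7; the function name `should_skip_survey` and its signature are illustrative assumptions, not the actual implementation:

```python
# Thresholds as shipped in #7 — picked from an example, not from data.
_SURVEY_MIN_FILES = 5
_SURVEY_MIN_DIRS = 2

def should_skip_survey(file_count: int, dir_count: int) -> bool:
    """Skip the survey pass only when the target is small on BOTH axes.

    The AND is deliberate: a deep-narrow target (few files, many dirs)
    still gets a survey, because the dir count amortizes the survey's
    cost across many dir loops.
    """
    return file_count < _SURVEY_MIN_FILES and dir_count < _SURVEY_MIN_DIRS

# Deep-narrow case: 3 files but 50 dirs — survey still runs.
assert should_skip_survey(3, 50) is False
# Genuinely tiny target: 3 files, 1 dir — survey skipped.
assert should_skip_survey(3, 1) is True
```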

## What this issue is for

After Phase 2 ships and we have run `--ai` on a variety of real targets, revisit whether the thresholds and gate logic are actually pulling their weight.

## Questions to answer with data

1. **Are we skipping surveys we should be running?** Look for runs where the survey was skipped and the dir loop produced a vague or wrong description. If those exist, the threshold is too aggressive.

2. **Are we running surveys we should be skipping?** Look for runs where the survey was called on a target small enough that the dir loop would have figured it out instantly. If those exist, the threshold is too conservative — survey is wasted spend.

3. **Is file count the right input?** File count from `report["file_categories"]` includes binary, generated, and pycache files. A target with 200 `.pyc` files and 3 source files looks large by this measure but is small in any meaningful sense. Should we exclude `unknown` and certain categories from the count? (Note: this overlaps with #42 — fixing the classifier would change what counts.)

4. **Is dir count the right input?** Currently uses `len(_discover_directories(...))`, which is post-exclude. Good. But it counts all descendants equally — a 50-dir-deep linear chain looks the same as 50 sibling dirs even though their dir loops behave very differently. Worth considering whether depth should discount the count.

5. **Should the gate consider total bytes too?** A target with 5 files where one is 50 MB is not the same as 5 small files. Probably an edge case, but worth checking.

6. **Should the gate be one constant or a scale?** Right now `_SURVEY_MIN_FILES` and `_SURVEY_MIN_DIRS` are separate constants checked with AND. A single "survey value score" combining files, dirs, and maybe bytes might be more honest than two thresholds.
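Questions 3 and 6 could be prototyped together: a filtered file count feeding a single score. Everything below — the excludable category names, the log-scaling, the weights, and the cutoff — is an illustrative assumption that would need calibration against real runs, not a proposal:

```python
import math

# Hypothetical category names; the actual keys in
# report["file_categories"] may differ.
_EXCLUDED_CATEGORIES = {"binary", "generated", "pycache", "unknown"}

def meaningful_file_count(file_categories: dict[str, int]) -> int:
    """Count only files in categories the survey can learn from."""
    return sum(n for cat, n in file_categories.items()
               if cat not in _EXCLUDED_CATEGORIES)

def survey_value_score(files: int, dirs: int, total_bytes: int) -> float:
    """One combined score instead of two AND-ed thresholds.

    Log-scaling keeps a single huge file or a very deep tree from
    dominating; dirs weigh more than files because each dir adds a
    dir-loop call the survey can amortize against.
    """
    return (
        1.0 * math.log1p(files)
        + 2.0 * math.log1p(dirs)
        + 0.5 * math.log1p(total_bytes / 1_000_000)  # scale to MB
    )

_SURVEY_SCORE_MIN = 3.5  # placeholder cutoff — must come from real runs

def should_skip_survey(files: int, dirs: int, total_bytes: int) -> bool:
    return survey_value_score(files, dirs, total_bytes) < _SURVEY_SCORE_MIN

# 200 .pyc files no longer inflate the count:
assert meaningful_file_count({"python": 3, "pycache": 200}) == 3
# Deep-narrow target (3 files, 50 dirs) still surveys under the score:
assert should_skip_survey(3, 50, 10_000) is False
```

Whatever form the gate takes, the deep-narrow case above is the invariant to preserve from #7's AND semantics.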

## Acceptance

- Decision documented (in this issue or PLAN.md): keep current thresholds, change them, or replace the gate logic entirely
- If thresholds change, the new values are justified by examples from real runs, not vibes
- If the gate is replaced, the new logic also handles the deep-narrow edge case correctly

## Sequencing

After Phase 2 ships (#4–#7 plus #42, #44) and after we have run `--ai` on at least 5 distinct real targets of varying shapes — including at least one that triggers the skip and one deep-narrow case. Probably at the end of the Phase 2 retrospective or the start of Phase 3.


Reference: archeious/luminos#46