docs(plan): add Phase 4.5 (#48) and end-of-project #49

#48 captures the unit-of-analysis problem: "file" is the wrong unit
for containers (mbox, SQLite, zip, notebooks) and dense directories
(Maildir, .git, node_modules). Sequenced after Phase 4 as its own
phase since it requires format detection and container handlers.

#49 captures the smaller follow-up that the terminal report still
shows the biased bucketed view. Deferred to end-of-project tuning.
This commit is contained in:
Jeff Smith 2026-04-06 22:31:41 -06:00
parent 6cda1cc521
commit 55da7fa8dc

16
PLAN.md
View file

@ -582,6 +582,17 @@ architecture. The migration pain is intentional and instructive.
- `--no-external` flag to disable network tools
- Budget tracking and logging
### Phase 4.5 — Unit of analysis (#48)
- "File" is hardcoded as the unit everywhere. Maildirs over-count
(one mailbox = thousands of files), mbox/SQLite/zip/notebooks
under-count (one file = many logical units). Format detection,
container handlers, and a unified "logical unit" abstraction
across filetypes/cache/report/ai. The `filetypes.py` rename
happens here as part of the substantive change, not as a
cosmetic relabel. Sequenced after Phase 4 because it overlaps
with format inspection and is substantial enough to be its own
phase.
### Phase 5 — Scale-tiered synthesis
- Sizing measurement after dir loops
- Tier classification
@ -608,6 +619,11 @@ architecture. The migration pain is intentional and instructive.
- Domain-appropriate section headers
### End-of-project tuning
- **Honest terminal report file-type view (#49)** — the report still
shows the bucketed `summarize_categories()` view, which collapses
`.pyc` and other generated files into `unknown`. After #42 ships
the survey gets honest signals; the report can follow with an
extension sub-section or similar. Low priority, not blocking.
- **Revisit survey-skip thresholds (#46)**`_SURVEY_MIN_FILES` and
`_SURVEY_MIN_DIRS` shipped with values from #7's example, no
empirical basis. Once `--ai` has been run on a variety of real