docs(plan): add Phase 4.5 (#48) and end-of-project #49

#48 captures the unit-of-analysis problem: "file" is the wrong unit for containers (mbox, SQLite, zip, notebooks) and dense directories (Maildir, .git, node_modules). Sequenced after Phase 4 as its own phase since it requires format detection and container handlers. #49 captures the smaller follow-up that the terminal report still shows the biased bucketed view. Deferred to end-of-project tuning.
2026-04-06 22:31:41 -06:00 · 2026-04-06 22:31:41 -06:00 · 55da7fa8dc
commit 55da7fa8dc
parent 6cda1cc521
1 changed files with 16 additions and 0 deletions
--- a/PLAN.md
+++ b/PLAN.md
@ -582,6 +582,17 @@ architecture. The migration pain is intentional and instructive.
 - `--no-external` flag to disable network tools
 - Budget tracking and logging
 ### Phase 4.5 — Unit of analysis (#48)
 - "File" is hardcoded as the unit everywhere. Maildirs over-count
  (one mailbox = thousands of files), mbox/SQLite/zip/notebooks
  under-count (one file = many logical units). Format detection,
  container handlers, and a unified "logical unit" abstraction
  across filetypes/cache/report/ai. The `filetypes.py` rename
  happens here as part of the substantive change, not as a
  cosmetic relabel. Sequenced after Phase 4 because it overlaps
  with format inspection and is substantial enough to be its own
  phase.
 ### Phase 5 — Scale-tiered synthesis
 - Sizing measurement after dir loops
 - Tier classification
@ -608,6 +619,11 @@ architecture. The migration pain is intentional and instructive.
 - Domain-appropriate section headers
 ### End-of-project tuning
 - **Honest terminal report file-type view (#49)** — the report still
  shows the bucketed `summarize_categories()` view, which collapses
  `.pyc` and other generated files into `unknown`. After #42 ships
  the survey gets honest signals; the report can follow with an
  extension sub-section or similar. Low priority, not blocking.
 - **Revisit survey-skip thresholds (#46)** — `_SURVEY_MIN_FILES` and
  `_SURVEY_MIN_DIRS` shipped with values from #7's example, no
  empirical basis. Once `--ai` has been run on a variety of real