From 55da7fa8dcc76c20592eef68572a7a29b449a163 Mon Sep 17 00:00:00 2001 From: Jeff Smith Date: Mon, 6 Apr 2026 22:31:41 -0600 Subject: [PATCH] docs(plan): add Phase 4.5 (#48) and end-of-project #49 #48 captures the unit-of-analysis problem: "file" is the wrong unit for containers (mbox, SQLite, zip, notebooks) and dense directories (Maildir, .git, node_modules). Sequenced after Phase 4 as its own phase since it requires format detection and container handlers. #49 captures the smaller follow-up that the terminal report still shows the biased bucketed view. Deferred to end-of-project tuning. --- PLAN.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/PLAN.md b/PLAN.md index 361121b..6e8a148 100644 --- a/PLAN.md +++ b/PLAN.md @@ -582,6 +582,17 @@ architecture. The migration pain is intentional and instructive. - `--no-external` flag to disable network tools - Budget tracking and logging +### Phase 4.5 — Unit of analysis (#48) +- "File" is hardcoded as the unit everywhere. Maildirs over-count + (one mailbox = thousands of files), mbox/SQLite/zip/notebooks + under-count (one file = many logical units). Format detection, + container handlers, and a unified "logical unit" abstraction + across filetypes/cache/report/ai. The `filetypes.py` rename + happens here as part of the substantive change, not as a + cosmetic relabel. Sequenced after Phase 4 because it overlaps + with format inspection and is substantial enough to be its own + phase. + ### Phase 5 — Scale-tiered synthesis - Sizing measurement after dir loops - Tier classification @@ -608,6 +619,11 @@ architecture. The migration pain is intentional and instructive. - Domain-appropriate section headers ### End-of-project tuning +- **Honest terminal report file-type view (#49)** — the report still + shows the bucketed `summarize_categories()` view, which collapses + `.pyc` and other generated files into `unknown`. After #42 ships + the survey gets honest signals; the report can follow with an + extension sub-section or similar. Low priority, not blocking. - **Revisit survey-skip thresholds (#46)** — `_SURVEY_MIN_FILES` and `_SURVEY_MIN_DIRS` shipped with values from #7's example, no empirical basis. Once `--ai` has been run on a variety of real