#48 captures the unit-of-analysis problem: "file" is the wrong unit for containers (mbox, SQLite, zip, notebooks) and dense directories (Maildir, .git, node_modules). It is sequenced after Phase 4 as its own phase because it requires format detection and container handlers.

#49 captures the smaller follow-up: the terminal report still shows the biased bucketed view. It is deferred to end-of-project tuning.
parent 6cda1cc521
commit 55da7fa8dc
1 changed file with 16 additions and 0 deletions
PLAN.md
@@ -582,6 +582,17 @@ architecture. The migration pain is intentional and instructive.
 - `--no-external` flag to disable network tools
 - Budget tracking and logging
 
+### Phase 4.5 — Unit of analysis (#48)
+- "File" is hardcoded as the unit everywhere. Maildirs over-count
+  (one mailbox = thousands of files), mbox/SQLite/zip/notebooks
+  under-count (one file = many logical units). Format detection,
+  container handlers, and a unified "logical unit" abstraction
+  across filetypes/cache/report/ai. The `filetypes.py` rename
+  happens here as part of the substantive change, not as a
+  cosmetic relabel. Sequenced after Phase 4 because it overlaps
+  with format inspection and is substantial enough to be its own
+  phase.
+
 ### Phase 5 — Scale-tiered synthesis
 - Sizing measurement after dir loops
 - Tier classification
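A minimal sketch of what the "logical unit" idea in Phase 4.5 might look like, using only the Python standard library. The names `detect_container` and `count_logical_units` are hypothetical and do not come from the project. The choice of what counts as a unit (messages for mail, members for zip, tables for SQLite, cells for notebooks) is an assumption made for illustration, and the mbox check is a simple prefix heuristic.

```python
import json
import mailbox
import sqlite3
import zipfile
from contextlib import closing
from pathlib import Path

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def detect_container(path: Path) -> str:
    """Classify a path as a container kind, or 'file' for a plain file (hypothetical helper)."""
    if path.is_dir():
        # Maildir convention: cur/, new/ and tmp/ subdirectories.
        if all((path / sub).is_dir() for sub in ("cur", "new", "tmp")):
            return "maildir"
        return "directory"
    if path.suffix == ".ipynb":
        return "notebook"
    if zipfile.is_zipfile(path):
        return "zip"
    with path.open("rb") as fh:
        head = fh.read(16)
    if head == SQLITE_MAGIC:
        return "sqlite"
    if head.startswith(b"From "):
        # Heuristic only: mbox files begin with a "From " separator line.
        return "mbox"
    return "file"


def count_logical_units(path: Path) -> int:
    """Return the number of logical units a path holds (assumed definitions)."""
    kind = detect_container(path)
    if kind == "maildir":
        # Count delivered messages, ignoring tmp/.
        return sum(
            1
            for sub in ("cur", "new")
            for entry in (path / sub).iterdir()
            if entry.is_file()
        )
    if kind == "zip":
        with zipfile.ZipFile(path) as zf:
            return sum(1 for info in zf.infolist() if not info.is_dir())
    if kind == "sqlite":
        with closing(sqlite3.connect(str(path))) as conn:
            (tables,) = conn.execute(
                "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
            ).fetchone()
            return tables
    if kind == "mbox":
        box = mailbox.mbox(str(path), create=False)
        try:
            return len(box)
        finally:
            box.close()
    if kind == "notebook":
        with path.open(encoding="utf-8") as fh:
            return len(json.load(fh).get("cells", []))
    # Plain files and ordinary directories count as one unit each.
    return 1
```

With a helper of this shape, a Maildir and an mbox holding the same messages would report the same count, which is the bias the plan entry describes.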
@@ -608,6 +619,11 @@ architecture. The migration pain is intentional and instructive.
 - Domain-appropriate section headers
 
 ### End-of-project tuning
+- **Honest terminal report file-type view (#49)** — the report still
+  shows the bucketed `summarize_categories()` view, which collapses
+  `.pyc` and other generated files into `unknown`. After #42 ships
+  the survey gets honest signals; the report can follow with an
+  extension sub-section or similar. Low priority, not blocking.
 - **Revisit survey-skip thresholds (#46)** — `_SURVEY_MIN_FILES` and
   `_SURVEY_MIN_DIRS` shipped with values from #7's example, no
   empirical basis. Once `--ai` has been run on a variety of real
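The "extension sub-section" mentioned for #49 could be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `CATEGORY_BY_EXT`, `categorize`, and `extension_breakdown` are hypothetical names, and the real `summarize_categories()` is not shown in this diff. The idea is to keep the bucketed view but list the extensions that were collapsed into `unknown`.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical mapping; the project's real categories are not shown in the diff.
CATEGORY_BY_EXT = {".py": "source", ".md": "docs"}


def categorize(name: str) -> str:
    """Map a file name to a bucket, falling back to 'unknown'."""
    return CATEGORY_BY_EXT.get(PurePath(name).suffix.lower(), "unknown")


def extension_breakdown(names, bucket="unknown"):
    """Count extensions of the files that land in `bucket`, most common first."""
    counts = Counter(
        PurePath(name).suffix.lower() or "(none)"
        for name in names
        if categorize(name) == bucket
    )
    return counts.most_common()


if __name__ == "__main__":
    sample = ["a.py", "b.pyc", "c.pyc", "README.md", "Makefile", "d.o"]
    for ext, count in extension_breakdown(sample):
        print(f"{ext:>8}  {count}")
```

A breakdown like this would make it visible that most of an `unknown` bucket is `.pyc` files rather than genuinely unrecognized content.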