retro: Session 9 — scope shift + all Phase 3 prereqs shipped

Jeff Smith 2026-04-11 11:02:16 -06:00
parent 508d66cba8
commit 31a052eca0
2 changed files with 334 additions and 46 deletions

@@ -1,65 +1,353 @@
# Session 9 Notes — 2026-04-11
## What We Set Out to Do
Open question. The session started with a State-of-the-App summary
request, which surfaced two threads: (1) a scope shift the user had
been mulling, and (2) the existing Phase 3 prerequisites (#55, #56,
#57) blocking Phase 3 proper. Neither was named as the goal up front
— they emerged from the conversation and stacked.
What we shipped, in execution order:
1. #64 — AI investigation is the product, drop zero-dep constraint, delete watch mode
2. #57 — Refactor `_run_dir_loop` into three focused helpers
3. #56 — Single-source tool registration via `register_tool()`
4. #55 — Unit test coverage for pure helpers in `ai.py` (wave 1)
5. #70 — Test coverage wave 2: `_TokenTracker`, `_synthesize_from_cache`, `_discover_directories`
6. #72 — Document the leaf-first investigation contract in Internals.md
Six issues, six PRs (or wiki commits), 234 tests, four open issues
closed plus the new ones, all in one continuous session.
## What Actually Happened
### Scope shift (#64)
The user opened with "I would like to make a couple scope changes":
drop the zero-dep constraint, make AI investigation the main show. We
worked through the framing in conversation: option (a) AI-default
with `--no-ai` escape hatch, or option (b) AI-only with the base scan
purely internal. User picked (b). Watch mode was deleted as part of
the same change because a non-AI churn monitor conflicts with the new
philosophy.
Reading the code first turned up two things that made the change much
smaller than expected. First, `ai.py` and `ast_parser.py` already did
top-level imports of `anthropic`/`magic`/`tree_sitter` — the "lazy
deps" pattern lived only in `luminos.py`'s `if args.ai:` gate.
Removing that gate WAS the entire technical change. Second,
`capabilities.py` was almost dead weight: only `clear_cache()` was
load-bearing, and only because it knew about `CACHE_ROOT` (which
already lived in `cache.py`). One import move and the whole module
could be deleted.
PR #65 landed 11 file changes: deleted `watch.py`, deleted
`capabilities.py`, deleted `tests/test_capabilities.py`, moved
`clear_cache()` into `cache.py`, rewrote `luminos.py` to make AI
mandatory, dropped `check_ai_dependencies()` from `ai.py`, added
`requirements.txt`, updated `setup_env.sh`, and rewrote `README.md` /
`CLAUDE.md` / `PLAN.md` to match. Wiki updates landed in a separate
commit on the wiki repo. PR #66 was the matching session-log bump.
The graceful-exit case mattered enough to call out: the first draft
checked `ANTHROPIC_API_KEY` *after* the base scan ran. That makes the
user wait through a multi-second scan only to be told they can't use
the result. Moved to top of `main()`, after target validation but
before `scan()`. Verified by running `unset ANTHROPIC_API_KEY &&
python3 luminos.py /tmp` and observing a clean exit 0 with the hint.
### #57: dir loop refactor
After the scope change shipped, the natural next move was the Phase 3
prerequisites. Picked #57 first because it was the structural one
that everything else benefits from.
`_run_dir_loop` was ~160 lines holding four conceptual layers:
pre-loop setup, budget check + partial-flush (~57 lines, the largest
single block), API call + response printing, and tool dispatch + done
detection. Phase 3 dynamic turn allocation will inject more state
into the same code path, so the refactor had to land first.
I read the code carefully before designing the helpers. The cleanest
split turned out to be three: `_build_dir_loop_context()` (pure
setup, returns a `_DirLoopContext` namedtuple), `_flush_partial_dir_entry()`
(idempotent partial-cache writer for the budget-exceeded path), and
`_handle_turn_response()` (per-turn response processing — print,
append, dispatch). The new `_run_dir_loop` body is ~25 lines.
PR #67 shipped clean; all 164 tests passed unchanged. Internals.md §4 was
updated to reflect the new structure and the file:line refs that
drifted.
### #56: tool registration consolidation
The user explicitly chose "fix now" over "defer to Phase 3.5 when MCP
will replace this anyway." Reasoning: easier to migrate one
well-structured registry to MCP than to migrate two parallel
structures.
Before designing, I had to recognize a complication: the existing
tool tables aren't a single registry. There are three scope lists
(`_DIR_TOOLS`, `_SYNTHESIS_TOOLS`, `_SURVEY_TOOLS`) AND one global
`_TOOL_DISPATCH` dict. Some tools (like `flag`) appear in multiple
scopes with the same schema. Some tools (like `submit_report`) appear
in multiple scopes with *different* schemas. And `submit_report` /
`submit_survey` have schemas but no dispatch entry because the loop
body intercepts them.
Final design: a `register_tool(name, description, schema, scopes,
handler=None)` function. Single source of truth per (tool, scope)
pair. Tools in multiple scopes get multiple `register_tool()` calls
to preserve order (otherwise the order in the second scope drifts
relative to other tools).
PR #68 was 399 insertions / 344 deletions. Runtime introspection
confirmed identical scope contents and identical 10-entry dispatch
table. 164 tests still passed unchanged. Internals.md §4.2 and §9.1
shrank: §9.1 went from a 5-step "don't forget the second half"
process to a 4-step process with one obvious place to look.
### #55: pure-helper test coverage, wave 1
The user said "No one likes doing tests but we need them." Picked the
issue's seven targets and added one bonus from #57
(`_flush_partial_dir_entry`).
Used the `_make_manager()` pattern from `tests/test_cache.py` to
construct a `_CacheManager` rooted in a tempdir, sidestepping
`CACHE_ROOT` entirely. 45 tests across 8 helpers. One test had a
typo in an asserted substring on the first run — the actual partial
reason string is "context budget reached before files processed", not
"before any files" — caught and fixed in 30 seconds. 209 total tests
after PR #69.
The two notable behaviors pinned: `_filter_dir_tools` threshold gate
is strict `<` (the boundary case where confidence equals the
threshold passes the gate), and `_path_is_safe` correctly rejects
sibling-with-target-prefix (`/tmp/foo` vs `/tmp/foo_sibling` — the
easy-to-miss path traversal case).
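The sibling-prefix case is the classic bug in string-prefix containment checks. A minimal sketch of the difference, assuming POSIX paths. This is not the actual `_path_is_safe` body, just one standard way to get the boundary right with `os.path.commonpath`, which compares whole path components rather than characters.

```python
import os


def path_is_safe_naive(path, root):
    # BUG: plain prefix matching treats /tmp/foo_sibling as inside /tmp/foo.
    return os.path.realpath(path).startswith(os.path.realpath(root))


def path_is_safe(path, root):
    """True only if `path` resolves to `root` itself or somewhere beneath it."""
    root = os.path.realpath(root)
    path = os.path.realpath(path)
    # commonpath works on components, so "foo_sibling" never matches "foo".
    return os.path.commonpath([root, path]) == root
```

Resolving both sides with `realpath` first also folds `..` segments, so `/tmp/foo/../etc` is judged by where it actually points.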
### #70: pure-helper test coverage, wave 2
I noticed the wave 1 picks left out three high-impact helpers:
`_TokenTracker`, `_synthesize_from_cache`, `_discover_directories`.
Pitched them as "low effort, high impact." User agreed and asked me
to file an issue, insert it into the roadmap before Phase 3, and
ship.
Reading `_TokenTracker` corrected my issue draft: I had written
`reset_loop()` "preserves last_input" — actually it zeroes
`last_input` along with the loop counters. The test pins the real
behavior. I also discovered the `record()` method (not
`record_usage()` as I'd written in the issue), and that
`SimpleNamespace` works as a fake usage object because the function
uses `getattr(usage, "input_tokens", 0)`.
The load-bearing test in this batch is the budget-exceeded check
under cumulative-input pressure: record 10 calls each with
`input_tokens = CONTEXT_BUDGET // 5`, so total cumulative is 2x the
budget but `last_input` stays at 1/5 of budget. Assert that
`budget_exceeded()` returns False. This is exactly the #44 fix
condition — if anyone regresses to "exceeded if cumulative > budget,"
this test screams.
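A minimal re-creation of the #44 invariant, assuming each API call reports the full context it was sent as `input_tokens`, which is why `last_input`, not the cumulative sum, measures context pressure. Field names other than `last_input`, the `CONTEXT_BUDGET` value, and the strict `>` comparison are assumptions. The `getattr` defaults mirror the `SimpleNamespace` trick described above.

```python
from types import SimpleNamespace

CONTEXT_BUDGET = 100_000  # hypothetical value; the real constant isn't given here


class TokenTracker:
    """Sketch modeled on _TokenTracker; internals are assumptions."""

    def __init__(self):
        self.total_input = 0   # grand totals, kept for cost reporting
        self.total_output = 0
        self.loop_input = 0    # per-loop counters
        self.loop_output = 0
        self.last_input = 0    # size of the most recent request's context

    def record(self, usage):
        inp = getattr(usage, "input_tokens", 0)
        out = getattr(usage, "output_tokens", 0)
        self.total_input += inp
        self.total_output += out
        self.loop_input += inp
        self.loop_output += out
        self.last_input = inp

    def reset_loop(self):
        # Zeroes last_input along with the loop counters (see the #70 note).
        self.loop_input = 0
        self.loop_output = 0
        self.last_input = 0

    def budget_exceeded(self):
        # Compare the latest context size, never the cumulative sum.
        return self.last_input > CONTEXT_BUDGET


tracker = TokenTracker()
for _ in range(10):
    tracker.record(SimpleNamespace(input_tokens=CONTEXT_BUDGET // 5, output_tokens=10))
```

After those ten calls the cumulative input is twice the budget while `last_input` is a fifth of it, so `budget_exceeded()` must stay `False`.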
`_synthesize_from_cache` only reads dir entries (not file entries) —
worth pinning explicitly so a future maintainer doesn't add file
entries thinking they should appear in the fallback report.
`_discover_directories` tests now pin: leaves-first ordering, skip
list (`.git`, `__pycache__`, `node_modules`, `*.egg-info`), custom
exclude, hidden dirs by default, and the subtle `show_hidden=True`
case where the skip list still applies (`.git` stays out even with
hidden visible).
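A sketch of the behaviors those tests pin, assuming hidden directories are excluded unless `show_hidden=True` and that the skip list is checked before the hidden-dir rule. The constant names and the depth-based sort are assumptions. Any ordering in which every child precedes its parent satisfies the leaf-first contract.

```python
import fnmatch
import os

# Assumed skip list, taken from the names the tests pin.
SKIP_NAMES = {".git", "__pycache__", "node_modules"}
SKIP_GLOBS = ("*.egg-info",)


def _should_skip(name, exclude, show_hidden):
    # The skip list wins even when hidden dirs are visible (.git stays out).
    if name in SKIP_NAMES or any(fnmatch.fnmatch(name, g) for g in SKIP_GLOBS):
        return True
    if name in exclude:
        return True
    return name.startswith(".") and not show_hidden


def discover_directories(root, exclude=(), show_hidden=False):
    found = []
    for dirpath, dirnames, _files in os.walk(root):
        # Prune in place so os.walk never descends into skipped dirs.
        dirnames[:] = [d for d in dirnames if not _should_skip(d, exclude, show_hidden)]
        found.append(dirpath)
    # Leaves first: deeper paths sort ahead of their parents.
    return sorted(found, key=lambda p: (-p.count(os.sep), p))
```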
PR #71 added 25 tests, 234 total. PLAN.md got restructured: new
Phase 2.7 (#56 ✅) and Phase 2.8 (#55 ✅, #70) entries, the stale
Phase 3.4 (#56) and "Background chore" (#55) sections deleted since
they were displaced by the pre-Phase-3 cleanup pattern.
### Phase 3 prep recommendations and #72
After the four pre-reqs were done, the user asked what else I'd
recommend before starting Phase 3 ("phase 3 is a biggie and I want it
to have a solid base"). I came back with three picks: end-to-end
smoke test, design sketch for the planning pass, and document the
leaf-first contract.
User responded: smoke test already done externally (looks fine);
design sketch deferred to Phase 3 task 1 (intent matched, timing
disagreement); leaf-first contract — make it so.
The leaf-first contract issue (#72) is wiki-only, no code. Added a
new §4.7 to Internals.md explaining that `_discover_directories()`
returns leaves-first as a load-bearing invariant, that
`_get_child_summaries()` silently depends on it, and that the
`(none — this is a leaf directory)` placeholder LIES if the children
just haven't been investigated yet — the agent has no way to know.
Two safe paths if Phase 3 changes the order: preserve leaf-first
within priority bands, or rewrite the placeholder to be honest. First
draft accidentally inserted §4.7 before §4.6 in the file; caught on
re-read, swapped, committed.
## Key Decisions & Reasoning
- **Scope shift went with option (b), not (a).** AI-only with the
base scan purely internal. Reasoning: keeping `--no-ai` would have
meant maintaining two CLI surfaces and two documentation paths for
what the philosophy says is one product. Cleaner story.
- **Delete watch mode rather than park it.** Parking would have
required explaining in docs why one feature ignored the AI-first
philosophy. PLAN.md already notes that watch comes back as
incremental AI re-investigation if it comes back at all.
- **Delete `--ai` cleanly, no deprecation.** Per global CLAUDE.md
("no backwards-compat shims when you can just change the code").
Personal project, no external users to deprecate against.
- **Graceful exit on missing API key, exit 0 not exit 1.** Missing
key is a user-fixable configuration state, not an error. Exit 0 +
hint reads as "here's what you need to do," not "something broke."
- **Fix #56 now rather than defer to Phase 3.5.** User chose this
explicitly. The structure introduced (one registry call per (tool,
scope) pair) is naturally MCP-shaped, so the eventual MCP migration
collapses to "replace `register_tool()` with a server call."
- **Test coverage in two waves rather than one batch.** Wave 1 (#55)
shipped first with the issue's stated targets. Then I noticed three
more high-impact helpers were uncovered, pitched them, and the user
greenlit a wave 2 (#70). Splitting kept each PR cohesive and
reviewable.
- **Phase 3 design sketch deferred to Phase 3 task 1.** I recommended
it as Phase 3 prep. User overrode: "agree on intent, disagree on
timing." Result: the design sketch is now bookkept as the first
thing Phase 3 does, not as a separate prep cycle. Cleaner if Phase
3 has the design fresh in mind when the rest of the work starts.
- **One commit per PR for the scope change**, not split into code +
docs commits. The two are tightly coupled — splitting would create
a half-broken state in commit 1 where code says one thing and docs
say another. Same logical change.
## Surprises & Discoveries
- **The lazy-deps story was thinner than expected.** `ai.py` and
`ast_parser.py` already did top-level imports of the AI packages.
The "lazy" pattern lived only in the CLI gate. Removing the gate
WAS the technical change for the entire scope shift. Lesson: when
a constraint feels heavier in docs than in code, check whether the
code is actually enforcing it.
- **`capabilities.py` was almost dead weight.** Only `clear_cache()`
was load-bearing, and even that only because of the `CACHE_ROOT`
reference. We'd been paying a tax for the lazy-deps story that the
code wasn't actually charging.
- **The scope lists had a name collision case.**
`submit_report` appears in both `_DIR_TOOLS` and `_SYNTHESIS_TOOLS`
with different schemas. The new registry handles this with two
`register_tool()` calls, one per scope, but the existence of the
collision wasn't obvious until I read the code.
- **`_TokenTracker.reset_loop()` zeroes `last_input`.** My #70 issue
draft assumed it preserved `last_input` across resets. The actual
code doesn't. Reading the code corrected the test plan before any
test was wrong. Always read the code before writing the spec.
- **`_synthesize_from_cache` reads dir entries only.** I had assumed
it would also pull file entries in some "even more degraded" case.
It doesn't. The fallback is dir-only or nothing.
- **The graceful exit had to fire before the base scan, not after.**
First draft put it after. Caught it in the writing stage, not in
testing — but worth noting because the same pattern can sneak into
other early-exit checks.
## Concerns & Open Threads
- **Phase 3 design sketch is bookkept as Phase 3 task 1, not done
yet.** This is the highest-priority unresolved thread. The
planning pass touches many things (cache schema, dir loop
orchestration, max_turns propagation, plan persistence, survey
interaction, resume semantics, optional global token budget) and
hand-rolling the design while implementing leads to drift. Make
sure Phase 3 actually starts with the design sketch.
- **The leaf-first contract is documented but only loosely
enforced.** `TestDiscoverDirectories` pins the *ordering*, but
there's no test that asserts "dirs are processed in the order they
come out of `_discover_directories`" — the orchestrator could
re-sort silently and the test wouldn't catch it. Phase 3 will
introduce alternative orderings; this gap matters.
- **Token budget arithmetic for Phase 3 is still a known unknown.**
PLAN.md flags it: "How does the agent 'request more turns'?" The
current `_TokenTracker` is per-loop with grand totals for cost.
There's no concept of "we've spent X out of Y on this whole
investigation." If Phase 3 dynamic turn allocation needs that,
it has to grow it explicitly.
- **No live integration smoke test from this session.** The user ran
one externally and confirmed it works, but the assistant didn't
observe it. If a regression slipped through, we'd find out at the
start of Phase 3 or later. The unit tests are 234 strong but they
don't cover the full pipeline end-to-end.
- **Six PRs in one session is a lot of merge commits on main.** Not
a problem per se, but if a regression bisects to "somewhere in
Session 9" the bisect surface is wider than usual. Worth noting
for the next session retro.
- **Wiki-only changes (#72) work fine via direct commits to wiki
main.** The pattern is established; future doc-only work can
follow it without ceremony.
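The ordering gap in the leaf-first concern above could be closed with a test that injects a known discovery order and records what the orchestrator visits. `run_investigation` and its injected callables are hypothetical stand-ins for the real orchestrator. The helper returns `False` for an orchestrator that silently re-sorts, which is exactly the regression the current tests miss.

```python
def run_investigation(root, discover, investigate_dir):
    """Hypothetical orchestrator: must visit dirs in discovery order."""
    for directory in discover(root):
        investigate_dir(directory)


def resorting_orchestrator(root, discover, investigate_dir):
    """A plausible regression: 'tidy' alphabetical order breaks leaf-first."""
    for directory in sorted(discover(root)):
        investigate_dir(directory)


def preserves_discovery_order(orchestrator):
    discovered = ["/r/a/b", "/r/a/c", "/r/a", "/r"]  # leaves first
    visited = []
    orchestrator("/r", lambda root: list(discovered), visited.append)
    return visited == discovered


def test_orchestrator_preserves_discovery_order():
    assert preserves_discovery_order(run_investigation)
```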
## Raw Thinking
- The pre-Phase-3 cleanup pattern (#54 → #57 → #56 → #55 → #70 →
#72) is worth naming as a paradigm: "pay debts in the area before
adding new state to that area." Phase 2.6, 2.7, 2.8 in PLAN.md
reflect this. Could be applied generally to any large milestone:
inventory the helpers it'll touch, refactor + test them first,
then add the new work on top of a known-good foundation.
- The State-of-the-App summary at session start was useful framing.
It surfaced which threads were on the table, which were blocked,
and which had decision points pending. Worth doing more often,
especially at the start of long sessions or sessions that start
with "what's left."
- `_TokenTracker` test count (11) was higher than I initially
scoped. Once I started enumerating edge cases (boundary, defaults,
multiple loops, reset semantics, the load-bearing #44 case) the
count grew naturally. Good unit tests don't shrink. They accrete.
- The `register_tool()` design is naturally MCP-shaped. A registry
of `(name, schema, scopes, handler)` is exactly what an MCP tool
list looks like. When Phase 3.5 lands, `register_tool()` can
collapse to a one-line forward to the connected MCP server's
`tools/list` response, and the migration touches almost nothing
else. This was unintentional but lucky.
- The session was unusually productive — 6 PRs, 5 issues filed,
4 issues closed, 70 net new tests, 4 wiki page updates — because
each piece of work unblocked the next and the user made every
call quickly and unambiguously. The TaskCreate breakdowns helped, but the
real speedup was that nothing was ambiguous when execution
started. When the user redirects with a single sentence ("fix
now," "delete it," "make it so"), the loop doesn't have to stop
to re-confirm.
- "Documentation is work" — #72 was a quick experiment in shipping a
doc-only issue with the same workflow as code. Worked fine.
Pattern is repeatable for other cross-cutting concerns: contracts,
invariants, design decisions that aren't enforced anywhere except
in human heads.
## What's Next
In priority order:
1. **Phase 3 task 1: write the planning pass design sketch.**
~30-45 minutes, no code. Cover the
`submit_plan` schema, plan storage in cache, `max_turns`
propagation, skip-dir semantics, survey-output integration,
resume semantics, and the optional global token budget question.
Land in PLAN.md or a new wiki page before any Phase 3 code is
cut.
2. **Phase 3 implementation: #19–#29 cluster.** Planning pass after
survey, before dir loops; `submit_plan` tool; dynamic turn
allocation based on plan output; dir loop orchestrator updated
to follow the plan. Multi-PR, probably multi-session.
3. **Phase 3.5: MCP backend abstraction (#39).** The pivot point.
After Phase 3 is working, before Phase 4. The `register_tool()`
refactor from #56 makes this much easier than it would have been.
4. **Phase 4+: external knowledge tools, scale-tiered synthesis,
hypothesis-driven synthesis, refinement, dynamic report
structure.** The full backlog from PLAN.md.
When Phase 3 starts: re-read PLAN.md Part 4 (Investigation Planning)
and Internals.md §4.7 (the leaf-first contract) before designing.
The contract WILL be tempting to violate; the design sketch has to
address it explicitly.

@@ -10,7 +10,7 @@
| [Session 6](Session6) | 2026-04-07 | Extracted shared workflow/branching/protocols from project CLAUDE.md to global `~/.claude/CLAUDE.md`; moved externalize.md and wrap-up.md to `~/.claude/protocols/` |
| [Session 7](Session7) | 2026-04-07 | Phase 1 audit (#1 closed, only #54 remains); gitea MCP credential overhaul — dedicated `claude-code` Forgejo user with admin on luminos, write+delete verified |
| [Session 8](Session8) | 2026-04-07 | Closed #54 — added confidence/confidence_reason to write_cache tool schema description; Phase 1 milestone now 4/4 complete |
| [Session 9](Session9) | 2026-04-11 | Scope shift (#64) + all Phase 3 prereqs: dir loop refactor (#57), tool registry consolidation (#56), pure-helper test coverage waves 1+2 (#55, #70), leaf-first contract docs (#72). 6 PRs, 70 new tests (164→234), Phase 2.6/2.7/2.8 milestones complete |
---