retro: Session 9 — scope shift + all Phase 3 prereqs shipped

2026-04-11 11:02:16 -06:00 · 2026-04-11 11:02:16 -06:00 · 31a052eca0
commit 31a052eca0
parent 508d66cba8
2 changed files with 334 additions and 46 deletions
--- a/Session9.md
+++ b/Session9.md
@ -1,65 +1,353 @@
-# Session 9
+# Session 9 Notes — 2026-04-11
-**Date:** 2026-04-11
+## What We Set Out to Do
 **Focus:** Scope shift — AI investigation is the product, drop zero-dependency constraint, delete watch mode (#64)
 **Duration estimate:** ~45 minutes
-## What was done
+Open question. The session started with a State-of-the-App summary
 request, which surfaced two threads: (1) a scope shift the user had
 been mulling, and (2) the existing Phase 3 prerequisites (#55, #56,
 #57) blocking Phase 3 proper. Neither was named as the goal up front
 — they emerged from the conversation and stacked.
-A coordinated scope change. Two original design constraints were dropped and one feature was deleted:
+What we shipped, in execution order:
-1. **Zero-dependency Python CLI is no longer a goal.** Luminos installs from `requirements.txt` like a normal Python project. `anthropic`, `tree-sitter` + grammars, and `python-magic` are normal pip dependencies, not lazy imports gated by a CLI flag.
+1. #64 — AI investigation is the product, drop zero-dep constraint, delete watch mode
-2. **AI investigation is the headline.** The base scan exists to feed the agent. There is no `--ai` flag and no `--no-ai` mode. AI runs unconditionally on every invocation.
+2. #57 — Refactor `_run_dir_loop` into three focused helpers
-3. **Watch mode deleted.** A non-AI filesystem-churn monitor conflicts with the new philosophy. If a live update mode comes back, it gets rebuilt as incremental AI re-investigation.
+3. #56 — Single-source tool registration via `register_tool()`
 4. #55 — Unit test coverage for pure helpers in `ai.py` (wave 1)
 5. #70 — Test coverage wave 2: `_TokenTracker`, `_synthesize_from_cache`, `_discover_directories`
 6. #72 — Document the leaf-first investigation contract in Internals.md
-### Code
+Six issues, six PRs (or wiki commits), 234 tests, four open issues
 closed plus the new ones, all in one continuous session.
- Deleted `luminos_lib/watch.py` and the `--watch` flag.
+## What Actually Happened
 - Deleted `luminos_lib/capabilities.py` and `tests/test_capabilities.py`. Moved `clear_cache()` into `cache.py`.
 - `luminos.py`: removed `--watch`, `--ai`, `--install-extras`. Kept `--clear-cache`, `--fresh`, `-x`, `-d`, `-a`, `-o`, `--json`. AI runs unconditionally after the base scan. If `ANTHROPIC_API_KEY` is unset, exits 0 with a one-line hint *before* running the base scan.
 - `ai.py`: dropped the `check_ai_dependencies()` call and the import.
 - New `requirements.txt`. `setup_env.sh` installs from it.
-### Docs
+### Scope shift (#64)
- `README.md` rewritten to lead with AI investigation, drops the two-modes framing and the watch feature line.
+The user opened with "I would like to make a couple scope changes" —
- `CLAUDE.md` (project): rewrites Key Constraints, updates module map and Running Luminos commands.
+drop the zero-dep constraint, make AI investigation the main show. We
- `PLAN.md`: strips zero-dep philosophy from the file map and reframes the watch+incremental note as a future live-mode feature.
+worked through the framing in conversation: option (a) AI-default
- Wiki: `Architecture.md`, `DevelopmentGuide.md`, `Home.md`, `Internals.md` updated.
+with `--no-ai` escape hatch, or option (b) AI-only with the base scan
 purely internal. User picked (b). Watch mode was deleted as part of
 the same change because a non-AI churn monitor conflicts with the new
 philosophy.
-### Ship
+Reading the code first turned up two things that made the change much
 smaller than expected. First, `ai.py` and `ast_parser.py` already did
 top-level imports of `anthropic`/`magic`/`tree_sitter` — the "lazy
 deps" pattern lived only in `luminos.py`'s `if args.ai:` gate.
 Removing that gate WAS the entire technical change. Second,
 `capabilities.py` was almost dead weight: only `clear_cache()` was
 load-bearing, and only because it knew about `CACHE_ROOT` (which
 already lived in `cache.py`). One import move and the whole module
 could be deleted.
- PR #65 merged via gitea MCP, branch deleted local + remote.
+PR #65 landed 11 file changes: deleted `watch.py`, deleted
- Issue #64 closed manually (not relying on `Closes #N`).
+`capabilities.py`, deleted `tests/test_capabilities.py`, moved
- Issue #35 (incremental AI re-investigation in watch mode) closed as obsolete with a comment explaining why the framing no longer fits.
+`clear_cache()` into `cache.py`, rewrote `luminos.py` to make AI
- Tests: 164 pass (down from 168 with the 4 removed capabilities tests).
+mandatory, dropped `check_ai_dependencies()` from `ai.py`, added
 `requirements.txt`, updated `setup_env.sh`, and rewrote `README.md` /
 `CLAUDE.md` / `PLAN.md` to match. Wiki updates landed in a separate
 commit on the wiki repo. PR #66 was the matching session-log bump.
-## Discoveries and observations
+The graceful exit case mattered enough to call out: first draft
 checked `ANTHROPIC_API_KEY` *after* the base scan ran. That makes the
 user wait through a multi-second scan only to be told they can't use
 the result. Moved to top of `main()`, after target validation but
 before `scan()`. Verified by running `unset ANTHROPIC_API_KEY &&
 python3 luminos.py /tmp` and observing a clean exit 0 with the hint.
- **The "lazy import" pattern was thinner than expected.** `ai.py` and `ast_parser.py` already did top-level imports of `anthropic`/`magic`/`tree_sitter`. The "lazy" behavior lived only in `luminos.py`'s `if args.ai:` gate. So the actual code change for "drop the zero-dep constraint" was much smaller than the conceptual shift suggested: just remove the gate and accept that `luminos.py`'s AI import always fires.
+### #57: dir loop refactor
 - **`capabilities.py` was almost dead weight.** Only `clear_cache()` was load-bearing, and even that only because it knew about `CACHE_ROOT` (which already lives in `cache.py`). Moving `clear_cache()` to `cache.py` was a one-import simplification — `capabilities.py` was a tax we'd been paying for the lazy-deps story.
 - **The graceful exit needed to fire *before* the base scan, not after.** First draft put the `ANTHROPIC_API_KEY` check after the scan ran. That makes the user wait through a multi-second scan only to get told they can't actually use the result. Moved it to the top of `main()` after target validation but before `scan()`.
-## Decisions made and why
+After the scope change shipped, the natural next move was the Phase 3
 prerequisites. Picked #57 first because it was the structural one
 that everything else benefits from.
- **Option (b), not (a).** The choice was between "AI-default with `--no-ai` escape hatch" (a) and "AI-only, base scan is internal" (b). Picked (b). Reasoning: keeping `--no-ai` would have meant maintaining two CLI surfaces and two documentation paths for what the philosophy says is one product. The base scan is still useful internally (it produces the `report` dict the agent reads), it just doesn't need to be exposed as a standalone CLI mode. Cleaner story, less drift.
+`_run_dir_loop` was ~160 lines holding four conceptual layers:
- **Delete watch mode rather than park it.** It was ~110 lines, no tests, no users in our context. Parking it as a "scoped-down churn monitor" would have meant explaining in docs why one feature ignores the AI-first philosophy. Delete + a clear note in PLAN.md that watch comes back as incremental AI re-investigation if it comes back at all.
+pre-loop setup, budget check + partial-flush (~57 lines, the largest
- **Delete `--ai` cleanly, no deprecation cycle.** Per global CLAUDE.md ("don't use feature flags or backwards-compatibility shims when you can just change the code"). It's a personal project with no external users to deprecate against.
+single block), API call + response printing, and tool dispatch + done
- **Graceful exit on missing API key, exit 0 not exit 1.** "Missing API key" is a user-fixable configuration state, not an error condition. Exit 0 + hint reads as "here's what you need to do," not "something broke."
+detection. Phase 3 dynamic turn allocation will inject more state
- **One commit, not three.** The code changes and the doc changes are tightly coupled — splitting them creates a half-broken state in commit 1 where the code says one thing and the docs say another. The whole scope shift is one logical change.
+into the same code path, so the refactor had to land first.
-## Raw thinking
+I read the code carefully before designing the helpers. The cleanest
 split turned out to be three: `_build_dir_loop_context()` (pure
 setup, returns a `_DirLoopContext` namedtuple), `_flush_partial_dir_entry()`
 (idempotent partial-cache writer for the budget-exceeded path), and
 `_handle_turn_response()` (per-turn response processing — print,
 append, dispatch). The new `_run_dir_loop` body is ~25 lines.
- The fact that `ai.py` already did top-level `import anthropic` is interesting in hindsight. The lazy-deps story was load-bearing in the docs and the prompts but not really in the code. We were one CLI gate removal away from the dependency story being trivial, and we'd been paying the conceptual cost the whole time. Lesson: when a constraint feels heavier in docs than in code, check if the code is actually enforcing it.
+PR #67 shipped clean, 164 tests passed unchanged. Internals.md §4 was
- Watch mode being deleted feels right but is also a small loss of optionality. The next time someone says "I want to monitor this directory for changes," the answer is "luminos doesn't do that anymore, use `inotifywait`." That's fine for now — luminos was never the best churn monitor — but worth remembering if a real use case surfaces.
+updated to reflect the new structure and the file:line refs that
- The session was unusually fast (~45 min for a coordinated scope change spanning 11 files plus 5 wiki pages) because the user had already done the conceptual work in conversation. By the time I started cutting code, every decision had been pre-confirmed: option (b), delete watch, delete `--ai` cleanly, graceful exit. The TaskCreate breakdown (10 tasks) helped keep it linear, but the real speedup was that nothing was ambiguous when execution started.
+drifted.
 - Phase 3 prerequisites (#55, #56, #57) are now genuinely the only thing between us and Phase 3 proper. The scope change didn't touch any of them. Next session can pick one.
-## What's next
+### #56: tool registration consolidation
 The user explicitly chose "fix now" over "defer to Phase 3.5 when MCP
 will replace this anyway." Reasoning: easier to migrate one
 well-structured registry to MCP than to migrate two parallel
 structures.
 Before designing, I had to recognize a complication: the existing
 tool tables aren't a single registry. There are three scope lists
 (`_DIR_TOOLS`, `_SYNTHESIS_TOOLS`, `_SURVEY_TOOLS`) AND one global
 `_TOOL_DISPATCH` dict. Some tools (like `flag`) appear in multiple
 scopes with the same schema. Some tools (like `submit_report`) appear
 in multiple scopes with *different* schemas. And `submit_report` /
 `submit_survey` have schemas but no dispatch entry because the loop
 body intercepts them.
 Final design: a `register_tool(name, description, schema, scopes,
 handler=None)` function. Single source of truth per (tool, scope)
 pair. Tools in multiple scopes get multiple `register_tool()` calls
 to preserve order (otherwise the order in the second scope drifts
 relative to other tools).
 PR #68 was 399 insertions / 344 deletions. Runtime introspection
 confirmed identical scope contents and identical 10-entry dispatch
 table. 164 tests still passed unchanged. Internals.md §4.2 and §9.1
 shrunk: §9.1 went from a 5-step "don't forget the second half"
 process to a 4-step process with one obvious place to look.
 ### #55: pure-helper test coverage, wave 1
 The user said "No one likes doing tests but we need them." Picked the
 issue's seven targets and added one bonus from #57
 (`_flush_partial_dir_entry`).
 Used the `_make_manager()` pattern from `tests/test_cache.py` to
 construct a `_CacheManager` rooted in a tempdir, sidestepping
 `CACHE_ROOT` entirely. 45 tests across 8 helpers. One test had a
 typo in an asserted substring on the first run — the actual partial
 reason string is "context budget reached before files processed", not
 "before any files" — caught and fixed in 30 seconds. 209 total tests
 after PR #69.
 The two notable behaviors pinned: `_filter_dir_tools` threshold gate
 is strict `<` (the boundary case where confidence equals the
 threshold passes the gate), and `_path_is_safe` correctly rejects
 sibling-with-target-prefix (`/tmp/foo` vs `/tmp/foo_sibling` — the
 easy-to-miss path traversal case).
 ### #70: pure-helper test coverage, wave 2
 I noticed the wave 1 picks left out three high-impact helpers:
 `_TokenTracker`, `_synthesize_from_cache`, `_discover_directories`.
 Pitched them as "low effort, high impact." User agreed and asked me
 to file an issue, insert it into the roadmap before Phase 3, and
 ship.
 Reading `_TokenTracker` corrected my issue draft: I had written
 `reset_loop()` "preserves last_input" — actually it zeroes
 `last_input` along with the loop counters. The test pins the real
 behavior. I also discovered the `record()` method (not
 `record_usage()` as I'd written in the issue), and that
 `SimpleNamespace` works as a fake usage object because the function
 uses `getattr(usage, "input_tokens", 0)`.
 The load-bearing test in this batch is the budget-exceeded check
 under cumulative-input pressure: record 10 calls each with
 `input_tokens = CONTEXT_BUDGET // 5`, so total cumulative is 2x the
 budget but `last_input` stays at 1/5 of budget. Assert that
 `budget_exceeded()` returns False. This is exactly the #44 fix
 condition — if anyone regresses to "exceeded if cumulative > budget,"
 this test screams.
 `_synthesize_from_cache` only reads dir entries (not file entries) —
 worth pinning explicitly so a future maintainer doesn't add file
 entries thinking they should appear in the fallback report.
 `_discover_directories` tests now pin: leaves-first ordering, skip
 list (`.git`, `__pycache__`, `node_modules`, `*.egg-info`), custom
 exclude, hidden dirs by default, and the subtle `show_hidden=True`
 case where the skip list still applies (`.git` stays out even with
 hidden visible).
 PR #71 added 25 tests, 234 total. PLAN.md got restructured: new
 Phase 2.7 (#56 ✅) and Phase 2.8 (#55 ✅, #70) entries, the stale
 Phase 3.4 (#56) and "Background chore" (#55) sections deleted since
 they were displaced by the pre-Phase-3 cleanup pattern.
 ### Phase 3 prep recommendations and #72
 After the four pre-reqs were done, the user asked what else I'd
 recommend before starting Phase 3 ("phase 3 is a biggie and I want it
 to have a solid base"). I came back with three picks: end-to-end
 smoke test, design sketch for the planning pass, and document the
 leaf-first contract.
 User responded: smoke test already done externally (looks fine);
 design sketch deferred to Phase 3 task 1 (intent matched, timing
 disagreement); leaf-first contract — make it so.
 The leaf-first contract issue (#72) is wiki-only, no code. Added a
 new §4.7 to Internals.md explaining that `_discover_directories()`
 returns leaves-first as a load-bearing invariant, that
 `_get_child_summaries()` silently depends on it, and that the
 `(none — this is a leaf directory)` placeholder LIES if the children
 just haven't been investigated yet — the agent has no way to know.
 Two safe paths if Phase 3 changes the order: preserve leaf-first
 within priority bands, or rewrite the placeholder to be honest. First
 draft accidentally inserted §4.7 before §4.6 in the file; caught on
 re-read, swapped, committed.
 ## Key Decisions & Reasoning
 - **Scope shift went with option (b), not (a).** AI-only with the
  base scan purely internal. Reasoning: keeping `--no-ai` would have
  meant maintaining two CLI surfaces and two documentation paths for
  what the philosophy says is one product. Cleaner story.
 - **Delete watch mode rather than park it.** Parking would have
  required explaining in docs why one feature ignored the AI-first
  philosophy. PLAN.md already notes that watch comes back as
  incremental AI re-investigation if it comes back at all.
 - **Delete `--ai` cleanly, no deprecation.** Per global CLAUDE.md
  ("no backwards-compat shims when you can just change the code").
  Personal project, no external users to deprecate against.
 - **Graceful exit on missing API key, exit 0 not exit 1.** Missing
  key is a user-fixable configuration state, not an error. Exit 0 +
  hint reads as "here's what you need to do," not "something broke."
 - **Fix #56 now rather than defer to Phase 3.5.** User chose this
  explicitly. The structure introduced (one registry call per (tool,
  scope) pair) is naturally MCP-shaped, so the eventual MCP migration
  collapses to "replace `register_tool()` with a server call."
 - **Test coverage in two waves rather than one batch.** Wave 1 (#55)
  shipped first with the issue's stated targets. Then I noticed three
  more high-impact helpers were uncovered, pitched them, and the user
  greenlit a wave 2 (#70). Splitting kept each PR cohesive and
  reviewable.
 - **Phase 3 design sketch deferred to Phase 3 task 1.** I recommended
  it as Phase 3 prep. User overrode: "agree on intent, disagree on
  timing." Result: the design sketch is now bookkept as the first
  thing Phase 3 does, not as a separate prep cycle. Cleaner if Phase
  3 has the design fresh in mind when the rest of the work starts.
 - **One commit per PR for the scope change**, not split into code +
  docs commits. The two are tightly coupled — splitting would create
  a half-broken state in commit 1 where code says one thing and docs
  say another. Same logical change.
 ## Surprises & Discoveries
 - **The lazy-deps story was thinner than expected.** `ai.py` and
  `ast_parser.py` already did top-level imports of the AI packages.
  The "lazy" pattern lived only in the CLI gate. Removing the gate
  WAS the technical change for the entire scope shift. Lesson: when
  a constraint feels heavier in docs than in code, check whether the
  code is actually enforcing it.
 - **`capabilities.py` was almost dead weight.** Only `clear_cache()`
  was load-bearing, and even that only because of the `CACHE_ROOT`
  reference. We'd been paying a tax for the lazy-deps story that the
  code wasn't actually charging.
 - **`_TOOL_DISPATCH` and `_DIR_TOOLS` had a name collision case.**
  `submit_report` appears in both `_DIR_TOOLS` and `_SYNTHESIS_TOOLS`
  with different schemas. The new registry handles this with two
  `register_tool()` calls per scope, but the existence of the
  collision wasn't obvious until I read the code.
 - **`_TokenTracker.reset_loop()` zeroes `last_input`.** My #70 issue
  draft assumed it preserved `last_input` across resets. The actual
  code doesn't. Reading the code corrected the test plan before any
  test was wrong. Always read the code before writing the spec.
 - **`_synthesize_from_cache` reads dir entries only.** I had assumed
  it would also pull file entries in some "even more degraded" case.
  It doesn't. The fallback is dir-only or nothing.
 - **The graceful exit had to fire before the base scan, not after.**
  First draft put it after. Caught it in the writing stage, not in
  testing — but worth noting because the same pattern can sneak into
  other early-exit checks.
 ## Concerns & Open Threads
 - **Phase 3 design sketch is bookkept as Phase 3 task 1, not done
  yet.** This is the highest-priority unresolved thread. The
  planning pass touches many things (cache schema, dir loop
  orchestration, max_turns propagation, plan persistence, survey
  interaction, resume semantics, optional global token budget) and
  hand-rolling the design while implementing leads to drift. Make
  sure Phase 3 actually starts with the design sketch.
 - **The leaf-first contract is documented but only loosely
  enforced.** `TestDiscoverDirectories` pins the *ordering*, but
  there's no test that asserts "dirs are processed in the order they
  come out of `_discover_directories`" — the orchestrator could
  re-sort silently and the test wouldn't catch it. Phase 3 will
  introduce alternative orderings; this gap matters.
 - **Token budget arithmetic for Phase 3 is still a known unknown.**
  PLAN.md flags it: "How does the agent 'request more turns'?" The
  current `_TokenTracker` is per-loop with grand totals for cost.
  There's no concept of "we've spent X out of Y on this whole
  investigation." If Phase 3 dynamic turn allocation needs that,
  it has to grow it explicitly.
 - **No live integration smoke test from this session.** The user ran
  one externally and confirmed it works, but the assistant didn't
  observe it. If a regression slipped through, we'd find out at the
  start of Phase 3 or later. The unit tests are 234 strong but they
  don't cover the full pipeline end-to-end.
 - **Six PRs in one session is a lot of merge commits on main.** Not
  a problem per se, but if a regression bisects to "somewhere in
  Session 9" the bisect surface is wider than usual. Worth noting
  for the next session retro.
 - **Wiki-only changes (#72) work fine via direct commits to wiki
  main.** The pattern is established; future doc-only work can
  follow it without ceremony.
 ## Raw Thinking
 - The pre-Phase-3 cleanup pattern (#54 → #57 → #56 → #55 → #70 →
  #72) is worth naming as a paradigm: "pay debts in the area before
  adding new state to that area." Phase 2.6, 2.7, 2.8 in PLAN.md
  reflect this. Could be applied generally to any large milestone:
  inventory the helpers it'll touch, refactor + test them first,
  then add the new work on top of a known-good foundation.
 - The State-of-the-App summary at session start was useful framing.
  It surfaced which threads were on the table, which were blocked,
  and which had decision points pending. Worth doing more often,
  especially at the start of long sessions or sessions that start
  with "what's left."
 - `_TokenTracker` test count (11) was higher than I initially
  scoped. Once I started enumerating edge cases (boundary, defaults,
  multiple loops, reset semantics, the load-bearing #44 case) the
  count grew naturally. Good unit tests don't shrink. They accrete.
 - The `register_tool()` design is naturally MCP-shaped. A registry
  of `(name, schema, scopes, handler)` is exactly what an MCP tool
  list looks like. When Phase 3.5 lands, `register_tool()` can
  collapse to a one-line forward to the connected MCP server's
  `tools/list` response, and the migration touches almost nothing
  else. This was unintentional but lucky.
 - The session was unusually productive — 6 PRs, 5 issues filed,
  4 issues closed, 70 net new tests, 4 wiki page updates — because
  each piece of work unblocked the next and the user kept the
  decisions decisive. The TaskCreate breakdowns helped, but the
  real speedup was that nothing was ambiguous when execution
  started. When the user redirects with a single sentence ("fix
  now," "delete it," "make it so"), the loop doesn't have to stop
  to re-confirm.
 - "Documentation is work" — #72 was a quick experiment in shipping a
  doc-only issue with the same workflow as code. Worked fine.
  Pattern is repeatable for other cross-cutting concerns: contracts,
  invariants, design decisions that aren't enforced anywhere except
  in human heads.
 ## What's Next
 In priority order:
-1. **#57** — refactor `_run_dir_loop` before Phase 3 dynamic turn allocation lands. Prerequisite cleanup, small.
+1. **Phase 3 task 1: write the planning pass design sketch.**
-2. **#56** — dedupe `_TOOL_DISPATCH` / `_DIR_TOOLS` registration. Decision point: fix now or let it die in Phase 3.5 MCP migration.
+   Deferred from this session. ~30-45 minutes, no code. Cover the
-3. **#55** — unit test coverage for ai.py pure helpers. Foundation for Phase 3 confidence work.
+   `submit_plan` schema, plan storage in cache, `max_turns`
-4. Phase 3 proper (#19–#29 cluster).
+   propagation, skip-dir semantics, survey-output integration,
   resume semantics, and the optional global token budget question.
   Land in PLAN.md or a new wiki page before any Phase 3 code is
   cut.
 2. **Phase 3 implementation: #19–#29 cluster.** Planning pass after
   survey, before dir loops; `submit_plan` tool; dynamic turn
   allocation based on plan output; dir loop orchestrator updated
   to follow the plan. Multi-PR, probably multi-session.
 3. **Phase 3.5: MCP backend abstraction (#39).** The pivot point.
   After Phase 3 is working, before Phase 4. The `register_tool()`
   refactor from #56 makes this much easier than it would have been.
 4. **Phase 4+: external knowledge tools, scale-tiered synthesis,
   hypothesis-driven synthesis, refinement, dynamic report
   structure.** The full backlog from PLAN.md.
 When Phase 3 starts: re-read PLAN.md Part 4 (Investigation Planning)
 and Internals.md §4.7 (the leaf-first contract) before designing.
 The contract WILL be tempting to violate; the design sketch has to
 address it explicitly.
--- a/SessionRetrospectives.md
+++ b/SessionRetrospectives.md
@ -10,7 +10,7 @@
 | [Session 6](Session6) | 2026-04-07 | Extracted shared workflow/branching/protocols from project CLAUDE.md to global `~/.claude/CLAUDE.md`; moved externalize.md and wrap-up.md to `~/.claude/protocols/` |
 | [Session 7](Session7) | 2026-04-07 | Phase 1 audit (#1 closed, only #54 remains); gitea MCP credential overhaul — dedicated `claude-code` Forgejo user with admin on luminos, write+delete verified |
 | [Session 8](Session8) | 2026-04-07 | Closed #54 — added confidence/confidence_reason to write_cache tool schema description; Phase 1 milestone now 4/4 complete |
-| [Session 9](Session9) | 2026-04-11 | Scope shift (#64): AI investigation is the product, zero-dep constraint dropped, watch mode + capabilities.py deleted, requirements.txt added, README/CLAUDE/PLAN/wiki rewritten |
+| [Session 9](Session9) | 2026-04-11 | Scope shift (#64) + all Phase 3 prereqs: dir loop refactor (#57), tool registry consolidation (#56), pure-helper test coverage waves 1+2 (#55, #70), leaf-first contract docs (#72). 6 PRs, 70 new tests (164→234), Phase 2.6/2.7/2.8 milestones complete |
 ---