retro: Session 9 — scope shift + all Phase 3 prereqs shipped

2026-04-11 11:02:16 -06:00 · 2026-04-11 11:02:16 -06:00 · 31a052eca0
commit 31a052eca0
parent 508d66cba8
2 changed files with 334 additions and 46 deletions
--- a/Session9.md
+++ b/Session9.md
@ -1,65 +1,353 @@
-# Session 9
+# Session 9 Notes — 2026-04-11

-**Date:** 2026-04-11
-**Focus:** Scope shift — AI investigation is the product, drop zero-dependency constraint, delete watch mode (#64)
-**Duration estimate:** ~45 minutes
+## What We Set Out to Do

-## What was done
+Open question. The session started with a State-of-the-App summary
+request, which surfaced two threads: (1) a scope shift the user had
+been mulling, and (2) the existing Phase 3 prerequisites (#55, #56,
+#57) blocking Phase 3 proper. Neither was named as the goal up front
+— they emerged from the conversation and stacked.

-A coordinated scope change. Two original design constraints were dropped and one feature was deleted:
+What we shipped, in execution order:

-1. **Zero-dependency Python CLI is no longer a goal.** Luminos installs from `requirements.txt` like a normal Python project. `anthropic`, `tree-sitter` + grammars, and `python-magic` are normal pip dependencies, not lazy imports gated by a CLI flag.
-2. **AI investigation is the headline.** The base scan exists to feed the agent. There is no `--ai` flag and no `--no-ai` mode. AI runs unconditionally on every invocation.
-3. **Watch mode deleted.** A non-AI filesystem-churn monitor conflicts with the new philosophy. If a live update mode comes back, it gets rebuilt as incremental AI re-investigation.
+1. #64 — AI investigation is the product, drop zero-dep constraint, delete watch mode
+2. #57 — Refactor `_run_dir_loop` into three focused helpers
+3. #56 — Single-source tool registration via `register_tool()`
+4. #55 — Unit test coverage for pure helpers in `ai.py` (wave 1)
+5. #70 — Test coverage wave 2: `_TokenTracker`, `_synthesize_from_cache`, `_discover_directories`
+6. #72 — Document the leaf-first investigation contract in Internals.md

-### Code
+Six issues, six PRs (or wiki commits), 234 tests, four open issues
+closed plus the new ones, all in one continuous session.

- Deleted `luminos_lib/watch.py` and the `--watch` flag.
- Deleted `luminos_lib/capabilities.py` and `tests/test_capabilities.py`. Moved `clear_cache()` into `cache.py`.
- `luminos.py`: removed `--watch`, `--ai`, `--install-extras`. Kept `--clear-cache`, `--fresh`, `-x`, `-d`, `-a`, `-o`, `--json`. AI runs unconditionally after the base scan. If `ANTHROPIC_API_KEY` is unset, exits 0 with a one-line hint *before* running the base scan.
- `ai.py`: dropped the `check_ai_dependencies()` call and the import.
- New `requirements.txt`. `setup_env.sh` installs from it.
+## What Actually Happened

-### Docs
+### Scope shift (#64)

- `README.md` rewritten to lead with AI investigation, drops the two-modes framing and the watch feature line.
- `CLAUDE.md` (project): rewrites Key Constraints, updates module map and Running Luminos commands.
- `PLAN.md`: strips zero-dep philosophy from the file map and reframes the watch+incremental note as a future live-mode feature.
- Wiki: `Architecture.md`, `DevelopmentGuide.md`, `Home.md`, `Internals.md` updated.
+The user opened with "I would like to make a couple scope changes" —
+drop the zero-dep constraint, make AI investigation the main show. We
+worked through the framing in conversation: option (a) AI-default
+with `--no-ai` escape hatch, or option (b) AI-only with the base scan
+purely internal. User picked (b). Watch mode was deleted as part of
+the same change because a non-AI churn monitor conflicts with the new
+philosophy.

-### Ship
+Reading the code first turned up two things that made the change much
+smaller than expected. First, `ai.py` and `ast_parser.py` already did
+top-level imports of `anthropic`/`magic`/`tree_sitter` — the "lazy
+deps" pattern lived only in `luminos.py`'s `if args.ai:` gate.
+Removing that gate WAS the entire technical change. Second,
+`capabilities.py` was almost dead weight: only `clear_cache()` was
+load-bearing, and only because it knew about `CACHE_ROOT` (which
+already lived in `cache.py`). One import move and the whole module
+could be deleted.

- PR #65 merged via gitea MCP, branch deleted local + remote.
- Issue #64 closed manually (not relying on `Closes #N`).
- Issue #35 (incremental AI re-investigation in watch mode) closed as obsolete with a comment explaining why the framing no longer fits.
- Tests: 164 pass (down from 168 with the 4 removed capabilities tests).
+PR #65 landed 11 file changes: deleted `watch.py`, deleted
+`capabilities.py`, deleted `tests/test_capabilities.py`, moved
+`clear_cache()` into `cache.py`, rewrote `luminos.py` to make AI
+mandatory, dropped `check_ai_dependencies()` from `ai.py`, added
+`requirements.txt`, updated `setup_env.sh`, and rewrote `README.md` /
+`CLAUDE.md` / `PLAN.md` to match. Wiki updates landed in a separate
+commit on the wiki repo. PR #66 was the matching session-log bump.

-## Discoveries and observations
+The graceful exit case mattered enough to call out: first draft
+checked `ANTHROPIC_API_KEY` *after* the base scan ran. That makes the
+user wait through a multi-second scan only to be told they can't use
+the result. Moved to top of `main()`, after target validation but
+before `scan()`. Verified by running `unset ANTHROPIC_API_KEY &&
+python3 luminos.py /tmp` and observing a clean exit 0 with the hint.

- **The "lazy import" pattern was thinner than expected.** `ai.py` and `ast_parser.py` already did top-level imports of `anthropic`/`magic`/`tree_sitter`. The "lazy" behavior lived only in `luminos.py`'s `if args.ai:` gate. So the actual code change for "drop the zero-dep constraint" was much smaller than the conceptual shift suggested: just remove the gate and accept that `luminos.py`'s AI import always fires.
- **`capabilities.py` was almost dead weight.** Only `clear_cache()` was load-bearing, and even that only because it knew about `CACHE_ROOT` (which already lives in `cache.py`). Moving `clear_cache()` to `cache.py` was a one-import simplification — `capabilities.py` was a tax we'd been paying for the lazy-deps story.
- **The graceful exit needed to fire *before* the base scan, not after.** First draft put the `ANTHROPIC_API_KEY` check after the scan ran. That makes the user wait through a multi-second scan only to get told they can't actually use the result. Moved it to the top of `main()` after target validation but before `scan()`.
+### #57: dir loop refactor

-## Decisions made and why
+After the scope change shipped, the natural next move was the Phase 3
+prerequisites. Picked #57 first because it was the structural one
+that everything else benefits from.

- **Option (b), not (a).** The choice was between "AI-default with `--no-ai` escape hatch" (a) and "AI-only, base scan is internal" (b). Picked (b). Reasoning: keeping `--no-ai` would have meant maintaining two CLI surfaces and two documentation paths for what the philosophy says is one product. The base scan is still useful internally (it produces the `report` dict the agent reads), it just doesn't need to be exposed as a standalone CLI mode. Cleaner story, less drift.
- **Delete watch mode rather than park it.** It was ~110 lines, no tests, no users in our context. Parking it as a "scoped-down churn monitor" would have meant explaining in docs why one feature ignores the AI-first philosophy. Delete + a clear note in PLAN.md that watch comes back as incremental AI re-investigation if it comes back at all.
- **Delete `--ai` cleanly, no deprecation cycle.** Per global CLAUDE.md ("don't use feature flags or backwards-compatibility shims when you can just change the code"). It's a personal project with no external users to deprecate against.
- **Graceful exit on missing API key, exit 0 not exit 1.** "Missing API key" is a user-fixable configuration state, not an error condition. Exit 0 + hint reads as "here's what you need to do," not "something broke."
- **One commit, not three.** The code changes and the doc changes are tightly coupled — splitting them creates a half-broken state in commit 1 where the code says one thing and the docs say another. The whole scope shift is one logical change.
+`_run_dir_loop` was ~160 lines holding four conceptual layers:
+pre-loop setup, budget check + partial-flush (~57 lines, the largest
+single block), API call + response printing, and tool dispatch + done
+detection. Phase 3 dynamic turn allocation will inject more state
+into the same code path, so the refactor had to land first.

-## Raw thinking
+I read the code carefully before designing the helpers. The cleanest
+split turned out to be three: `_build_dir_loop_context()` (pure
+setup, returns a `_DirLoopContext` namedtuple), `_flush_partial_dir_entry()`
+(idempotent partial-cache writer for the budget-exceeded path), and
+`_handle_turn_response()` (per-turn response processing — print,
+append, dispatch). The new `_run_dir_loop` body is ~25 lines.

- The fact that `ai.py` already did top-level `import anthropic` is interesting in hindsight. The lazy-deps story was load-bearing in the docs and the prompts but not really in the code. We were one CLI gate removal away from the dependency story being trivial, and we'd been paying the conceptual cost the whole time. Lesson: when a constraint feels heavier in docs than in code, check if the code is actually enforcing it.
- Watch mode being deleted feels right but is also a small loss of optionality. The next time someone says "I want to monitor this directory for changes," the answer is "luminos doesn't do that anymore, use `inotifywait`." That's fine for now — luminos was never the best churn monitor — but worth remembering if a real use case surfaces.
- The session was unusually fast (~45 min for a coordinated scope change spanning 11 files plus 5 wiki pages) because the user had already done the conceptual work in conversation. By the time I started cutting code, every decision had been pre-confirmed: option (b), delete watch, delete `--ai` cleanly, graceful exit. The TaskCreate breakdown (10 tasks) helped keep it linear, but the real speedup was that nothing was ambiguous when execution started.
- Phase 3 prerequisites (#55, #56, #57) are now genuinely the only thing between us and Phase 3 proper. The scope change didn't touch any of them. Next session can pick one.
+PR #67 shipped clean, 164 tests passed unchanged. Internals.md §4 was
+updated to reflect the new structure and the file:line refs that
+drifted.

-## What's next
+### #56: tool registration consolidation
+
+The user explicitly chose "fix now" over "defer to Phase 3.5 when MCP
+will replace this anyway." Reasoning: easier to migrate one
+well-structured registry to MCP than to migrate two parallel
+structures.
+
+Before designing, I had to recognize a complication: the existing
+tool tables aren't a single registry. There are three scope lists
+(`_DIR_TOOLS`, `_SYNTHESIS_TOOLS`, `_SURVEY_TOOLS`) AND one global
+`_TOOL_DISPATCH` dict. Some tools (like `flag`) appear in multiple
+scopes with the same schema. Some tools (like `submit_report`) appear
+in multiple scopes with *different* schemas. And `submit_report` /
+`submit_survey` have schemas but no dispatch entry because the loop
+body intercepts them.
+
+Final design: a `register_tool(name, description, schema, scopes,
+handler=None)` function. Single source of truth per (tool, scope)
+pair. Tools in multiple scopes get multiple `register_tool()` calls
+to preserve order (otherwise the order in the second scope drifts
+relative to other tools).
+
+PR #68 was 399 insertions / 344 deletions. Runtime introspection
+confirmed identical scope contents and identical 10-entry dispatch
+table. 164 tests still passed unchanged. Internals.md §4.2 and §9.1
+shrunk: §9.1 went from a 5-step "don't forget the second half"
+process to a 4-step process with one obvious place to look.
+
+### #55: pure-helper test coverage, wave 1
+
+The user said "No one likes doing tests but we need them." Picked the
+issue's seven targets and added one bonus from #57
+(`_flush_partial_dir_entry`).
+
+Used the `_make_manager()` pattern from `tests/test_cache.py` to
+construct a `_CacheManager` rooted in a tempdir, sidestepping
+`CACHE_ROOT` entirely. 45 tests across 8 helpers. One test had a
+typo in an asserted substring on the first run — the actual partial
+reason string is "context budget reached before files processed", not
+"before any files" — caught and fixed in 30 seconds. 209 total tests
+after PR #69.
+
+The two notable behaviors pinned: `_filter_dir_tools` threshold gate
+is strict `<` (the boundary case where confidence equals the
+threshold passes the gate), and `_path_is_safe` correctly rejects
+sibling-with-target-prefix (`/tmp/foo` vs `/tmp/foo_sibling` — the
+easy-to-miss path traversal case).
+
+### #70: pure-helper test coverage, wave 2
+
+I noticed the wave 1 picks left out three high-impact helpers:
+`_TokenTracker`, `_synthesize_from_cache`, `_discover_directories`.
+Pitched them as "low effort, high impact." User agreed and asked me
+to file an issue, insert it into the roadmap before Phase 3, and
+ship.
+
+Reading `_TokenTracker` corrected my issue draft: I had written
+`reset_loop()` "preserves last_input" — actually it zeroes
+`last_input` along with the loop counters. The test pins the real
+behavior. I also discovered the `record()` method (not
+`record_usage()` as I'd written in the issue), and that
+`SimpleNamespace` works as a fake usage object because the function
+uses `getattr(usage, "input_tokens", 0)`.
+
+The load-bearing test in this batch is the budget-exceeded check
+under cumulative-input pressure: record 10 calls each with
+`input_tokens = CONTEXT_BUDGET // 5`, so total cumulative is 2x the
+budget but `last_input` stays at 1/5 of budget. Assert that
+`budget_exceeded()` returns False. This is exactly the #44 fix
+condition — if anyone regresses to "exceeded if cumulative > budget,"
+this test screams.
+
+`_synthesize_from_cache` only reads dir entries (not file entries) —
+worth pinning explicitly so a future maintainer doesn't add file
+entries thinking they should appear in the fallback report.
+
+`_discover_directories` tests now pin: leaves-first ordering, skip
+list (`.git`, `__pycache__`, `node_modules`, `*.egg-info`), custom
+exclude, hidden dirs by default, and the subtle `show_hidden=True`
+case where the skip list still applies (`.git` stays out even with
+hidden visible).
+
+PR #71 added 25 tests, 234 total. PLAN.md got restructured: new
+Phase 2.7 (#56 ✅) and Phase 2.8 (#55 ✅, #70) entries, the stale
+Phase 3.4 (#56) and "Background chore" (#55) sections deleted since
+they were displaced by the pre-Phase-3 cleanup pattern.
+
+### Phase 3 prep recommendations and #72
+
+After the four pre-reqs were done, the user asked what else I'd
+recommend before starting Phase 3 ("phase 3 is a biggie and I want it
+to have a solid base"). I came back with three picks: end-to-end
+smoke test, design sketch for the planning pass, and document the
+leaf-first contract.
+
+User responded: smoke test already done externally (looks fine);
+design sketch deferred to Phase 3 task 1 (intent matched, timing
+disagreement); leaf-first contract — make it so.
+
+The leaf-first contract issue (#72) is wiki-only, no code. Added a
+new §4.7 to Internals.md explaining that `_discover_directories()`
+returns leaves-first as a load-bearing invariant, that
+`_get_child_summaries()` silently depends on it, and that the
+`(none — this is a leaf directory)` placeholder LIES if the children
+just haven't been investigated yet — the agent has no way to know.
+Two safe paths if Phase 3 changes the order: preserve leaf-first
+within priority bands, or rewrite the placeholder to be honest. First
+draft accidentally inserted §4.7 before §4.6 in the file; caught on
+re-read, swapped, committed.
+
+## Key Decisions & Reasoning
+
+- **Scope shift went with option (b), not (a).** AI-only with the
+  base scan purely internal. Reasoning: keeping `--no-ai` would have
+  meant maintaining two CLI surfaces and two documentation paths for
+  what the philosophy says is one product. Cleaner story.
+- **Delete watch mode rather than park it.** Parking would have
+  required explaining in docs why one feature ignored the AI-first
+  philosophy. PLAN.md already notes that watch comes back as
+  incremental AI re-investigation if it comes back at all.
+- **Delete `--ai` cleanly, no deprecation.** Per global CLAUDE.md
+  ("no backwards-compat shims when you can just change the code").
+  Personal project, no external users to deprecate against.
+- **Graceful exit on missing API key, exit 0 not exit 1.** Missing
+  key is a user-fixable configuration state, not an error. Exit 0 +
+  hint reads as "here's what you need to do," not "something broke."
+- **Fix #56 now rather than defer to Phase 3.5.** User chose this
+  explicitly. The structure introduced (one registry call per (tool,
+  scope) pair) is naturally MCP-shaped, so the eventual MCP migration
+  collapses to "replace `register_tool()` with a server call."
+- **Test coverage in two waves rather than one batch.** Wave 1 (#55)
+  shipped first with the issue's stated targets. Then I noticed three
+  more high-impact helpers were uncovered, pitched them, and the user
+  greenlit a wave 2 (#70). Splitting kept each PR cohesive and
+  reviewable.
+- **Phase 3 design sketch deferred to Phase 3 task 1.** I recommended
+  it as Phase 3 prep. User overrode: "agree on intent, disagree on
+  timing." Result: the design sketch is now bookkept as the first
+  thing Phase 3 does, not as a separate prep cycle. Cleaner if Phase
+  3 has the design fresh in mind when the rest of the work starts.
+- **One commit per PR for the scope change**, not split into code +
+  docs commits. The two are tightly coupled — splitting would create
+  a half-broken state in commit 1 where code says one thing and docs
+  say another. Same logical change.
+
+## Surprises & Discoveries
+
+- **The lazy-deps story was thinner than expected.** `ai.py` and
+  `ast_parser.py` already did top-level imports of the AI packages.
+  The "lazy" pattern lived only in the CLI gate. Removing the gate
+  WAS the technical change for the entire scope shift. Lesson: when
+  a constraint feels heavier in docs than in code, check whether the
+  code is actually enforcing it.
+- **`capabilities.py` was almost dead weight.** Only `clear_cache()`
+  was load-bearing, and even that only because of the `CACHE_ROOT`
+  reference. We'd been paying a tax for the lazy-deps story that the
+  code wasn't actually charging.
+- **`_TOOL_DISPATCH` and `_DIR_TOOLS` had a name collision case.**
+  `submit_report` appears in both `_DIR_TOOLS` and `_SYNTHESIS_TOOLS`
+  with different schemas. The new registry handles this with two
+  `register_tool()` calls per scope, but the existence of the
+  collision wasn't obvious until I read the code.
+- **`_TokenTracker.reset_loop()` zeroes `last_input`.** My #70 issue
+  draft assumed it preserved `last_input` across resets. The actual
+  code doesn't. Reading the code corrected the test plan before any
+  test was wrong. Always read the code before writing the spec.
+- **`_synthesize_from_cache` reads dir entries only.** I had assumed
+  it would also pull file entries in some "even more degraded" case.
+  It doesn't. The fallback is dir-only or nothing.
+- **The graceful exit had to fire before the base scan, not after.**
+  First draft put it after. Caught it in the writing stage, not in
+  testing — but worth noting because the same pattern can sneak into
+  other early-exit checks.
+
+## Concerns & Open Threads
+
+- **Phase 3 design sketch is bookkept as Phase 3 task 1, not done
+  yet.** This is the highest-priority unresolved thread. The
+  planning pass touches many things (cache schema, dir loop
+  orchestration, max_turns propagation, plan persistence, survey
+  interaction, resume semantics, optional global token budget) and
+  hand-rolling the design while implementing leads to drift. Make
+  sure Phase 3 actually starts with the design sketch.
+- **The leaf-first contract is documented but only loosely
+  enforced.** `TestDiscoverDirectories` pins the *ordering*, but
+  there's no test that asserts "dirs are processed in the order they
+  come out of `_discover_directories`" — the orchestrator could
+  re-sort silently and the test wouldn't catch it. Phase 3 will
+  introduce alternative orderings; this gap matters.
+- **Token budget arithmetic for Phase 3 is still a known unknown.**
+  PLAN.md flags it: "How does the agent 'request more turns'?" The
+  current `_TokenTracker` is per-loop with grand totals for cost.
+  There's no concept of "we've spent X out of Y on this whole
+  investigation." If Phase 3 dynamic turn allocation needs that,
+  it has to grow it explicitly.
+- **No live integration smoke test from this session.** The user ran
+  one externally and confirmed it works, but the assistant didn't
+  observe it. If a regression slipped through, we'd find out at the
+  start of Phase 3 or later. The unit tests are 234 strong but they
+  don't cover the full pipeline end-to-end.
+- **Six PRs in one session is a lot of merge commits on main.** Not
+  a problem per se, but if a regression bisects to "somewhere in
+  Session 9" the bisect surface is wider than usual. Worth noting
+  for the next session retro.
+- **Wiki-only changes (#72) work fine via direct commits to wiki
+  main.** The pattern is established; future doc-only work can
+  follow it without ceremony.
+
+## Raw Thinking
+
+- The pre-Phase-3 cleanup pattern (#54 → #57 → #56 → #55 → #70 →
+  #72) is worth naming as a paradigm: "pay debts in the area before
+  adding new state to that area." Phase 2.6, 2.7, 2.8 in PLAN.md
+  reflect this. Could be applied generally to any large milestone:
+  inventory the helpers it'll touch, refactor + test them first,
+  then add the new work on top of a known-good foundation.
+- The State-of-the-App summary at session start was useful framing.
+  It surfaced which threads were on the table, which were blocked,
+  and which had decision points pending. Worth doing more often,
+  especially at the start of long sessions or sessions that start
+  with "what's left."
+- `_TokenTracker` test count (11) was higher than I initially
+  scoped. Once I started enumerating edge cases (boundary, defaults,
+  multiple loops, reset semantics, the load-bearing #44 case) the
+  count grew naturally. Good unit tests don't shrink. They accrete.
+- The `register_tool()` design is naturally MCP-shaped. A registry
+  of `(name, schema, scopes, handler)` is exactly what an MCP tool
+  list looks like. When Phase 3.5 lands, `register_tool()` can
+  collapse to a one-line forward to the connected MCP server's
+  `tools/list` response, and the migration touches almost nothing
+  else. This was unintentional but lucky.
+- The session was unusually productive — 6 PRs, 5 issues filed,
+  4 issues closed, 70 net new tests, 4 wiki page updates — because
+  each piece of work unblocked the next and the user kept the
+  decisions decisive. The TaskCreate breakdowns helped, but the
+  real speedup was that nothing was ambiguous when execution
+  started. When the user redirects with a single sentence ("fix
+  now," "delete it," "make it so"), the loop doesn't have to stop
+  to re-confirm.
+- "Documentation is work" — #72 was a quick experiment in shipping a
+  doc-only issue with the same workflow as code. Worked fine.
+  Pattern is repeatable for other cross-cutting concerns: contracts,
+  invariants, design decisions that aren't enforced anywhere except
+  in human heads.
+
+## What's Next

 In priority order:

-1. **#57** — refactor `_run_dir_loop` before Phase 3 dynamic turn allocation lands. Prerequisite cleanup, small.
-2. **#56** — dedupe `_TOOL_DISPATCH` / `_DIR_TOOLS` registration. Decision point: fix now or let it die in Phase 3.5 MCP migration.
-3. **#55** — unit test coverage for ai.py pure helpers. Foundation for Phase 3 confidence work.
-4. Phase 3 proper (#19–#29 cluster).
+1. **Phase 3 task 1: write the planning pass design sketch.**
+   Deferred from this session. ~30-45 minutes, no code. Cover the
+   `submit_plan` schema, plan storage in cache, `max_turns`
+   propagation, skip-dir semantics, survey-output integration,
+   resume semantics, and the optional global token budget question.
+   Land in PLAN.md or a new wiki page before any Phase 3 code is
+   cut.
+2. **Phase 3 implementation: #19–#29 cluster.** Planning pass after
+   survey, before dir loops; `submit_plan` tool; dynamic turn
+   allocation based on plan output; dir loop orchestrator updated
+   to follow the plan. Multi-PR, probably multi-session.
+3. **Phase 3.5: MCP backend abstraction (#39).** The pivot point.
+   After Phase 3 is working, before Phase 4. The `register_tool()`
+   refactor from #56 makes this much easier than it would have been.
+4. **Phase 4+: external knowledge tools, scale-tiered synthesis,
+   hypothesis-driven synthesis, refinement, dynamic report
+   structure.** The full backlog from PLAN.md.
+
+When Phase 3 starts: re-read PLAN.md Part 4 (Investigation Planning)
+and Internals.md §4.7 (the leaf-first contract) before designing.
+The contract WILL be tempting to violate; the design sketch has to
+address it explicitly.
--- a/SessionRetrospectives.md
+++ b/SessionRetrospectives.md
@ -10,7 +10,7 @@
 | [Session 6](Session6) | 2026-04-07 | Extracted shared workflow/branching/protocols from project CLAUDE.md to global `~/.claude/CLAUDE.md`; moved externalize.md and wrap-up.md to `~/.claude/protocols/` |
 | [Session 7](Session7) | 2026-04-07 | Phase 1 audit (#1 closed, only #54 remains); gitea MCP credential overhaul — dedicated `claude-code` Forgejo user with admin on luminos, write+delete verified |
 | [Session 8](Session8) | 2026-04-07 | Closed #54 — added confidence/confidence_reason to write_cache tool schema description; Phase 1 milestone now 4/4 complete |
-| [Session 9](Session9) | 2026-04-11 | Scope shift (#64): AI investigation is the product, zero-dep constraint dropped, watch mode + capabilities.py deleted, requirements.txt added, README/CLAUDE/PLAN/wiki rewritten |
+| [Session 9](Session9) | 2026-04-11 | Scope shift (#64) + all Phase 3 prereqs: dir loop refactor (#57), tool registry consolidation (#56), pure-helper test coverage waves 1+2 (#55, #70), leaf-first contract docs (#72). 6 PRs, 70 new tests (164→234), Phase 2.6/2.7/2.8 milestones complete |

 ---