Session 3
Date: 2026-04-06
Focus: Phase 1 completion, architectural tangent (MCP backends), documentation
What Was Done
Phase 1: Confidence Tracking — completed
All three Phase 1 issues shipped and closed:
- #1 (confidence fields in cache schemas) — already done at session start
- #2 (update dir loop prompt) — added `confidence` and `confidence_reason` to both cache schemas in `_DIR_SYSTEM_PROMPT`. Added a `## Confidence` section with categorical guidance (high ≥ 0.8, medium 0.5–0.8, low < 0.5) and the rule to include `confidence_reason` when confidence is below 0.7. Commit: `feat(prompts): instruct agent to set confidence on cache writes`
- #3 (`low_confidence_entries()`) — added a method to `_CacheManager` that returns all file and dir cache entries below a confidence threshold (default 0.7). Entries missing a confidence field are included as unrated. Results are sorted ascending by confidence. 7 new tests added; all 136 tests pass. Commit: `feat(cache): add low_confidence_entries() query to CacheManager`
New issues opened
- #38 — Cache invalidation based on file mtime. The cache is keyed purely on path with no staleness detection. Re-runs silently use stale entries. Fix: store mtime at write time, compare on re-run, invalidate if file is newer.
- #39 — Phase 3.5: Migrate to MCP backend architecture. Full design captured.
- #40 — Review and update Phase 4+ issues after the MCP pivot lands (primarily the Phase 4 external knowledge tool issues, which need rewriting as MCP servers).
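The staleness check proposed in #38 could look something like the sketch below. The function name, entry shape, and the choice to treat a missing stored mtime as stale are all assumptions, not the issue's final design:

```python
import os


def is_stale(entry, path):
    """Return True if the cached entry predates the file's current mtime.

    Sketch of the fix proposed in #38: store mtime at write time,
    compare on re-run, invalidate if the file is newer.
    """
    cached_mtime = entry.get("mtime")
    if cached_mtime is None:
        # Entry written before mtime tracking existed: treat as stale
        # (conservative default, mirroring the missing-confidence decision).
        return True
    try:
        return os.path.getmtime(path) > cached_mtime
    except OSError:
        # File was deleted or moved since caching; entry no longer applies.
        return True
```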
PLAN.md updated
Added Part 10: MCP Backend Abstraction — full design for migrating luminos into an MCP client/server model. Added Phase 3.5 to the implementation order between Phase 3 and Phase 4.
Discoveries and Observations
- The cache has no invalidation logic at all. `has_entry()` is purely path-based. `cached_at` is stored but never read for comparison. `--fresh` is the only escape. This is a bigger gap than it initially appeared — documented as #38.
- The existing `read_all_entries()` method made `low_confidence_entries()` trivial to implement — the query layer composed naturally on top of the read layer.
Decisions Made
MCP pivot timing: after Phase 3, before Phase 4.
Discussed at length. Rationale:
- After Phase 3, survey + planning + dir loops + synthesis are all working with filesystem assumptions baked in. Enough surface area to make the migration genuinely instructive.
- Phase 4 external tools (web_search, fetch_url, package_lookup) are naturally MCP servers — implementing them before the pivot would mean doing them twice.
- The project's primary goal is learning agentic AI. Migrating working code into an MCP architecture is a valuable lesson in itself — the migration pain is intentional.
Confidence field: include missing entries in `low_confidence_entries()`.
Entries written before confidence tracking existed have no confidence field. Treating them as unrated (confidence=0.0) means they surface in refinement pass queries rather than being silently trusted. Safer default.
Raw Thinking
The MCP tangent surfaced a real architectural tension: luminos is currently a tightly coupled pipeline where filesystem assumptions are woven throughout ai.py, the prompts, and the tool dispatch. The investigation loop logic (survey → plan → investigate → synthesize) is genuinely generic, but it's not expressed that way in the code. The MCP pivot will force that separation to become explicit — which is the whole point.
The "tree assumption" is the most load-bearing thing to watch during the pivot. Every part of the investigation loop assumes hierarchical containers (leaf-first traversal, child summaries injected upward, synthesis over dir summaries). Non-filesystem backends that aren't clean trees will expose this. The filesystem MCP server can just wrap the existing logic, but the second backend built will immediately hit this constraint.
The distinction between backend tools (read_file, list_dir, parse_structure — move to MCP) and control tools (write_cache, submit_report, submit_plan — stay in luminos) is the key design invariant to preserve during Phase 3.5. If control tools leak into MCP servers, the investigation loop loses its ability to observe and record findings.
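The backend/control split can be made explicit in the dispatcher. The tool names below come from the notes above; the registry structure and routing function are assumptions about how Phase 3.5 might enforce the invariant:

```python
# Tools that move behind MCP servers: anything that touches the backend.
BACKEND_TOOLS = {"read_file", "list_dir", "parse_structure"}

# Tools that stay in luminos: the investigation loop must handle these
# itself to observe and record findings.
CONTROL_TOOLS = {"write_cache", "submit_report", "submit_plan"}


def route_tool_call(name):
    """Decide where a tool call is handled after the Phase 3.5 pivot."""
    if name in CONTROL_TOOLS:
        return "local"  # dispatched inside the investigation loop
    if name in BACKEND_TOOLS:
        return "mcp"    # forwarded to whichever backend server is mounted
    raise ValueError(f"unknown tool: {name}")
```

Keeping the two sets disjoint and routing through one function makes a leak (a control tool drifting into an MCP server) a visible code change rather than a silent one.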
Confidence calibration concern from PLAN.md is real — LLMs tend to be overconfident. The categorical guidance in the prompt (high/medium/low with numeric anchors) is a reasonable first attempt, but it's worth watching whether the agent actually produces a meaningful distribution of confidence scores in practice or whether everything comes back 0.85.
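The "does everything come back 0.85?" question is cheap to answer empirically. A sketch of a calibration spot-check, bucketing cached scores into the prompt's own categorical bands (entry shape is an assumption, as above):

```python
from collections import Counter


def confidence_distribution(entries):
    """Bucket confidence scores into the prompt's bands to check whether
    the agent produces a real distribution or clusters at one value."""
    def band(c):
        if c is None:
            return "unrated"
        if c >= 0.8:
            return "high"
        if c >= 0.5:
            return "medium"
        return "low"

    return Counter(band(e.get("confidence")) for e in entries)
```

If the histogram is almost all "high" after a few runs, the prompt's numeric anchors are not doing their job and the guidance needs revisiting.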
What's Next
Phase 2: Survey Pass — four issues ready to implement in order:
- #4 Add `_SURVEY_SYSTEM_PROMPT` to `prompts.py`
- #5 Implement `_run_survey()` and a `submit_survey` tool in `ai.py`
- #6 Wire survey output into the dir loop system prompt
- #7 Skip survey pass for targets below minimum size threshold
Start with #4 — prompt work only, no AI logic, low risk.