Session 3
Date: 2026-04-06
Focus: Phase 1 completion, architectural tangent (MCP backends), documentation
What Was Done
Phase 1: Confidence Tracking — completed
All three Phase 1 issues shipped and closed:
- #1 (confidence fields in cache schemas) — already done at session start
- #2 (update dir loop prompt) — added `confidence` and `confidence_reason` to both cache schemas in `_DIR_SYSTEM_PROMPT`. Added a `## Confidence` section with categorical guidance (high ≥ 0.8, medium 0.5–0.8, low < 0.5) and the rule to include `confidence_reason` when confidence is below 0.7. Commit: `feat(prompts): instruct agent to set confidence on cache writes`
- #3 (`low_confidence_entries()`) — added a method to `_CacheManager` that returns all file and dir cache entries below a confidence threshold (default 0.7). Entries missing a confidence field are included as unrated. Results are sorted ascending by confidence. 7 new tests added; all 136 tests pass. Commit: `feat(cache): add low_confidence_entries() query to CacheManager`
New issues opened
- #38 — Cache invalidation based on file mtime. The cache is keyed purely on path with no staleness detection. Re-runs silently use stale entries. Fix: store mtime at write time, compare on re-run, invalidate if file is newer.
- #39 — Phase 3.5: Migrate to MCP backend architecture. Full design captured.
- #40 — Review and update Phase 4+ issues after the MCP pivot lands (primarily the Phase 4 external knowledge tool issues, which need rewriting as MCP servers).
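The staleness check proposed in #38 could look something like the sketch below. The function name, entry shape, and the choice to treat a missing stored mtime as stale are all assumptions, not the issue's final design:

```python
import os


def is_stale(entry, path):
    """Return True if the cached entry predates the file's current mtime.

    Sketch of the fix proposed in #38: store mtime at write time,
    compare on re-run, invalidate if the file is newer.
    """
    cached_mtime = entry.get("mtime")
    if cached_mtime is None:
        # Entry written before mtime tracking existed: treat as stale
        # (conservative default, mirroring the missing-confidence decision).
        return True
    try:
        return os.path.getmtime(path) > cached_mtime
    except OSError:
        # File was deleted or moved since caching; entry no longer applies.
        return True
```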
PLAN.md updated
Added Part 10: MCP Backend Abstraction — full design for migrating luminos into an MCP client/server model. Added Phase 3.5 to the implementation order between Phase 3 and Phase 4.
Discoveries and Observations
- The cache has no invalidation logic at all. `has_entry()` is purely path-based. `cached_at` is stored but never read for comparison. `--fresh` is the only escape. This is a bigger gap than it initially appeared — documented as #38.
- The existing `read_all_entries()` method made `low_confidence_entries()` trivial to implement — the query layer composed naturally on top of the read layer.
Decisions Made
MCP pivot timing: after Phase 3, before Phase 4.
Discussed at length. Rationale:
- After Phase 3, survey + planning + dir loops + synthesis are all working with filesystem assumptions baked in. Enough surface area to make the migration genuinely instructive.
- Phase 4 external tools (web_search, fetch_url, package_lookup) are naturally MCP servers — implementing them before the pivot would mean doing them twice.
- The project's primary goal is learning agentic AI. Migrating working code into an MCP architecture is a valuable lesson in itself — the migration pain is intentional.
Confidence field: include missing entries in `low_confidence_entries()`.
Entries written before confidence tracking existed have no confidence field. Treating them as unrated (confidence=0.0) means they surface in refinement pass queries rather than being silently trusted. Safer default.
Raw Thinking
The MCP tangent surfaced a real architectural tension: luminos is currently a tightly coupled pipeline where filesystem assumptions are woven throughout ai.py, the prompts, and the tool dispatch. The investigation loop logic (survey → plan → investigate → synthesize) is genuinely generic, but it's not expressed that way in the code. The MCP pivot will force that separation to become explicit — which is the whole point.
The "tree assumption" is the most load-bearing thing to watch during the pivot. Every part of the investigation loop assumes hierarchical containers (leaf-first traversal, child summaries injected upward, synthesis over dir summaries). Non-filesystem backends that aren't clean trees will expose this. The filesystem MCP server can just wrap the existing logic, but the second backend built will immediately hit this constraint.
The distinction between backend tools (read_file, list_dir, parse_structure — move to MCP) and control tools (write_cache, submit_report, submit_plan — stay in luminos) is the key design invariant to preserve during Phase 3.5. If control tools leak into MCP servers, the investigation loop loses its ability to observe and record findings.
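The backend/control split can be made explicit in the dispatcher. The tool names below come from the notes above; the registry structure and routing function are assumptions about how Phase 3.5 might enforce the invariant:

```python
# Tools that move behind MCP servers: anything that touches the backend.
BACKEND_TOOLS = {"read_file", "list_dir", "parse_structure"}

# Tools that stay in luminos: the investigation loop must handle these
# itself to observe and record findings.
CONTROL_TOOLS = {"write_cache", "submit_report", "submit_plan"}


def route_tool_call(name):
    """Decide where a tool call is handled after the Phase 3.5 pivot."""
    if name in CONTROL_TOOLS:
        return "local"  # dispatched inside the investigation loop
    if name in BACKEND_TOOLS:
        return "mcp"    # forwarded to whichever backend server is mounted
    raise ValueError(f"unknown tool: {name}")
```

Keeping the two sets disjoint and routing through one function makes a leak (a control tool drifting into an MCP server) a visible code change rather than a silent one.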
Confidence calibration concern from PLAN.md is real — LLMs tend to be overconfident. The categorical guidance in the prompt (high/medium/low with numeric anchors) is a reasonable first attempt, but it's worth watching whether the agent actually produces a meaningful distribution of confidence scores in practice or whether everything comes back 0.85.
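The "does everything come back 0.85?" question is cheap to answer empirically. A sketch of a calibration spot-check, bucketing cached scores into the prompt's own categorical bands (entry shape is an assumption, as above):

```python
from collections import Counter


def confidence_distribution(entries):
    """Bucket confidence scores into the prompt's bands to check whether
    the agent produces a real distribution or clusters at one value."""
    def band(c):
        if c is None:
            return "unrated"
        if c >= 0.8:
            return "high"
        if c >= 0.5:
            return "medium"
        return "low"

    return Counter(band(e.get("confidence")) for e in entries)
```

If the histogram is almost all "high" after a few runs, the prompt's numeric anchors are not doing their job and the guidance needs revisiting.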
What's Next
Phase 2: Survey Pass — four issues ready to implement in order:
- #4 Add `_SURVEY_SYSTEM_PROMPT` to `prompts.py`
- #5 Implement `_run_survey()` and a `submit_survey` tool in `ai.py`
- #6 Wire survey output into the dir loop system prompt
- #7 Skip survey pass for targets below minimum size threshold
Start with #4 — prompt work only, no AI logic, low risk.