From 147dc5bb884a80fd7f83df0b3fb94937592486dc Mon Sep 17 00:00:00 2001
From: Jeff Smith
Date: Wed, 8 Apr 2026 14:02:43 -0600
Subject: [PATCH] =?UTF-8?q?retro:=20Session=201=20=E2=80=94=20project=20cr?=
 =?UTF-8?q?eation,=20contract=20design,=20Phase=200=20complete?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Haiku 4.5
---
 Session1.md              | 95 ++++++++++++++++++++++++++++++++++++++++
 SessionRetrospectives.md |  7 +++
 2 files changed, 102 insertions(+)
 create mode 100644 Session1.md
 create mode 100644 SessionRetrospectives.md

diff --git a/Session1.md b/Session1.md
new file mode 100644
index 0000000..28f15d4
--- /dev/null
+++ b/Session1.md
@@ -0,0 +1,95 @@
+# Session 1 Notes — 2026-04-08
+
+## What We Set Out to Do
+Create the Marchwarden project from scratch: name it, set up the repo, document the architecture and research contract, plan the development roadmap, and start Phase 0 implementation.
+
+## What Actually Happened
+
+### Naming (longer than expected, worth it)
+Spent significant time finding the right name. Explored Latin (vestigia, sodalis, auspex), Greek (heuresis, gnomon, scholia, epoche), Arabic (isnad, rihla, ijtihad, tahqiq), and English compounds (marchwarden, lanternwake). Deep-dived gnomon vs rihla before landing on **marchwarden** — a guardian at the frontier of knowledge.
+
+The naming process revealed the user's style: evocative single-word names (harbormind, luminos), latinate/compound, not literal-descriptive.
+
+### Repo and scaffolding
+- Created repo at `archeious/marchwarden` (initially created under `claude-code` by mistake, migrated)
+- Set up directory structure: `researchers/web/`, `orchestrator/`, `cli/`, `docs/wiki/`
+- Wiki pages: Architecture, ResearchContract, DevelopmentGuide, Roadmap
+- Issue #1 tracks V1 scope
+
+### Contract evolution (the real work of this session)
+The contract went through three revisions driven by architectural critique:
+
+1. **Initial:** Simple `answer + citations + gaps + confidence + cost_metadata + trace_id`
+2. **Post-critique:** Added `raw_excerpt` (synthesis paradox fix), `discovery_events` (lateral metadata), categorized `gaps` (GapCategory enum), `confidence_factors` (auditable scoring)
+3. **Final:** Added `content_hash` in traces (pseudo-CAS), `model_id` in CostMetadata (fidelity-to-cost analysis)
+
+The user brought in external critique (Gemini analysis) which pushed the contract to higher fidelity. Good pattern — external review caught real architectural weaknesses.
+
+### Phase 0 implementation
+- M0.1: Tavily key verified (free tier, 1000 searches/month)
+- M0.2: All dependencies install clean
+- M0.3: 8 Pydantic models implementing the full contract, 32 tests passing
+
+### Commits
+- `f1e27e3` — Initial commit (auto-init)
+- `deb124e` — Project scaffolding (dirs, README, pyproject.toml, CONTRIBUTING)
+- `79becb2` — Fix README links (clone URL, issue URL)
+- `6a8445e` — Fix wiki links to absolute URLs
+- `1b0f863` — Contract models + 32 tests (feat/contract-models branch)
+
+## Key Decisions & Reasoning
+
+1. **Name: marchwarden** — Names the role (watcher at the frontier) not the tech. Immediately intuitive. Tolkien association exists but the word predates him. User preferred it over gnomon (vocabulary doesn't compound), rihla (Arabic baggage concern), and heuresis/scholia (less visceral).
+
+2. **Tavily over SearXNG** — Initially planned SearXNG (self-hosted, fits homelab), but switched back to Tavily to reduce Phase 0 friction. SearXNG requires deploying a container; Tavily is `pip install + API key`. Can swap later since search is behind an internal abstraction.
+
+3. **raw_excerpt on citations** — Prevents "Synthesis Paradox" where the PI synthesizes already-synthesized data, losing nuance. Every citation carries verbatim source text so the PI can verify claims against raw evidence.
+
+4. **Categorized gaps (GapCategory enum)** — Five categories (SOURCE_NOT_FOUND, ACCESS_DENIED, BUDGET_EXHAUSTED, CONTRADICTORY_SOURCES, SCOPE_EXCEEDED) drive different PI responses. Without categories, the PI can't distinguish "info doesn't exist" from "researcher ran out of budget."
+
+5. **discovery_events as lateral metadata** — MCP is request-response (no mid-flight notifications), so lateral findings are logged in the response for V2 orchestrator to process. Builds the nervous system for V2 without the complexity of streaming.
+
+6. **Confidence: deferred calibration** — confidence_factors expose scoring inputs now; formal rubric after 20-30 real queries. Premature formalization would be false precision.
+
+7. **model_id in CostMetadata** — Enables comparing research quality across model tiers (Haiku vs Sonnet vs Opus). One string field, high value for calibration.
+
+8. **Secrets in ~/secrets, not .env** — User's established pattern across all projects. Noted in memory for future sessions.
+
+## Surprises & Discoveries
+
+- **Repo created under wrong owner.** `mcp__gitea__create_repo` creates under the authenticated user (claude-code), not archeious. Had to create separately via REST API with admin token, then delete the claude-code copy.
+
+- **Wiki links in README.** Relative `/wiki/Architecture` resolves against the Forgejo root, not the repo. Needed full absolute URLs. Wiki-to-wiki cross-links have the same issue.
+
+- **The MCP request-response constraint is load-bearing.** It shapes the entire V2 architecture — no streaming progress, no mid-flight dispatch, no cancellation. Discovery events are the workaround. This will be the first thing to revisit if MCP adds streaming support.
+
+- **External critique (Gemini) was genuinely useful.** Caught the synthesis paradox, the honesty assumption weakness, and the replay-vs-audit distinction. Cross-model review is a good pattern for architectural work.
+
+## Concerns & Open Threads
+
+1. **Branch not merged yet.** `feat/contract-models` is sitting on Forgejo. Need to create PR and merge before starting Phase 1.
+
+2. **pyproject.toml still references `tavily-python`** but we haven't tested all deps together in a real agent loop yet. May discover version conflicts when we add the `anthropic` SDK agent loop.
+
+3. **The agent loop design (M1.3) is the hard part.** Models and tools are mechanical; the inner loop that decides when to search again, when to stop, how to populate confidence_factors — that's where the LLM prompt engineering lives. No amount of architecture prevents a bad prompt from producing bad research.
+
+4. **Token counting.** `cost_metadata.tokens_used` needs actual token tracking across the Claude API and Tavily calls. The anthropic SDK provides usage info; Tavily may not. Might need to estimate.
+
+5. **Trace directory permissions.** `~/.marchwarden/traces/` needs to be created on first run. Should handle gracefully.
+
+## Raw Thinking
+
+- The two-agent nesting (researcher inside MCP, called by PI/CLI) is where the real learning happens. The contract and models are scaffolding; the agent loop is the education.
+
+- The user brings in external AI critique naturally (Gemini analysis). This is a productive pattern — use it. Different models catch different things.
+
+- The "build real things for education" philosophy means V1 needs to actually work well enough to be useful, not just pass tests. The smoke test (Utah crops) will be the moment of truth.
+
+- Consider: should the trace logger be shared infrastructure (not just in `researchers/web/`) since every future researcher will need it? Might refactor to a top-level `marchwarden/trace.py` module. Not now — wait until the second researcher exists.
+
+## What's Next
+
+1. Create PR for `feat/contract-models`, merge to main
+2. Start Phase 1: M1.1 (Tavily search + URL fetch tools)
+3. Then M1.2 (trace logger), M1.3 (agent loop), M1.4 (MCP server)
+4. The agent loop (M1.3) is the critical path — everything else is plumbing
diff --git a/SessionRetrospectives.md b/SessionRetrospectives.md
new file mode 100644
index 0000000..a7160cb
--- /dev/null
+++ b/SessionRetrospectives.md
@@ -0,0 +1,7 @@
+# Session Retrospectives
+
+Index of all session notes for Marchwarden development.
+
+| Session | Date | Summary | Key Decisions |
+|:---|:---|:---|:---|
+| [Session 1](Session1) | 2026-04-08 | Project creation, naming, contract design, Phase 0 complete | Name: marchwarden; Tavily over SearXNG; raw_excerpt + categorized gaps + discovery_events in contract; confidence calibration deferred |
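
Reviewer note: the contract shape described in the Session 1 notes (citations carrying `raw_excerpt`, gaps tagged with a `GapCategory`, `model_id` in `CostMetadata`) might look roughly like the sketch below. This is not the code on `feat/contract-models`. It uses stdlib dataclasses instead of Pydantic to stay dependency-free, and the field names `source`, `topic`, and the helper `suggest_pi_action` with its return strings are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class GapCategory(str, Enum):
    """The five gap categories listed in the Session 1 notes."""
    SOURCE_NOT_FOUND = "source_not_found"
    ACCESS_DENIED = "access_denied"
    BUDGET_EXHAUSTED = "budget_exhausted"
    CONTRADICTORY_SOURCES = "contradictory_sources"
    SCOPE_EXCEEDED = "scope_exceeded"


@dataclass
class Citation:
    source: str       # hypothetical field name (e.g. a URL)
    raw_excerpt: str  # verbatim source text, so claims can be checked


@dataclass
class Gap:
    topic: str        # hypothetical field name
    category: GapCategory


@dataclass
class CostMetadata:
    tokens_used: int
    model_id: str     # lets results be compared across model tiers


@dataclass
class ResearchResult:
    answer: str
    citations: List[Citation]
    gaps: List[Gap]
    confidence: float
    cost_metadata: CostMetadata
    trace_id: str
    # Lateral findings logged in the response; shape is an assumption.
    discovery_events: List[Dict[str, str]] = field(default_factory=list)
    # Inputs to the confidence score; keys are an assumption.
    confidence_factors: Dict[str, float] = field(default_factory=dict)


def suggest_pi_action(gap: Gap) -> str:
    """Hypothetical mapping from gap category to a PI response.

    Illustrates why categories matter: a budget gap is worth retrying,
    while a missing source is not.
    """
    actions = {
        GapCategory.SOURCE_NOT_FOUND: "accept_gap",
        GapCategory.ACCESS_DENIED: "try_alternate_source",
        GapCategory.BUDGET_EXHAUSTED: "retry_with_larger_budget",
        GapCategory.CONTRADICTORY_SOURCES: "escalate_for_review",
        GapCategory.SCOPE_EXCEEDED: "split_question",
    }
    return actions[gap.category]
```

With this shape, a PI receiving `Gap("crop yields", GapCategory.BUDGET_EXHAUSTED)` can tell it apart from a `SOURCE_NOT_FOUND` gap and choose to re-dispatch rather than give up, which is the distinction decision 4 in the notes is after.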