Jeff Smith d279c4c20e chore: update CLAUDE.md for session 2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-08 17:30:59 -06:00

3.1 KiB


Marchwarden — Project Context

What This Is

A network of agentic research specialists (MCP servers) coordinated by a principal investigator (PI) agent. It is an educational project for learning about agents, MCP, and agent composition.

Current Project State


| Field | Value |
|---|---|
| Phase | Phases 0–2.5 complete; V1 shipped. Next: Phase 3 (stress testing) or Phase 5 (arxiv-rag) |
| Last worked on | 2026-04-08 |
| Last commit | `af79358` (Merge PR #36: per-step durations in trace and operational logs) |
| Branch | `main` (clean) |
| Tests | 123 passing |
| Blocking issues | None |

Key Files

| File | Purpose |
|---|---|
| `researchers/web/models.py` | Research Contract v1 Pydantic models + `DEPTH_PRESETS` |
| `researchers/web/tools.py` | Tavily search + URL fetch with content hashing |
| `researchers/web/trace.py` | JSONL trace logger + step-duration tracking + structlog mirror |
| `researchers/web/agent.py` | `WebResearcher`: inner agentic loop |
| `researchers/web/server.py` | FastMCP server wrapping the researcher |
| `cli/main.py` | CLI: `ask` / `replay` / `costs` |
| `obs/__init__.py` | Structured operational logger (structlog) |
| `obs/costs.py` | Cost ledger + price table |
| `Makefile` | `make install` / `test` / `ask` / `costs` / `clean` |
| `Dockerfile` + `scripts/docker-test.sh` | Reproducible test environment |
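
The table describes `obs/costs.py` only as a "cost ledger + price table". A minimal sketch of that idea, assuming prices are quoted in USD per million input/output tokens and the ledger is an append-only JSONL file. The model name, the prices, and the helpers `estimate_cost`, `record_call`, and `total_cost` are hypothetical placeholders, not the project's real values or API.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical price table: USD per 1M tokens as (input, output).
# Placeholder numbers for illustration only, not real pricing.
PRICE_TABLE: dict[str, tuple[float, float]] = {
    "example-model": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one model call from the price table."""
    in_price, out_price = PRICE_TABLE[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def record_call(ledger: Path, trace_id: str, model: str,
                input_tokens: int, output_tokens: int) -> float:
    """Append one JSON line per call to the ledger and return its cost."""
    cost = estimate_cost(model, input_tokens, output_tokens)
    entry = {
        "trace_id": trace_id,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost, 6),
    }
    with ledger.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return cost


def total_cost(ledger: Path) -> float:
    """Sum cost_usd over every ledger entry (a simple aggregate report)."""
    with ledger.open(encoding="utf-8") as f:
        return sum(json.loads(line)["cost_usd"] for line in f if line.strip())


if __name__ == "__main__":
    # Use a temporary ledger so the demo does not touch real data.
    with tempfile.TemporaryDirectory() as d:
        ledger = Path(d) / "costs.jsonl"
        record_call(ledger, "t1", "example-model", 10_000, 2_000)  # 0.06 USD
        record_call(ledger, "t2", "example-model", 5_000, 1_000)   # 0.03 USD
        print(f"{total_cost(ledger):.4f}")  # prints 0.0900
```

Keeping the ledger append-only means each research call costs one cheap write, and a report such as the CLI's `costs` command could be built by re-reading the file rather than maintaining running totals.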

Architecture

- Researcher = MCP server exposing `research(question) -> ResearchResult`
- `ResearchResult` = answer + citations (with `raw_excerpt`) + categorized gaps + `discovery_events` + `open_questions` + `confidence` + `confidence_factors` + `cost_metadata` + `trace_id`
- Agent loop = Claude tool-use loop (plan→search→fetch→iterate) + synthesis step
- Trace = JSONL audit log per research call at `~/.marchwarden/traces/`
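
To make the per-call trace and per-step durations concrete, here is a sketch that writes one JSON object per step to `<trace_dir>/<trace_id>.jsonl` and times each step with `time.perf_counter`. The `TraceLogger` class, its `step` context manager, and the field names (`step`, `ts`, `duration_ms`) are hypothetical; the real `researchers/web/trace.py` also mirrors events to structlog, which this sketch leaves out.

```python
import json
import tempfile
import time
import uuid
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Iterator, Optional


class TraceLogger:
    """Hypothetical per-research-call trace: one JSONL file per trace_id."""

    def __init__(self, trace_dir: Path, trace_id: Optional[str] = None) -> None:
        self.trace_id = trace_id or uuid.uuid4().hex
        trace_dir.mkdir(parents=True, exist_ok=True)
        self.path = trace_dir / f"{self.trace_id}.jsonl"

    def log(self, step: str, **data: Any) -> None:
        """Append a single event to the trace file as one JSON line."""
        event = {"trace_id": self.trace_id, "step": step, "ts": time.time(), **data}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    @contextmanager
    def step(self, name: str, **data: Any) -> Iterator[None]:
        """Time a step and log its duration when it ends, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            duration_ms = (time.perf_counter() - start) * 1000
            self.log(name, duration_ms=round(duration_ms, 3), **data)


if __name__ == "__main__":
    # Real traces would live under Path.home() / ".marchwarden" / "traces";
    # the demo writes to a temporary directory instead.
    with tempfile.TemporaryDirectory() as d:
        trace = TraceLogger(Path(d))
        with trace.step("search", query="example query"):
            time.sleep(0.01)  # stand-in for a real search call
        with trace.step("fetch", url="https://example.com"):
            pass  # stand-in for a real fetch
        for line in trace.path.read_text(encoding="utf-8").splitlines():
            event = json.loads(line)
            print(event["step"], event["duration_ms"])
```

Logging in a `finally` block keeps the audit trail complete when a step fails, which is what makes a replay of a failed research call possible.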

Conventions

- API keys live in `~/secrets` (not `.env`).
- The wiki is at `docs/wiki/`, a local git clone; access it there rather than through MCP, because the wiki MCP is buggy.
- All merges go through the Forgejo API (the claude-code user can't merge via MCP).
- One branch per concern; merge via PR and delete the branch afterwards.
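
Since merges go through the Forgejo REST API rather than MCP, a sketch of what such a call could look like, assuming a standard Forgejo instance: it builds (but does not send) a `POST /api/v1/repos/{owner}/{repo}/pulls/{index}/merge` request with `delete_branch_after_merge` set, which also covers the "delete the branch afterwards" rule. The host, owner, repository name, PR number, and the `build_merge_request` helper are hypothetical placeholders; how the token is read from `~/secrets` is not described here, so the sketch takes it as a plain argument.

```python
import json
import urllib.request


def build_merge_request(base_url: str, owner: str, repo: str,
                        pr_index: int, token: str,
                        style: str = "merge") -> urllib.request.Request:
    """Build (but do not send) a Forgejo 'merge pull request' API call."""
    url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}/pulls/{pr_index}/merge"
    body = {
        "Do": style,  # merge style, e.g. "merge", "squash", or "rebase"
        "delete_branch_after_merge": True,  # remove the head branch on success
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; the token would come from ~/secrets in practice.
    req = build_merge_request("https://forgejo.example.com", "owner",
                              "marchwarden", 36, "<token>")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the helper testable offline; only the final `urlopen` call touches the server.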

Session Log

| Session | Date | Summary |
|---|---|---|
| 1 | 2026-04-08 | Project creation, naming, contract design, Phase 0 + Phase 1 complete (81 tests) |
| 2 | 2026-04-08 | Phase 2 (CLI shim) + Phase 2.5 (logging + cost tracking) done; V1 shipped; depth presets; Docker test env; per-step duration tracking; arxiv-rag scoped as M5.1; Phase 3/4/5/6 milestones populated (123 tests) |

What's Next

Recommended next session: Phase 3 (Stress Testing & Calibration) before Phase 5, since stress tests will likely tighten the contract before a second researcher has to implement it.

- Phase 3: Issue #44 (M3.1 single-axis stress tests) → #45 (M3.2 multi-axis) → #46 (M3.3 confidence calibration)
- Phase 5 alternative: Issue #38 (M5.1.1 arxiv-rag ingest pipeline). New dependencies: `pymupdf`, `chromadb`, `sentence-transformers`, `arxiv`. The design is at wiki/ArxivRagProposal.

Open milestones in Forgejo: Phase 3 (3 issues), Phase 4 (3 issues), Phase 5 (8 issues, including the arxiv-rag tracker), Phase 6 (2 issues).
