Marchwarden — Project Context
What This Is
A network of agentic research specialists (MCP servers) coordinated by a principal investigator (PI) agent. This is an educational project for learning about agents, MCP, and agent composition.
Current Project State
| Item | Status |
|---|---|
| Phase | Phases 0–2.5 complete; V1 shipped. Next: Phase 3 (stress testing) or Phase 5 (arxiv-rag) |
| Last worked on | 2026-04-08 |
| Last commit | af79358 — Merge PR #36: per-step durations in trace and operational logs |
| Branch | main (clean) |
| Tests | 123 passing |
| Blocking issues | None |
Key Files
| File | Purpose |
|---|---|
| `researchers/web/models.py` | Research Contract v1 Pydantic models + `DEPTH_PRESETS` |
| `researchers/web/tools.py` | Tavily search + URL fetch with content hashing |
| `researchers/web/trace.py` | JSONL trace logger + step-duration tracking + structlog mirror |
| `researchers/web/agent.py` | `WebResearcher`: inner agentic loop |
| `researchers/web/server.py` | FastMCP server wrapping the researcher |
| `cli/main.py` | CLI: ask / replay / costs |
| `obs/__init__.py` | Structured operational logger (structlog) |
| `obs/costs.py` | Cost ledger + price table |
| `Makefile` | `make install` / `test` / `ask` / `costs` / `clean` |
| `Dockerfile` + `scripts/docker-test.sh` | Reproducible test environment |
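As a rough illustration of the "URL fetch with content hashing" entry above, the sketch below hashes a fetched page body so that identical content can be recognized across calls. The function names `content_hash` and `fetch_record`, the whitespace normalization, and the choice of SHA-256 are all assumptions made for illustration; the actual code in `researchers/web/tools.py` may differ.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the text after light normalization.

    Normalizing line endings and outer whitespace is an assumption; it keeps
    trivially different copies of the same page from hashing differently.
    """
    normalized = text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def fetch_record(url: str, body: str) -> dict:
    """Bundle an already-fetched body with its hash (hypothetical record shape)."""
    return {"url": url, "content_hash": content_hash(body), "length": len(body)}
```

A caller could compare `content_hash` values between runs to detect that a cited page changed after it was fetched.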
Architecture
- Researcher = MCP server exposing `research(question) -> ResearchResult`
- ResearchResult = answer + citations (with `raw_excerpt`) + categorized gaps + `discovery_events` + `open_questions` + confidence + `confidence_factors` + `cost_metadata` + `trace_id`
- Agent loop = Claude tool-use loop (plan→search→fetch→iterate) + synthesis step
- Trace = JSONL audit log per research call at `~/.marchwarden/traces/`
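The result shape and the per-call trace described above might look roughly like the following. This is a minimal sketch using stdlib dataclasses rather than the project's Pydantic models. The field names come from the list above, but the field types, the `Citation` class, the `append_trace_event` helper, and the `<trace_id>.jsonl` file naming are assumptions for illustration only.

```python
import json
import time
import uuid
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Citation:
    url: str
    raw_excerpt: str  # verbatim text backing the claim


@dataclass
class ResearchResult:
    # Field names follow the architecture notes; the types are guesses.
    answer: str
    citations: list[Citation] = field(default_factory=list)
    gaps: list[dict] = field(default_factory=list)
    discovery_events: list[dict] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    confidence: float = 0.0
    confidence_factors: dict = field(default_factory=dict)
    cost_metadata: dict = field(default_factory=dict)
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def append_trace_event(trace_dir: Path, trace_id: str, step: str, **payload) -> Path:
    """Append one JSON object per line to <trace_dir>/<trace_id>.jsonl.

    The file naming scheme is hypothetical; only the JSONL format and the
    one-trace-per-research-call idea come from the notes above.
    """
    trace_dir.mkdir(parents=True, exist_ok=True)
    path = trace_dir / f"{trace_id}.jsonl"
    event = {"ts": time.time(), "step": step, **payload}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return path
```

Appending one line per step keeps the log replayable: each line parses independently, so a partially written trace from an interrupted run is still readable up to the last complete event.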
Conventions
- API keys live in `~/secrets` (not `.env`)
- Wiki is at `docs/wiki/` (local git clone, not MCP; the wiki MCP is buggy)
- All merges via Forgejo API (claude-code user can't merge via MCP)
- One branch per concern, merge via PR, delete branch after
Session Log
| Session | Date | Summary |
|---|---|---|
| 1 | 2026-04-08 | Project creation, naming, contract design, Phase 0 + Phase 1 complete (81 tests) |
| 2 | 2026-04-08 | Phase 2 (CLI shim) + Phase 2.5 (logging + cost tracking) shipped; V1 ships; depth presets; docker test env; per-step duration tracking; arxiv-rag scoped as M5.1; Phase 3/4/5/6 milestones populated (123 tests) |
What's Next
Recommended next session: Phase 3 (Stress Testing & Calibration) before Phase 5, since stress tests will likely tighten the contract before a second researcher has to implement it.
- Phase 3: Issue #44 (M3.1 single-axis stress tests) → #45 (M3.2 multi-axis) → #46 (M3.3 confidence calibration)
- Phase 5 alternative: Issue #38 (M5.1.1 arxiv-rag ingest pipeline). New deps: pymupdf, chromadb, sentence-transformers, arxiv. Design lives at wiki/ArxivRagProposal.
Open milestones in Forgejo: Phase 3 (3 issues), Phase 4 (3 issues), Phase 5 (8 issues including arxiv-rag tracker), Phase 6 (2 issues).