marchwarden/CLAUDE.md
2026-04-08 20:26:09 -06:00


Marchwarden — Project Context

What This Is

A network of agentic research specialists (MCP servers) coordinated by a principal investigator (PI) agent. An educational project for learning about agents, MCP, and agent composition.

Current Project State

Phase: Phases 0-3 substantially complete (M3.3 awaiting human rating); Phase 5 started (M5.1.1 shipped)
Last worked on: 2026-04-08
Last commit: 78f08c9 (Merge PR #59: M3.3 Phase A calibration data collection)
Branch: main (clean)
Tests: 141 passing
Blocking issues: #53 (budget cap lag; fix recommended before #39); #46 (M3.3 Phase B awaiting human rating)

Key Files

| File | Purpose |
|---|---|
| researchers/web/models.py | Research Contract v1 Pydantic models + DEPTH_PRESETS |
| researchers/web/tools.py | Tavily search + URL fetch with content hashing |
| researchers/web/trace.py | JSONL trace logger + step-duration tracking + structlog mirror |
| researchers/web/agent.py | WebResearcher: inner agentic loop |
| researchers/web/server.py | FastMCP server wrapping the researcher |
| cli/main.py | CLI: ask / replay / costs |
| obs/__init__.py | Structured operational logger (structlog) |
| obs/costs.py | Cost ledger + price table |
| Makefile | make install / test / ask / costs / clean |
| Dockerfile + scripts/docker-test.sh | Reproducible test environment |
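
The content hashing noted for researchers/web/tools.py can be pictured roughly as follows. This is a minimal sketch, not the project's code: the helper name `content_hash`, the choice of SHA-256, and the whitespace normalization are all assumptions made for illustration.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a hex digest identifying fetched page content (hypothetical helper)."""
    # Assumption: collapse runs of whitespace so trivial formatting differences
    # do not change the hash. The real tool may hash the raw bytes instead.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

A stable hash of this kind would let a citation be checked later against the exact content that was fetched, which is one plausible reason to record it alongside each URL.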

Architecture

  • Researcher = MCP server exposing research(question) -> ResearchResult
  • ResearchResult = answer + citations (with raw_excerpt) + categorized gaps + discovery_events + open_questions + confidence + confidence_factors + cost_metadata + trace_id
  • Agent loop = Claude tool-use loop (plan→search→fetch→iterate) + synthesis step
  • Trace = JSONL audit log per research call at ~/.marchwarden/traces/
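
The result shape and the per-call trace described above can be sketched as below. The field names come from the list; everything else is an assumption: the field types, the `Citation` class, the `append_trace_event` helper, and the `<trace_id>.jsonl` file naming are hypothetical. The project uses Pydantic models, while this sketch swaps in stdlib dataclasses so it runs without dependencies.

```python
import json
import tempfile
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Citation:
    source: str
    raw_excerpt: str  # verbatim text backing the claim


@dataclass
class ResearchResult:
    # Field names follow the architecture list; the types are guesses.
    answer: str
    citations: list[Citation] = field(default_factory=list)
    gaps: list[dict] = field(default_factory=list)  # categorized gaps
    discovery_events: list[dict] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    confidence: float = 0.0
    confidence_factors: dict = field(default_factory=dict)
    cost_metadata: dict = field(default_factory=dict)
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def append_trace_event(trace_dir: Path, trace_id: str, event: dict) -> Path:
    """Append one JSON object per line to <trace_dir>/<trace_id>.jsonl."""
    trace_dir.mkdir(parents=True, exist_ok=True)
    path = trace_dir / f"{trace_id}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return path


# Usage: a temporary directory stands in for ~/.marchwarden/traces/.
result = ResearchResult(
    answer="example",
    citations=[Citation("https://example.com", "quoted text")],
)
trace_dir = Path(tempfile.mkdtemp())
path = append_trace_event(trace_dir, result.trace_id, {"step": "search", "query": "example"})
append_trace_event(trace_dir, result.trace_id, {"step": "result", "result": asdict(result)})
```

Appending one JSON object per line keeps each step independently parseable, so a partially written trace from an interrupted run can still be read up to the last complete line.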

Conventions

  • API keys live in ~/secrets (not .env)
  • Wiki is at docs/wiki/ (local git clone, not MCP — wiki MCP is buggy)
  • All merges via Forgejo API (claude-code user can't merge via MCP)
  • One branch per concern, merge via PR, delete branch after
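
Merging through the Forgejo API, as the conventions above require, might look like the sketch below. It assumes the Gitea-compatible `POST /api/v1/repos/{owner}/{repo}/pulls/{index}/merge` endpoint with a `Do` field and the `delete_branch_after_merge` option; verify both against your instance's API documentation. The host, owner, and token values are placeholders, and `build_merge_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_merge_request(
    base_url: str, owner: str, repo: str, pr_index: int, token: str
) -> urllib.request.Request:
    """Build (but do not send) a merge request for one pull request."""
    url = f"{base_url}/api/v1/repos/{owner}/{repo}/pulls/{pr_index}/merge"
    # "Do": "merge" asks for a merge commit; deleting the branch afterwards
    # matches the one-branch-per-concern convention.
    body = json.dumps({"Do": "merge", "delete_branch_after_merge": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; nothing is sent over the network here.
req = build_merge_request("https://forgejo.example.com", "owner", "marchwarden", 59, "TOKEN")
# urllib.request.urlopen(req) would perform the merge.
```

Building the request separately from sending it keeps the sketch free of side effects and makes the URL and payload easy to inspect before anything is merged.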

Session Log

| Session | Date | Summary |
|---|---|---|
| 1 | 2026-04-08 | Project creation, naming, contract design, Phase 0 + Phase 1 complete (81 tests) |
| 2 | 2026-04-08 | Phase 2 (CLI shim) + Phase 2.5 (logging + cost tracking) shipped; V1 ships; depth presets; docker test env; per-step duration tracking; arxiv-rag scoped as M5.1; Phase 3/4/5/6 milestones populated (123 tests) |
| 3 | 2026-04-08 | Phase 3 stress testing: M3.1+M3.2 closed, M3.3 split into Phases A/B/C with A done. Trace observability fix (#54): full ResearchResult persisted as sibling + per-item events. M5.1.1 arxiv-rag ingest pipeline shipped (researchers/arxiv/, [arxiv] optional extra, lazy CLI imports). Structured-data tool critiqued and deferred until M6 PI consumer exists. Filed #53 (budget cap lag; recommended for next session). (141 tests) |

What's Next

Recommended next session: fix #53 (budget cap lag) before continuing Phase 5. The arxiv researcher's eventual agent loop (#40) will inherit budget semantics from the web researcher — fix the bug before duplicating it.
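
The details of #53 are not recorded here. As a purely illustrative sketch, assuming "lag" means the cap is evaluated only after a step has already spent its tokens, a loop that checks the budget before each step could look like this. `Budget`, `can_afford`, and `run_loop` are hypothetical names, and the per-step costs are made-up numbers; a real loop would only have an estimate of the next step's cost.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    cap_tokens: int
    spent_tokens: int = 0

    def can_afford(self, estimated_tokens: int) -> bool:
        """True if spending the estimate would stay within the cap."""
        return self.spent_tokens + estimated_tokens <= self.cap_tokens


def run_loop(budget: Budget, step_costs: list[int]) -> int:
    """Run steps until the next one would exceed the cap; return steps executed."""
    executed = 0
    for cost in step_costs:
        # Check BEFORE spending, so the cap cannot be overshot by a full step.
        if not budget.can_afford(cost):
            break
        budget.spent_tokens += cost
        executed += 1
    return executed


budget = Budget(cap_tokens=100)
steps_run = run_loop(budget, [40, 40, 40])
```

A regression test in this style would assert that spending never exceeds the cap, which is the property any second researcher would need to inherit.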

Order of next-session candidates:

  1. #53 — budget cap lag bug. Single-file fix in researchers/web/agent.py plus a regression test. ~30 min.
  2. Live arxiv smoke test: run marchwarden arxiv add 1706.03762 end-to-end. Validates M5.1.1 against a real PDF. The first run downloads a ~500MB embedding model.
  3. #39 — M5.1.2 arxiv-rag retrieval primitive. Builds the query API on top of M5.1.1's chromadb collection.
  4. M3.3 Phase C — once the user brings back docs/stress-tests/M3.3-rating-worksheet.md with actual_rating columns filled in. Analysis script + rubric + wiki update.
  5. M4.1 (#47) — error handling / hardening. Independent of everything above.

Open issues: #53 (budget cap lag), #46 (M3.3 awaiting rating).

Open milestones in Forgejo: Phase 3 (1 issue: #46), Phase 4 (3 issues), Phase 5 (7 issues remaining), Phase 6 (2 issues).