marchwarden/CLAUDE.md
2026-04-08 20:26:09 -06:00


# Marchwarden — Project Context
## What This Is
A network of agentic research specialists (MCP servers) coordinated by a
principal investigator (PI) agent. An educational project for learning about agents,
MCP, and agent composition.
## Current Project State
| | |
|---|---|
| **Phase** | Phases 0-3 substantially complete (M3.3 awaiting human rating); Phase 5 started (M5.1.1 shipped) |
| **Last worked on** | 2026-04-08 |
| **Last commit** | `78f08c9` — Merge PR #59: M3.3 Phase A calibration data collection |
| **Branch** | `main` (clean) |
| **Tests** | 141 passing |
| **Blocking issues** | #53 (budget cap lag — recommended fix before #39); #46 (M3.3 Phase B awaiting human rating) |
## Key Files
| File | Purpose |
|---|---|
| `researchers/web/models.py` | Research Contract v1 Pydantic models + `DEPTH_PRESETS` |
| `researchers/web/tools.py` | Tavily search + URL fetch with content hashing |
| `researchers/web/trace.py` | JSONL trace logger + step-duration tracking + structlog mirror |
| `researchers/web/agent.py` | WebResearcher — inner agentic loop |
| `researchers/web/server.py` | FastMCP server wrapping the researcher |
| `cli/main.py` | CLI: `ask` / `replay` / `costs` |
| `obs/__init__.py` | Structured operational logger (structlog) |
| `obs/costs.py` | Cost ledger + price table |
| `Makefile` | `make install` / `test` / `ask` / `costs` / `clean` |
| `Dockerfile` + `scripts/docker-test.sh` | Reproducible test environment |
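
For orientation, here is a minimal sketch of a JSONL trace logger with step-duration tracking, in the spirit of `researchers/web/trace.py`. It is not that module's code: the `TraceLogger` class, its methods, and the event fields (`step`, `action`, `duration_s`) are hypothetical, and the structlog mirror is omitted. The trace directory is a parameter here; the project's actual location is `~/.marchwarden/traces/`.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Optional


class TraceLogger:
    """Hypothetical sketch: one JSONL file per research call, one event per line."""

    def __init__(self, trace_dir: Path, trace_id: Optional[str] = None) -> None:
        self.trace_id = trace_id or str(uuid.uuid4())
        trace_dir.mkdir(parents=True, exist_ok=True)
        self.path = trace_dir / f"{self.trace_id}.jsonl"
        self._step = 0
        self._last = time.monotonic()

    def log(self, action: str, **fields) -> dict:
        """Append one event; duration_s is the time since the previous event."""
        now = time.monotonic()
        entry = {
            "trace_id": self.trace_id,
            "step": self._step,
            "action": action,
            "duration_s": round(now - self._last, 6),
            **fields,
        }
        self._last = now
        self._step += 1
        # Open in append mode per event so earlier lines are never rewritten.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        return entry

    def read(self) -> list:
        """Load all events back, skipping blank lines."""
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]
```

Because each event is a single appended line, a trace can be tailed during a run or replayed line by line afterwards without rewriting the file.
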
## Architecture
- **Researcher** = MCP server exposing `research(question) -> ResearchResult`
- **ResearchResult** = answer + citations (with raw_excerpt) + categorized gaps +
discovery_events + open_questions + confidence + confidence_factors + cost_metadata + trace_id
- **Agent loop** = Claude tool-use loop (plan→search→fetch→iterate) + synthesis step
- **Trace** = JSONL audit log per research call at `~/.marchwarden/traces/`
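
The `ResearchResult` fields listed above can be sketched as plain dataclasses. This is an illustration only: the real contract lives in `researchers/web/models.py` as Pydantic models, and every type below (the `source` and `category` fields, strings for discovery events, floats for confidence factors and cost metadata) is an assumption, not the v1 schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Citation:
    source: str       # hypothetical: URL or document identifier
    raw_excerpt: str  # verbatim text the cited claim rests on


@dataclass
class Gap:
    category: str     # hypothetical category label
    description: str


@dataclass
class ResearchResult:
    """Hypothetical stand-in for the Research Contract v1 result model."""

    answer: str
    citations: list[Citation] = field(default_factory=list)
    gaps: list[Gap] = field(default_factory=list)
    discovery_events: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    confidence: float = 0.0
    confidence_factors: dict[str, float] = field(default_factory=dict)
    cost_metadata: dict[str, float] = field(default_factory=dict)
    trace_id: str = ""

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses such as Citation and Gap.
        return json.dumps(asdict(self))
```

Keeping `raw_excerpt` on each citation means a consumer such as the PI agent can check a claim against its source text without refetching the page.
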
## Conventions
- API keys live in `~/secrets` (not `.env`)
- Wiki is at `docs/wiki/` (local git clone, not MCP — wiki MCP is buggy)
- All merges via Forgejo API (claude-code user can't merge via MCP)
- One branch per concern, merge via PR, delete branch after
## Session Log
| Session | Date | Summary |
|---|---|---|
| 1 | 2026-04-08 | Project creation, naming, contract design, Phase 0 + Phase 1 complete (81 tests) |
| 2 | 2026-04-08 | Phase 2 (CLI shim) + Phase 2.5 (logging + cost tracking) shipped; V1 shipped; depth presets; Docker test env; per-step duration tracking; arxiv-rag scoped as M5.1; Phase 3/4/5/6 milestones populated (123 tests) |
| 3 | 2026-04-08 | Phase 3 stress testing: M3.1+M3.2 closed, M3.3 split into Phases A/B/C with A done. Trace observability fix (#54): full ResearchResult persisted as sibling + per-item events. M5.1.1 arxiv-rag ingest pipeline shipped (`researchers/arxiv/`, `[arxiv]` optional extra, lazy CLI imports). Structured-data tool critiqued and deferred until an M6 PI consumer exists. Filed #53 (budget cap lag; recommended for next session). 141 tests passing. |
## What's Next
**Recommended next session: fix #53 (budget cap lag) before continuing Phase 5.** The arxiv researcher's eventual agent loop (#40) will inherit budget semantics from the web researcher, so fix the bug before duplicating it.

Order of next-session candidates:
1. **#53** — budget cap lag bug. Single-file fix in `researchers/web/agent.py` plus a regression test. ~30 min.
2. **Live arxiv smoke test**: run `marchwarden arxiv add 1706.03762` end-to-end. Validates M5.1.1 against a real PDF. The first run downloads a ~500 MB embedding model.
3. **#39** — M5.1.2 arxiv-rag retrieval primitive. Builds the query API on top of M5.1.1's chromadb collection.
4. **M3.3 Phase C** — once the user brings back `docs/stress-tests/M3.3-rating-worksheet.md` with `actual_rating` columns filled in. Analysis script + rubric + wiki update.
5. **M4.1** (#47) — error handling / hardening. Independent of everything above.
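
These notes do not say what the #53 lag actually is. Assuming it is the common pattern where the cap is checked only after a step has already spent its tokens (so the final step can overshoot), the sketch below contrasts that post-check with a pre-check on projected cost. `Budget`, `run_steps`, and `run_steps_post_check` are hypothetical names, and token counts stand in for whatever unit the real budget uses.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical token budget for an agent loop."""

    cap_tokens: int
    spent_tokens: int = 0

    def can_afford(self, projected_tokens: int) -> bool:
        # Compare current spend *plus* the next step's estimate against the cap.
        return self.spent_tokens + projected_tokens <= self.cap_tokens

    def charge(self, actual_tokens: int) -> None:
        self.spent_tokens += actual_tokens


def run_steps_post_check(budget: Budget, step_costs: list[int]) -> int:
    """Lagging pattern: spend first, check afterwards. Returns steps run."""
    steps_run = 0
    for cost in step_costs:
        budget.charge(cost)
        steps_run += 1
        if budget.spent_tokens >= budget.cap_tokens:
            break
    return steps_run


def run_steps(budget: Budget, step_costs: list[int], estimate: int) -> int:
    """Pre-check pattern: refuse a step whose estimated cost would exceed the cap."""
    steps_run = 0
    for cost in step_costs:
        if not budget.can_afford(estimate):
            break
        budget.charge(cost)
        steps_run += 1
    return steps_run
```

With a 1000-token cap and four 300-token steps, the post-check version runs all four steps and ends at 1200 tokens, while the pre-check version stops after three at 900. A pre-check is only as good as its estimate: if a step costs more than `estimate`, the loop can still finish slightly over the cap.
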

**Open issues:** #53 (budget cap lag), #46 (M3.3 awaiting rating).

**Open milestones in Forgejo:** Phase 3 (1 issue: #46), Phase 4 (3 issues), Phase 5 (7 issues remaining), Phase 6 (2 issues).