# Marchwarden: Project Context

## What This Is

A network of agentic research specialists (MCP servers) coordinated by a
principal investigator (PI) agent. An educational project for learning about
agents, MCP, and agent composition.

## Current Project State

| | |
|---|---|
| **Phase** | Phases 0–3 substantially complete (M3.3 awaiting human rating); Phase 5 started (M5.1.1 shipped) |
| **Last worked on** | 2026-04-08 |
| **Last commit** | `78f08c9` (Merge PR #59: M3.3 Phase A calibration data collection) |
| **Branch** | `main` (clean) |
| **Tests** | 141 passing |
| **Blocking issues** | #53 (budget cap lag, recommended fix before #39); #46 (M3.3 Phase B awaiting human rating) |

## Key Files

| File | Purpose |
|---|---|
| `researchers/web/models.py` | Research Contract v1 Pydantic models + `DEPTH_PRESETS` |
| `researchers/web/tools.py` | Tavily search + URL fetch with content hashing |
| `researchers/web/trace.py` | JSONL trace logger + step-duration tracking + structlog mirror |
| `researchers/web/agent.py` | `WebResearcher`: inner agentic loop |
| `researchers/web/server.py` | FastMCP server wrapping the researcher |
| `cli/main.py` | CLI: `ask` / `replay` / `costs` |
| `obs/__init__.py` | Structured operational logger (structlog) |
| `obs/costs.py` | Cost ledger + price table |
| `Makefile` | `make install` / `test` / `ask` / `costs` / `clean` |
| `Dockerfile` + `scripts/docker-test.sh` | Reproducible test environment |

## Architecture

- **Researcher** = MCP server exposing `research(question) -> ResearchResult`
- **ResearchResult** = answer + citations (with `raw_excerpt`) + categorized gaps +
  `discovery_events` + `open_questions` + `confidence` + `confidence_factors` + `cost_metadata` + `trace_id`
- **Agent loop** = Claude tool-use loop (plan→search→fetch→iterate) + synthesis step
- **Trace** = JSONL audit log per research call at `~/.marchwarden/traces/`

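As a rough illustration of the shapes listed above, the result object and a JSONL trace append might look like the sketch below. It is not the project's code: the real contract lives in the Pydantic models in `researchers/web/models.py`, while this sketch uses standard-library dataclasses so it runs without dependencies. The field types, the `Citation.source` field, the `append_trace_event` helper, and the one-file-per-`trace_id` layout are all assumptions made for illustration.

```python
import json
import uuid
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Citation:
    source: str       # hypothetical field: URL or identifier of the cited source
    raw_excerpt: str  # named in the list above: verbatim text backing the claim


@dataclass
class ResearchResult:
    # Field names follow the Architecture list; every type here is an assumption.
    answer: str
    citations: list[Citation] = field(default_factory=list)
    gaps: dict[str, list[str]] = field(default_factory=dict)  # category -> gap notes
    discovery_events: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    confidence: float = 0.0
    confidence_factors: dict[str, float] = field(default_factory=dict)
    cost_metadata: dict[str, float] = field(default_factory=dict)
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def append_trace_event(trace_dir: Path, trace_id: str, event: dict) -> Path:
    """Append one event as a single JSON line to <trace_dir>/<trace_id>.jsonl.

    Hypothetical helper: the file naming is an assumption, not the real layout.
    """
    trace_dir.mkdir(parents=True, exist_ok=True)
    path = trace_dir / f"{trace_id}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return path
```

Calling `append_trace_event(Path.home() / ".marchwarden" / "traces", result.trace_id, {"step": "search"})` once per loop step would produce an append-only file with one JSON object per line, which is what makes a JSONL trace easy to tail and replay.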
## Conventions

- API keys live in `~/secrets` (not `.env`)
- Wiki is at `docs/wiki/` (local git clone, not MCP; the wiki MCP is buggy)
- All merges go through the Forgejo API (the claude-code user can't merge via MCP)
- One branch per concern, merge via PR, delete the branch afterward

## Session Log

| Session | Date | Summary |
|---|---|---|
| 1 | 2026-04-08 | Project creation, naming, contract design, Phase 0 + Phase 1 complete (81 tests) |
| 2 | 2026-04-08 | Phase 2 (CLI shim) + Phase 2.5 (logging + cost tracking) shipped; V1 ships; depth presets; docker test env; per-step duration tracking; arxiv-rag scoped as M5.1; Phase 3/4/5/6 milestones populated (123 tests) |
| 3 | 2026-04-08 | Phase 3 stress testing: M3.1+M3.2 closed, M3.3 split into Phases A/B/C with A done. Trace observability fix (#54): full ResearchResult persisted as sibling + per-item events. M5.1.1 arxiv-rag ingest pipeline shipped (`researchers/arxiv/`, `[arxiv]` optional extra, lazy CLI imports). Structured-data tool critiqued and deferred until M6 PI consumer exists. Filed #53 (budget cap lag, recommended next session) (141 tests) |

## What's Next

**Recommended next session: fix #53 (budget cap lag) before continuing Phase 5.** The arxiv researcher's eventual agent loop (#40) will inherit budget semantics from the web researcher, so fix the bug before duplicating it.

Order of next-session candidates:

1. **#53**: budget cap lag bug. Single-file fix in `researchers/web/agent.py` plus a regression test. ~30 min.
2. **Live arxiv smoke**: `marchwarden arxiv add 1706.03762` end-to-end. Validates M5.1.1 against a real PDF. First run downloads ~500MB embedding model.
3. **#39**: M5.1.2 arxiv-rag retrieval primitive. Builds the query API on top of M5.1.1's chromadb collection.
4. **M3.3 Phase C**: once the user brings back `docs/stress-tests/M3.3-rating-worksheet.md` with `actual_rating` columns filled in. Analysis script + rubric + wiki update.
5. **M4.1** (#47): error handling / hardening. Independent of everything above.

**Open issues:** #53 (budget cap lag), #46 (M3.3 awaiting rating).

**Open milestones in Forgejo:** Phase 3 (1 issue: #46), Phase 4 (3 issues), Phase 5 (7 issues remaining), Phase 6 (2 issues).