diff --git a/ArxivRagProposal.md b/ArxivRagProposal.md
index 18d9ddb..a335ef6 100644
--- a/ArxivRagProposal.md
+++ b/ArxivRagProposal.md
@@ -1,6 +1,6 @@
 # Implementation Proposal: arxiv-rag Researcher
 
-**Status:** Draft — awaiting review
+**Status:** Approved 2026-04-08
 **Tracking issue:** [#37](https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden/issues/37)
 **Sister to:** Roadmap M5.1 (grep-based file researcher) — different tool, same contract
 
@@ -196,22 +196,17 @@ marchwarden ask "..." --researchers web,arxiv      # fan out, merge in CLI
 
 ---
 
-## Open questions
+## Resolved decisions (was: Open questions)
 
-1. **Embeddings: local vs API.** Start with `nomic-embed-text-v1.5` (free, local). Add `voyage-3` upgrade path via env var. Defer the decision until real queries are flowing — quality is hard to evaluate in the abstract.
+1. **Embeddings: local vs API.** ✅ **Resolved 2026-04-08:** start with `nomic-embed-text-v1.5` (free, local). The `voyage-3` upgrade path, selected via the `MARCHWARDEN_ARXIV_EMBED_MODEL` env var, is deferred until real queries allow a quality review.
 
-2. **BibTeX import.** Many users keep arxiv references in BibTeX (`.bib`) files from Zotero / LaTeX. Should `arxiv add` accept a `.bib` file and ingest every arxiv ID it finds? **Recommendation: no for v1.** Keep `arxiv add <id>` simple. BibTeX import is a one-off helper script that can come later.
+2. **BibTeX import.** ✅ **Resolved 2026-04-08:** skip for v1. `arxiv add <id>` only. BibTeX importer is a future helper.
 
-3. **Paper versions.** arXiv papers have versions (`2403.12345v1`, `v2`, …). Three policies:
-   - **Pin** — index whatever the user supplies, never auto-update
-   - **Always latest** — re-fetch on every `marchwarden arxiv refresh`, replace chunks
-   - **Track both** — index every version separately, distinguish in citations
+3. **Paper versions.** ✅ **Resolved 2026-04-08:** pin to whatever version the user supplies. Never auto-update. `marchwarden arxiv update <id>` will exist as an explicit action later.
 
-   **Recommendation: pin for v1.** Simplest. `arxiv update <id>` as an explicit user action later.
+4. **Chunk-id stability.** ✅ **Resolved 2026-04-08:** make the embedding model part of the chunk ID hash and store it in `papers.json`. Re-ingesting with a different model creates a new collection rather than overwriting the old one, so citations in past traces stay resolvable.
 
-4. **Chunk-id stability.** If we re-ingest with a new embedding model, chunk IDs change. Citations in past traces would become unresolvable. **Recommendation:** make embedding model part of the chunk ID hash, and store it in `papers.json`. A re-ingest creates a new collection rather than overwriting.
-
-5. **Cost ledger fields.** What does "cost" mean for a researcher that uses local embeddings? **Recommendation:** add an `embedding_calls` field to ledger entries (similar to `tavily_searches`); $0 for local, real cost for API embeddings. The synthesis call still bills via the existing model price table.
+5. **Cost ledger fields.** ✅ **Resolved 2026-04-08:** add an `embedding_calls` field to ledger entries (parallel to `tavily_searches`); $0 for local, real cost for API embeddings. The synthesis call still bills via the existing model price table.
 
 ---
 
diff --git a/Roadmap.md b/Roadmap.md
index 288642b..589006c 100644
--- a/Roadmap.md
+++ b/Roadmap.md
@@ -153,19 +153,25 @@ Run each, verify the specific contract feature it targets:
 ## Phase 5: Second Researcher (V2 begins)
 **Goal:** Prove the contract works across researcher types.
 
-### M5.1 — File/Document Researcher
-- `researchers/docs/` — same contract, different tools
-- Searches a local file corpus (glob + grep + read)
-- Returns citations with file paths instead of URLs
-- Same gaps, discovery_events, confidence_factors structure
-- **Deliverable:** Two researchers, same contract, different sources
+### M5.1 — arxiv-rag Researcher
+*Tracking issue: [#37](https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden/issues/37) · Design: [ArxivRagProposal](ArxivRagProposal)*
+
+- `researchers/arxiv/` — RAG-based reader of a user-curated arXiv reading list
+- Same `ResearchResult` contract, different evidence path (chromadb vector store, not Tavily)
+- Citations point to arXiv abs URLs; `raw_excerpt` is the chunk text
+- Sub-milestones (A.1–A.6 in the tracking issue): ingest pipeline, retrieval primitive, agent loop, MCP server, CLI integration, cost-ledger fields
+- **Deliverable:** Two working researchers, same contract, different sources
 
 ### M5.2 — Contract Validation
-- Run the same question through both researchers
+- Run the same question through both researchers (web + arxiv-rag)
 - Compare: do the contracts compose cleanly? Can the PI synthesize across them?
 - Identify any contract changes needed (backward-compatible additions only)
 - **Deliverable:** Validated multi-researcher contract
 
+### Future ideas (post-V2)
+- **File/document researcher** — grep+read over a local file corpus. Was the original M5.1 placeholder; demoted because no concrete user corpus drove its design. Re-prioritize when one shows up.
+- **Live arXiv search + cache (option C in the proposal)** — extend arxiv-rag from a curated reading list to a growing semantic cache
+
 ---
 
 ## Phase 6: PI Orchestrator (V2)