Researcher #2: arxiv-rag — semantic search over a curated arXiv reading list #37
Reference: archeious/marchwarden#37
Goal
Second researcher implementing the v1 contract: a RAG-based reader of arXiv papers. Sister to the planned grep-based file researcher (M5.1). Returns the same ResearchResult shape so the (future) PI orchestrator can blend its findings with the web researcher.
Detailed design lives at wiki/ArxivRagProposal; this issue is the implementation tracker.
Locked-in design defaults
marchwarden arxiv add <id> and marchwarden arxiv list
~/.marchwarden/arxiv-rag/
pymupdf (swap to marker later if math fidelity suffers)
chromadb (embedded, file-backed)
nomic-embed-text-v1.5 local; switch to voyage-3 if quality is poor
researchers/arxiv/server.py exposing research()
Same ResearchResult as the web researcher; Citation.locator is the arXiv abs URL, raw_excerpt is the chunk text
Implementation milestones
To be filed as separate sub-issues once this proposal is signed off:
marchwarden arxiv add <id>: download PDF, extract via pymupdf, section-chunk, embed, store in chromadb. Records to a sidecar manifest at ~/.marchwarden/arxiv-rag/papers.json.
WebResearcher-style synthesis prompt adapted for academic tone.
researchers/arxiv/server.py, mirrors researchers/web/server.py. Same tool name research(), same contract.
marchwarden ask "..." --researcher arxiv (default still web). Stretch: --researchers web,arxiv to fan out and merge.
Out of scope for this milestone
Open questions to resolve before A.1
See the proposal page for full design rationale, alternatives considered, and architecture sketch.
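As an illustration of the citation mapping listed under the design defaults (Citation.locator is the arXiv abs URL, raw_excerpt is the chunk text), here is a minimal sketch. The Citation class below is a hypothetical stand-in carrying only the two fields named in this issue; the real v1 contract presumably has more. The helper names abs_url and chunk_to_citation are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    # Hypothetical stand-in: only these two fields are named in this issue.
    locator: str
    raw_excerpt: str


def abs_url(arxiv_id: str) -> str:
    """Return the arXiv abstract-page URL for an identifier such as '1706.03762'."""
    return f"https://arxiv.org/abs/{arxiv_id.strip()}"


def chunk_to_citation(arxiv_id: str, chunk_text: str) -> Citation:
    """Map one retrieved chunk to a citation (assumed helper, not the project's API)."""
    return Citation(locator=abs_url(arxiv_id), raw_excerpt=chunk_text)
```

Keeping the locator at the paper level (the abs URL) rather than at the chunk level means several chunks from one paper share a locator, so a consumer may want to group citations by locator before display.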
marchwarden arxiv add) #38
--researcher arxiv) #42
embedding_calls) #43
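The sidecar manifest that marchwarden arxiv add records to (papers.json) could be maintained along these lines. This is only a sketch: the manifest schema is not specified in this issue, so the id-to-record mapping, the chunk_count field, and the function names load_manifest, record_paper, and list_papers are all assumptions.

```python
import json
from pathlib import Path

# Location taken from this issue; everything about the file's schema is assumed.
DEFAULT_MANIFEST = Path.home() / ".marchwarden" / "arxiv-rag" / "papers.json"


def load_manifest(path: Path = DEFAULT_MANIFEST) -> dict:
    """Read the manifest, treating a missing file as an empty reading list."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def record_paper(arxiv_id: str, chunk_count: int, path: Path = DEFAULT_MANIFEST) -> dict:
    """Add or update one paper entry (hypothetical schema: id -> {chunk_count})."""
    manifest = load_manifest(path)
    manifest[arxiv_id] = {"chunk_count": chunk_count}
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename over the manifest,
    # so an interrupted run does not leave a half-written papers.json.
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)
    return manifest


def list_papers(path: Path = DEFAULT_MANIFEST) -> list[str]:
    """Return the recorded arXiv ids in sorted order, as an arxiv list command might."""
    return sorted(load_manifest(path))
```

Re-adding an id overwrites its entry instead of duplicating it, which keeps the add step idempotent under this assumed schema.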