M5.1.2 arxiv-rag: retrieval primitive #39
Reference: archeious/marchwarden#39
Second sub-milestone of Issue #37. Design: ArxivRagProposal.
Goal
A standalone retrieval function that takes a query string, embeds it with the same model used at ingest time, and returns the top-K matching chunks with paper metadata.
Scope
- `researchers/arxiv/retrieve.py:retrieve(query: str, k: int = 10, model_name: str = ...) -> list[RetrievedChunk]`
- `RetrievedChunk` Pydantic model: `arxiv_id`, `paper_title`, `section`, `chunk_text`, `score`, `chunk_id`
- `MARCHWARDEN_ARXIV_EMBED_MODEL` env var so the embedding model can be swapped without code changes
- `--arxiv-id`, `--year`, `--category` filters (chromadb's `where` clause)

Tests
- A query for a known term returns a chunk whose `chunk_text` contains the term
- Filtering by `arxiv_id` returns only chunks from that paper

Out of scope
Branch
`feat/arxiv-rag-retrieve`

Blocked by: M5.1.1. Blocks: M5.1.3.
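The two configuration items under Scope are the `MARCHWARDEN_ARXIV_EMBED_MODEL` override and the `--arxiv-id` / `--year` / `--category` filters. A minimal sketch of both follows. The helper names `resolve_embed_model` and `build_where` are hypothetical. The sketch also assumes that ingest stored `arxiv_id`, `year` and `category` as chunk metadata under those exact keys. The issue elides the default model, so this version raises an error instead of inventing one.

```python
import os
from typing import Optional

ENV_VAR = "MARCHWARDEN_ARXIV_EMBED_MODEL"


def resolve_embed_model(model_name: Optional[str] = None) -> str:
    """An explicit argument wins, then the env var. The issue elides the
    default model, so this sketch raises rather than guessing one."""
    name = model_name or os.environ.get(ENV_VAR)
    if not name:
        raise ValueError(f"no embedding model: pass model_name or set {ENV_VAR}")
    return name


def build_where(
    arxiv_id: Optional[str] = None,
    year: Optional[int] = None,
    category: Optional[str] = None,
) -> Optional[dict]:
    """Translate the optional filters into a chromadb `where` dict.

    Assumes (hypothetically) that ingest stored these exact metadata keys."""
    clauses = [
        {key: value}  # shorthand equality match on one metadata key
        for key, value in (("arxiv_id", arxiv_id), ("year", year), ("category", category))
        if value is not None
    ]
    if not clauses:
        return None  # no filter: leave `where` unset
    if len(clauses) == 1:
        return clauses[0]
    # chromadb expects exactly one top-level operator, so combine with $and
    return {"$and": clauses}
```

The result can be passed straight to the collection, for example `collection.query(query_embeddings=[vec], n_results=k, where=build_where(year=2023))`. A return value of `None` means the query runs unfiltered.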