feat(arxiv): ingest pipeline (M5.1.1) #58
Closes #38. First sub-milestone of M5.1 (arxiv-rag researcher).
What ships
- `researchers/arxiv/` with `store.py` (chromadb wrapper + papers.json manifest) and `ingest.py` (download → extract → embed → store).
- `marchwarden arxiv add|list|info|remove`, which lazy-imports heavy deps.
- `[arxiv]` optional extra in `pyproject.toml` (pymupdf, chromadb, sentence-transformers, arxiv), so the base install stays slim.

What's deferred (later sub-milestones)
- `ask --researcher arxiv` flag (#42)
- `embedding_calls` field (#43)

Notes
- `pip install` pulled in the CUDA torch wheel (~2 GB of NVIDIA libs). This is harmless on CPU-only WSL, but worth pinning to the CPU index in a follow-up.
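
For reviewers unfamiliar with the lazy-import approach mentioned under "What ships", here is a minimal sketch of the general pattern, not the code in this PR. The names `require` and `cmd_add` are hypothetical, and the stdlib `json` module stands in for a heavy dependency such as chromadb so the snippet runs anywhere.

```python
import importlib


def require(module_name: str, extra: str = "arxiv"):
    """Import a module on first use and explain what to install if it is missing.

    Hypothetical helper: the name and the error wording are illustrative only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"'{module_name}' is not installed; install the optional "
            f"'{extra}' extra to use this command"
        ) from exc


def cmd_add(paper_id: str) -> str:
    # The dependency is resolved inside the command body, so commands that
    # never reach this line do not pay the import cost.
    # `json` is a stand-in for a heavy dependency such as chromadb.
    backend = require("json")
    return backend.dumps({"added": paper_id})
```

With this shape, a base install without the extra can still load the CLI, and the missing-dependency error only appears when an arxiv subcommand actually runs.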