M5.1.3 arxiv-rag: ArxivResearcher agent loop #40

Open

opened 2026-04-08 17:17:34 -06:00 by claude-code (Collaborator) · 0 comments

Third sub-milestone of Issue #37. Design: ArxivRagProposal (https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden/wiki/ArxivRagProposal).

Goal

Wrap the M5.1.2 retrieval primitive in the same plan → tool-use → iterate → synthesize loop the web researcher uses, operating on arxiv chunks instead of web fetches. The agent returns a ResearchResult matching the v1 contract.
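A minimal sketch of that loop's control flow, to make the intended shape concrete. It is not the WebResearcher implementation: the `plan`, `retrieve_chunks`, and `synthesize` callables, the `LoopState` class, and the `k` default are all hypothetical stand-ins for the LLM calls and the M5.1.2 primitive. The trace-step names are the ones listed under Scope in this issue.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Tuple

# Hypothetical callable shapes; the real agent drives an LLM with tools instead.
Planner = Callable[[str, List[str]], Awaitable[List[str]]]
Retriever = Callable[[str, int], Awaitable[List[str]]]
Synthesizer = Callable[[str, List[str]], Awaitable[str]]


@dataclass
class LoopState:
    """Evidence and trace accumulated across iterations (assumed structure)."""
    chunks: List[str] = field(default_factory=list)
    trace: List[Tuple[str, float]] = field(default_factory=list)
    iterations: int = 0

    def step(self, name: str) -> None:
        # Record the step name with a monotonic timestamp for duration tracking.
        self.trace.append((name, time.monotonic()))


async def run_loop(question: str, plan: Planner, retrieve_chunks: Retriever,
                   synthesize: Synthesizer, max_iterations: int = 3,
                   k: int = 5) -> Tuple[str, LoopState]:
    state = LoopState()
    while state.iterations < max_iterations:
        state.iterations += 1
        state.step("iteration_start")
        # Plan: decide which queries to run next, given the evidence so far.
        queries = await plan(question, state.chunks)
        if not queries:
            break  # The planner considers the evidence sufficient.
        # Tool use: run each query and keep only chunks not seen before.
        for query in queries:
            state.step("retrieve_chunks")
            for chunk in await retrieve_chunks(query, k):
                if chunk not in state.chunks:
                    state.chunks.append(chunk)
            state.step("retrieve_chunks_complete")
    # Synthesize: produce the answer from everything gathered.
    state.step("synthesis_start")
    answer = await synthesize(question, state.chunks)
    state.step("synthesis_complete")
    state.step("complete")
    return answer, state


async def demo() -> Tuple[str, LoopState]:
    async def plan(question: str, chunks: List[str]) -> List[str]:
        return [] if chunks else [question]  # one round, then stop

    async def retrieve_chunks(query: str, k: int) -> List[str]:
        return [f"chunk about {query}"]

    async def synthesize(question: str, chunks: List[str]) -> str:
        return f"{len(chunks)} chunk(s) for: {question}"

    return await run_loop("How is attention computed?", plan,
                          retrieve_chunks, synthesize)


if __name__ == "__main__":
    print(asyncio.run(demo())[0])
```

Passing the three stages in as callables keeps the loop testable without a model or a populated store, which is the same seam the Tests section relies on.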

Scope

researchers/arxiv/agent.py:
- ArxivResearcher class with the same shape as WebResearcher
- async research(question, context, depth, constraints) -> ResearchResult
- Tool surface for the inner LLM loop:
  - retrieve_chunks(query, k) — call into M5.1.2
  - read_full_section(arxiv_id, section) — fetch the entire section by ID for cases where the chunk excerpt isn't enough context
- Same trace-step pattern as web researcher (iteration_start, retrieve_chunks, retrieve_chunks_complete, synthesis_start, synthesis_complete, complete) so duration tracking and operational logs work for free
- Synthesis prompt adapted for academic tone: cite arxiv papers as [Author et al., Year, arXiv:ID]; prefer methods sections for "how" questions and results sections for "what" questions
- Same cost_metadata structure with model_id set to whatever Claude model performed the synthesis
- Citation.locator is the arxiv abs URL (https://arxiv.org/abs/<id>); raw_excerpt is the chunk text verbatim
- Confidence factors adapted: source_authority is always high (scholarly source; arXiv preprints are moderated but not necessarily peer-reviewed), recency derived from the paper year
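The citation and confidence rules in the list above can be sketched as a few pure functions. This is an illustration under stated assumptions, not the v1 contract: the `Chunk` shape, the helper names, and the recency labels and thresholds (`current`, `recent`, `dated`) are hypothetical, and `Citation` here carries only the two fields this issue names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class Chunk:
    """A retrieved chunk (hypothetical shape; the M5.1.2 type may differ)."""
    arxiv_id: str
    first_author: str
    year: int
    text: str


@dataclass(frozen=True)
class Citation:
    """Only the fields named in this issue; the v1 contract defines more."""
    locator: str
    raw_excerpt: str


def to_citation(chunk: Chunk) -> Citation:
    return Citation(
        locator=f"https://arxiv.org/abs/{chunk.arxiv_id}",  # arxiv abs URL
        raw_excerpt=chunk.text,  # chunk text verbatim, no trimming
    )


def inline_label(chunk: Chunk) -> str:
    # Format requested for the synthesis prompt: [Author et al., Year, arXiv:ID]
    return f"[{chunk.first_author} et al., {chunk.year}, arXiv:{chunk.arxiv_id}]"


def recency_factor(paper_year: int, today: Optional[date] = None) -> str:
    # Labels and thresholds are assumptions chosen for illustration.
    age = (today or date.today()).year - paper_year
    if age <= 1:
        return "current"
    if age <= 4:
        return "recent"
    return "dated"


def confidence_factors(chunk: Chunk, today: Optional[date] = None) -> Dict[str, str]:
    return {
        "source_authority": "high",  # fixed for arxiv sources, per this issue
        "recency": recency_factor(chunk.year, today),
    }


if __name__ == "__main__":
    example = Chunk("1234.56789", "Doe", 2024, "placeholder chunk text")
    print(to_citation(example).locator)
    print(inline_label(example))
    print(confidence_factors(example, today=date(2026, 4, 8)))
```

Accepting `today` as a parameter keeps the recency calculation deterministic in tests.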

Tests

- Mock retrieval to return canned chunks; assert agent loop produces a valid ResearchResult with the canned content as citations
- Agent with empty store returns a result with gaps[].category=source_not_found
- Agent honors max_iterations and token_budget the same way WebResearcher does
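One possible shape for these tests, using `unittest.mock.AsyncMock` to stand in for the retrieval primitive. The `research` function below is a deliberately tiny, hypothetical stand-in so the example runs on its own; the real tests would exercise `ArxivResearcher.research` and validate a full `ResearchResult`. The dict layout and the `k=5` argument are assumptions, and `token_budget` is not modeled here.

```python
import asyncio
from unittest.mock import AsyncMock


async def research(question, retrieve_chunks, max_iterations=3):
    """Stand-in agent (hypothetical): retrieves each iteration, never plans."""
    citations, gaps, seen = [], [], set()
    for _ in range(max_iterations):
        chunks = await retrieve_chunks(question, k=5)
        if not chunks:
            # Nothing in the store matched: report the gap and stop.
            gaps.append({"category": "source_not_found"})
            break
        for chunk in chunks:
            key = (chunk["arxiv_id"], chunk["text"])
            if key in seen:
                continue  # do not cite the same chunk twice
            seen.add(key)
            citations.append({
                "locator": f"https://arxiv.org/abs/{chunk['arxiv_id']}",
                "raw_excerpt": chunk["text"],
            })
    return {"citations": citations, "gaps": gaps}


CANNED = [{"arxiv_id": "1234.56789", "text": "placeholder chunk text"}]


def test_canned_chunks_become_citations():
    retrieve = AsyncMock(return_value=CANNED)
    result = asyncio.run(research("q", retrieve))
    assert result["citations"] == [{
        "locator": "https://arxiv.org/abs/1234.56789",
        "raw_excerpt": "placeholder chunk text",
    }]
    assert result["gaps"] == []


def test_empty_store_reports_source_not_found():
    retrieve = AsyncMock(return_value=[])
    result = asyncio.run(research("q", retrieve))
    assert result["citations"] == []
    assert [gap["category"] for gap in result["gaps"]] == ["source_not_found"]


def test_max_iterations_is_honored():
    retrieve = AsyncMock(return_value=CANNED)
    asyncio.run(research("q", retrieve, max_iterations=2))
    assert retrieve.await_count == 2  # one retrieval per allowed iteration
```

The functions follow pytest naming, so they would be collected as-is if dropped into a test module.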

Out of scope

- The MCP server wrapper (M5.1.4)
- The CLI integration (M5.1.5)

Branch

feat/arxiv-rag-agent

Blocked by: M5.1.2. Blocks: M5.1.4.
