M4.3 Documentation polish (15-minute new-developer test)
M4.2 Test suite expansion and contract compliance
M4.1 Error handling and graceful degradation
M3.3 Confidence calibration (V1.1)
M3.2 Multi-axis stress test
M3.1 Single-axis stress tests
A.6 arxiv-rag: cost ledger fields (
embedding_calls)
A.5 arxiv-rag: CLI integration (
--researcher arxiv)
A.4 arxiv-rag: MCP server
A.3 arxiv-rag: ArxivResearcher agent loop
A.2 arxiv-rag: retrieval primitive
A.1 arxiv-rag: ingest pipeline (
marchwarden arxiv add)
Record per-step duration in trace and operational logs
Researcher #2: arxiv-rag — semantic search over a curated arXiv reading list
Record per-step durations in trace and operational logs
Record per-step duration in trace and operational logs
depth flag should drive iteration / budget / source defaults
chore: Makefile with venv-based dev workflow
depth flag now drives constraint defaults
Mirror trace steps to the operational logger