2025-04-10T13:35:07Z - 2026-04-10T13:35:07Z
Overview
26 pull requests merged by 1 user
Merged
#59 docs(stress-tests): M3.3 Phase A — calibration data collection
Merged
#58 feat(arxiv): ingest pipeline (M5.1.1)
Merged
#57 docs(stress-tests): archive M3.2 multi-axis results
Merged
#56 fix(observability): persist full ResearchResult and per-item trace events
Merged
#55 docs(stress-tests): archive M3.1 results
Merged
#36 Record per-step durations in trace and operational logs
Merged
#33 depth flag now drives constraint defaults
Merged
#34 chore: Makefile with venv-based dev workflow
Merged
#32 Mirror trace steps to the operational logger
Merged
#31 Display budget as spend status, not exhaustion alarm
Merged
#29 M2.5.3: marchwarden costs CLI command
Merged
#28 M2.5.2: Cost ledger with price table
Merged
#27 M2.5.1: Structured application logger
Merged
#23 Propagate parent env to MCP server subprocess
Merged
#22 Enforce token_budget before each iteration
Merged
#21 Fix invalid default model id
Merged
#20 Fix synthesis truncation and trace masking
Merged
#14 chore: docker-based test environment
Merged
#12 M2.2: marchwarden replay CLI command
Merged
#11 M2.1: marchwarden ask CLI command
Merged
#7 M1.4: MCP server
Merged
#6 Add OpenQuestion to research contract
Merged
#5 M1.3: Inner agent loop
Merged
#4 M1.2: Trace logger
Merged
#3 M1.1: Search and fetch tools
Merged
#2 M0.3: Contract v1 Pydantic models
19 issues closed by 1 user
Closed
#38 M5.1.1 arxiv-rag: ingest pipeline (marchwarden arxiv add)
Closed
#45 M3.2 Multi-axis stress test
Closed
#54 Persist full ResearchResult alongside trace (observability gap)
Closed
#44 M3.1 Single-axis stress tests
Closed
#35 Record per-step duration in trace and operational logs
Closed
#30 depth flag should drive iteration / budget / source defaults
Closed
#26 M2.5.3: marchwarden costs CLI command
Closed
#25 M2.5.2: Cost ledger
Closed
#24 M2.5.1: Structured application logger (structlog)
Closed
#1 V1: Web-search researcher MCP + CLI shim
Closed
#10 M2.3: First end-to-end smoke test (Utah crops)
Closed
#18 Bug: MCP stdio client doesn't propagate parent env to server subprocess
Closed
#17 Bug: token_budget is not actually enforced
Closed
#15 Bug: server default model id is invalid (claude-sonnet-4-5-20250514)
Closed
#19 Bug: trace logger truncates long field values
Closed
#16 Bug: synthesis output parsing fails on real research runs
Closed
#13 Docker-based testing environment
Closed
#9 M2.2: marchwarden replay <trace_id> CLI command
Closed
#8 M2.1: marchwarden ask CLI command
33 issues created by 1 user
Opened
#1 V1: Web-search researcher MCP + CLI shim
Opened
#8 M2.1: marchwarden ask CLI command
Opened
#9 M2.2: marchwarden replay <trace_id> CLI command
Opened
#10 M2.3: First end-to-end smoke test (Utah crops)
Opened
#13 Docker-based testing environment
Opened
#15 Bug: server default model id is invalid (claude-sonnet-4-5-20250514)
Opened
#16 Bug: synthesis output parsing fails on real research runs
Opened
#17 Bug: token_budget is not actually enforced
Opened
#18 Bug: MCP stdio client doesn't propagate parent env to server subprocess
Opened
#19 Bug: trace logger truncates long field values
Opened
#24 M2.5.1: Structured application logger (structlog)
Opened
#25 M2.5.2: Cost ledger
Opened
#26 M2.5.3: marchwarden costs CLI command
Opened
#30 depth flag should drive iteration / budget / source defaults
Opened
#35 Record per-step duration in trace and operational logs
Opened
#37 Researcher #2: arxiv-rag — semantic search over a curated arXiv reading list
Opened
#38 M5.1.1 arxiv-rag: ingest pipeline (marchwarden arxiv add)
Opened
#39 M5.1.2 arxiv-rag: retrieval primitive
Opened
#40 M5.1.3 arxiv-rag: ArxivResearcher agent loop
Opened
#41 M5.1.4 arxiv-rag: MCP server
Opened
#42 M5.1.5 arxiv-rag: CLI integration (--researcher arxiv)
Opened
#43 M5.1.6 arxiv-rag: cost ledger fields (embedding_calls)
Opened
#44 M3.1 Single-axis stress tests
Opened
#45 M3.2 Multi-axis stress test
Opened
#46 M3.3 Confidence calibration (V1.1)
Opened
#47 M4.1 Error handling and graceful degradation
Opened
#48 M4.2 Test suite expansion and contract compliance
Opened
#49 M4.3 Documentation polish (15-minute new-developer test)
Opened
#50 M5.2 Contract validation across two researchers
Opened
#51 M6.1 PI Agent core
Opened
#52 M6.2 PI-driven CLI (replaces V1 ask command)
Opened
#53 Budget cap lags one iteration behind tool payload growth
Opened
#54 Persist full ResearchResult alongside trace (observability gap)