V1: Web-search researcher MCP + CLI shim #1

Closed
opened 2026-04-08 17:59:06 +00:00 by claude-code · 1 comment

Ship target: V1 Marchwarden

Build a single agentic researcher exposed as an MCP server, controlled via CLI shim. This is the first node in a multi-agent research network; future versions add more specialists and a PI orchestrator.

Scope: Single researcher (web search)

Researcher capability:

  • Takes a single research(question, context?, depth?, constraints?) tool call
  • Runs internal agentic loop: plans, searches via Tavily, fetches URLs, iterates, synthesizes
  • Returns structured response: answer, citations[], gaps[], cost_metadata, trace_id
  • Server-enforced budgets: max 5 iterations, ~20k tokens per call
  • Produces JSONL trace logs (one file per research call, keyed by trace_id)

Server (MCP):

  • Implements the researcher contract
  • Exposes research(question, ...) as the sole tool
  • Enforces iteration/token budgets
  • Logs all traces to ~/.marchwarden/traces/ (or configurable path)
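The iteration and token budgets listed above could be enforced with a small guard around the agent loop. This is a minimal sketch, not the project's implementation: `Budget`, `run_loop`, and the `step` callable are hypothetical names, and it assumes the token cap is soft (checked between iterations), which is one reading of "~20k tokens per call".

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Budget:
    """Hypothetical per-call budget tracker; defaults follow the issue text."""
    max_iterations: int = 5
    token_budget: int = 20_000
    iterations_run: int = 0
    tokens_used: int = 0

    def exhausted(self) -> bool:
        # Either cap ends the loop.
        return (self.iterations_run >= self.max_iterations
                or self.tokens_used >= self.token_budget)

    def charge(self, tokens: int) -> None:
        # Record one completed iteration and the tokens it spent.
        self.iterations_run += 1
        self.tokens_used += tokens


def run_loop(step: Callable[[], Tuple[int, bool]], budget: Budget) -> str:
    """Run step() until it reports completion or the budget runs out.

    step() returns (tokens_spent, done). Because the check happens between
    iterations, the last iteration may overshoot the token cap.
    """
    while not budget.exhausted():
        tokens, done = step()
        budget.charge(tokens)
        if done:
            return "completed"
    return "budget_exhausted"
```

A caller would create one `Budget` per `research()` call and copy `iterations_run` and `tokens_used` into `cost_metadata` when the loop returns.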

CLI shim:

  • marchwarden ask "what are ideal crops for a garden in Utah?"
  • marchwarden replay <trace_id>
  • Test harness for the researcher; will be replaced by PI orchestrator in V2
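The two commands above map naturally onto subcommands. The sketch below uses the standard-library `argparse` purely for illustration; the issue does not say which CLI library the shim uses, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the `ask` and `replay` subcommands (illustrative)."""
    parser = argparse.ArgumentParser(prog="marchwarden")
    sub = parser.add_subparsers(dest="command", required=True)

    # marchwarden ask "<question>"
    ask = sub.add_parser("ask", help="send one question to the researcher")
    ask.add_argument("question")

    # marchwarden replay <trace_id>
    replay = sub.add_parser("replay", help="print a stored trace")
    replay.add_argument("trace_id")

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

In a real shim, `ask` would call the MCP server's `research` tool and `replay` would read the matching JSONL file; both are omitted here.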

Contract details

Tool signature:

research(
  question: str,
  context?: str,           # what the PI already knows
  depth?: "shallow" | "balanced" | "deep" = "balanced",
  constraints?: {
    max_iterations?: int = 5,
    token_budget?: int = 20000,
  }
) → {
  answer: str,
  citations: [
    {
      source: str,         # "web", "file", "db", etc
      locator: str,        # URL, file path, row ID, etc
      snippet?: str,       # relevant excerpt
      confidence: float,   # 0.0-1.0
    }
  ],
  gaps: [
    {
      topic: str,         # what couldn't be resolved
      reason: str,        # "no sources found", "ambiguous", etc
    }
  ],
  cost_metadata: {
    tokens_used: int,
    iterations_run: int,
    wall_time_sec: float,
  },
  trace_id: str,          # UUID, links to JSONL trace file
}
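One way to express the return shape above in Python is a set of dataclasses. This is a sketch under assumptions: only `ResearchResult` is named elsewhere in this issue; the other class names (`Constraints`, `Citation`, `Gap`, `CostMetadata`) and the range check on `confidence` are illustrative choices, not the documented contract.

```python
from dataclasses import asdict, dataclass
from typing import List, Literal, Optional

# "balanced" is included because the signature uses it as the default.
Depth = Literal["shallow", "balanced", "deep"]


@dataclass
class Constraints:
    max_iterations: int = 5
    token_budget: int = 20_000


@dataclass
class Citation:
    source: str                    # "web", "file", "db", etc.
    locator: str                   # URL, file path, row ID, etc.
    confidence: float              # 0.0-1.0
    snippet: Optional[str] = None  # optional field goes last so it can default

    def __post_init__(self) -> None:
        # Assumed validation: reject values outside the documented range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


@dataclass
class Gap:
    topic: str   # what couldn't be resolved
    reason: str  # "no sources found", "ambiguous", etc.


@dataclass
class CostMetadata:
    tokens_used: int
    iterations_run: int
    wall_time_sec: float


@dataclass
class ResearchResult:
    answer: str
    citations: List[Citation]
    gaps: List[Gap]
    cost_metadata: CostMetadata
    trace_id: str  # UUID, links to the JSONL trace file
```

`asdict(result)` turns a `ResearchResult` into plain nested dicts and lists, which is convenient for returning it as JSON from a tool call.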

Trace log (~/.marchwarden/traces/{trace_id}.jsonl):

  • One JSON object per inner-loop step
  • Fields: step, action, result, timestamp, decision
  • Supports replay and debugging
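The trace format described above (one JSON object per step, appended to a per-call file) can be sketched with the standard library alone. The helper names `append_trace_step` and `read_trace` are hypothetical, and the ISO-8601 UTC timestamp is an assumption; the issue lists the field names but not their formats.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def append_trace_step(trace_dir: Path, trace_id: str, step: int,
                      action: str, result: str, decision: str) -> Path:
    """Append one inner-loop step to {trace_dir}/{trace_id}.jsonl."""
    trace_dir.mkdir(parents=True, exist_ok=True)
    path = trace_dir / f"{trace_id}.jsonl"
    entry = {
        "step": step,
        "action": action,
        "result": result,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # assumed format
        "decision": decision,
    }
    # JSONL: exactly one JSON object per line, appended in order.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return path


def read_trace(path: Path) -> List[Dict]:
    """Load every step of a trace, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending line by line means a trace stays readable even if a call is interrupted partway through, which suits the debugging use mentioned above.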

V1 is NOT

  • Multiple researchers (that's V2+)
  • PI orchestrator (that's V2+)
  • Database sources, file corpus, arxiv (V2+)
  • Web UI (keep CLI only)
  • Eval harness, caching, persistence beyond traces
  • Auth, multi-user, deployment

Ship checklist

  • Repo structure set up (see CONTRIBUTING.md or wiki)
  • MCP server implemented with research() tool
  • Internal agent loop (plan → search → fetch → iterate → synthesize)
  • Token budget enforcement
  • Trace logging (JSONL)
  • CLI shim (ask, replay commands)
  • Contract documented in wiki
  • Integration test: ask a non-trivial question, get back structured answer with citations and gaps
  • All tests passing

Decisions recorded

  • Stack: Python, claude-agent-sdk, official mcp SDK
  • Web search: Tavily (cheap, good for agents)
  • Name: Marchwarden (researcher at the frontier, reporting back)
  • Repo: archeious/marchwarden

Created: 2026-04-08
Assignee: archeious
Milestone: V1 Ship


Contract Revision (2026-04-08)

The research contract has been significantly revised based on architectural critique. Key changes:

New fields added to ResearchResult

  1. raw_excerpt on citations — verbatim text from source, prevents synthesis paradox (double-summarization through LLM layers)
  2. discovery_events[] — lateral findings for other researchers (logged in V1, auto-dispatched in V2)
  3. confidence_factors — exposes inputs to confidence scoring (num sources, authority, contradictions, specificity, recency) for future calibration
  4. Categorized gaps: the GapCategory enum replaces free-text reasons:
    • SOURCE_NOT_FOUND — info doesn't exist in this domain
    • ACCESS_DENIED — paywall, robots.txt, auth wall
    • BUDGET_EXHAUSTED — hit iteration/token cap
    • CONTRADICTORY_SOURCES — sources disagree, unresolvable
    • SCOPE_EXCEEDED — needs a different researcher type
  5. content_hash in trace entries — SHA-256 of fetched content for pseudo-CAS change detection
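Two of the additions above translate directly into code: the gap categories as an enum and the content hash as a SHA-256 digest. In this sketch the member names come from the list above, while the lowercase string values and the helpers `content_hash` and `has_changed` are assumptions made for illustration.

```python
import hashlib
from enum import Enum


class GapCategory(str, Enum):
    """Categories from the revised contract; string values are assumed."""
    SOURCE_NOT_FOUND = "source_not_found"            # info doesn't exist in this domain
    ACCESS_DENIED = "access_denied"                  # paywall, robots.txt, auth wall
    BUDGET_EXHAUSTED = "budget_exhausted"            # hit iteration/token cap
    CONTRADICTORY_SOURCES = "contradictory_sources"  # sources disagree, unresolvable
    SCOPE_EXCEEDED = "scope_exceeded"                # needs a different researcher type


def content_hash(content: bytes) -> str:
    """Return the hex SHA-256 digest of fetched content."""
    return hashlib.sha256(content).hexdigest()


def has_changed(recorded_hash: str, current_content: bytes) -> bool:
    """Compare a hash stored in a trace entry against freshly fetched bytes."""
    return content_hash(current_content) != recorded_hash
```

Subclassing `str` lets the enum members serialize as plain strings in JSON. Comparing hashes shows that a source changed since the trace was written, but not what changed.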

Known Limitations documented

  • Confidence is LLM-generated, not calibrated (calibrate after 20-30 queries)
  • No citation validation (V2: validator node)
  • Traces are audit logs, not true replays (V2: CAS)
  • Discovery events logged only (V2: PI auto-dispatch)
  • No streaming progress (MCP is request-response)

Updated ship checklist

  • Repo structure set up
  • MCP server with research() tool
  • Internal agent loop (plan → search → fetch → iterate → synthesize)
  • Token/iteration budget enforcement
  • JSONL trace logging with content hashes
  • raw_excerpt on all citations
  • Categorized gaps (GapCategory enum)
  • Discovery events capture
  • Confidence factors reporting
  • CLI shim (ask, replay)
  • Contract documented in wiki ✅
  • Integration test: structured answer with citations, gaps, discoveries
  • All tests passing

Full spec: ResearchContract wiki page (https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden/wiki/ResearchContract)

Reference: archeious/marchwarden#1