2026-04-08 20:41:13 +00:00
"""MCP server for the web researcher.

Exposes a single tool `research` that delegates to WebResearcher.
Run with: python -m researchers.web.server
"""

import asyncio
import os
import sys
from typing import Optional

from mcp.server.fastmcp import FastMCP

M2.5.1: Structured application logger via structlog (#24)

Adds an operational logging layer separate from the JSONL trace
audit logs. Operational logs cover system events (startup, errors,
MCP transport, research lifecycle); JSONL traces remain the
researcher provenance audit trail.

Backend: structlog with two renderers selectable via
MARCHWARDEN_LOG_FORMAT (json|console). Defaults to console when
stderr is a TTY, json otherwise, so dev runs are human-readable
and shipped runs (containers, automation) emit OpenSearch-ready
JSON without configuration.

Key features:
- Named loggers per component: marchwarden.cli,
  marchwarden.mcp, marchwarden.researcher.web
- MARCHWARDEN_LOG_LEVEL controls global level (default INFO)
- MARCHWARDEN_LOG_FILE=1 enables a 10MB-rotating file at
  ~/.marchwarden/logs/marchwarden.log
- structlog contextvars bind trace_id + researcher at the start
  of each research() call so every downstream log line carries
  them automatically; cleared on completion
- stdlib logging is funneled through the same pipeline so noisy
  third-party loggers (httpx, anthropic) get the same formatting
  and are quieted to WARN unless DEBUG is requested
- Logs to stderr to keep MCP stdio stdout clean

Wired into:
- cli.main.cli: configures logging on startup, logs ask_started/
  ask_completed/ask_failed
- researchers.web.server.main: configures logging on startup,
  logs mcp_server_starting
- researchers.web.agent.research: binds trace context, logs
  research_started/research_completed

Tests verify JSON and console formats, contextvar propagation,
level filtering, idempotency, and auto-configure-on-first-use.
94/94 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:46:51 +00:00
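The renderer selection described in the commit message above can be sketched roughly as follows. This is an illustration only, not the project's `obs` module: `pick_log_format` is a hypothetical helper, and falling back to TTY detection on an unrecognized value is an assumption. Only the `MARCHWARDEN_LOG_FORMAT` variable, its `json|console` values, and the TTY-based default come from the commit message.

```python
import os
import sys
from typing import Mapping, Optional


def pick_log_format(
    env: Optional[Mapping[str, str]] = None,
    is_tty: Optional[bool] = None,
) -> str:
    """Return "json" or "console" (hypothetical helper, for illustration)."""
    env = os.environ if env is None else env
    if is_tty is None:
        # Logs go to stderr, so stderr is the stream that gets inspected.
        is_tty = sys.stderr.isatty()

    explicit = env.get("MARCHWARDEN_LOG_FORMAT", "").strip().lower()
    if explicit in ("json", "console"):
        return explicit
    # Assumption: unset or unrecognized values fall back to TTY detection.
    return "console" if is_tty else "json"
```

Passing `env` and `is_tty` explicitly keeps the sketch testable without touching the real environment; a container run with no TTY and no override would get `json`.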
from obs import configure_logging, get_logger
from researchers.web.agent import WebResearcher
depth flag now drives constraint defaults (#30)

Previously the depth parameter (shallow/balanced/deep) was passed
only as a text hint inside the agent's user message, with no
mechanical effect on iterations, token budget, or source count.
The flag was effectively cosmetic: the LLM was expected to
"interpret" it.

Add DEPTH_PRESETS table and constraints_for_depth() helper in
researchers.web.models:
  shallow:  2 iters,  5,000 tokens,  5 sources
  balanced: 5 iters, 20,000 tokens, 10 sources (= historical defaults)
  deep:     8 iters, 60,000 tokens, 20 sources

Wired through the stack:
- WebResearcher.research(): when constraints is None, builds from
  the depth preset instead of bare ResearchConstraints()
- MCP server `research` tool: max_iterations and token_budget now
  default to None; constraints are built via constraints_for_depth
  with explicit values overriding the preset
- CLI `ask` command: --max-iterations and --budget default to None;
  the CLI only forwards them to the MCP tool when set, so unset
  flags fall through to the depth preset

balanced is unchanged from the historical defaults so existing
callers see no behavior difference. Explicit --max-iterations /
--budget always win over the preset.

Tests cover each preset's values, balanced backward-compat,
unknown depth fallback, full override, and partial override.
116/116 tests passing. Live-verified: --depth shallow on a simple
question now caps at 2 iterations and stays under budget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:27:38 +00:00
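The preset-plus-override behavior described in the commit message above might look something like the sketch below. It is not the code in `researchers.web.models`: the `ResearchConstraints` dataclass is a stand-in for the real model, its field names are taken from the `research` tool docstring, and falling back to `balanced` for an unknown depth is an assumption (the commit message only says an unknown-depth fallback is tested). The preset numbers are the ones listed in the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResearchConstraints:
    """Stand-in for the real constraints model (illustrative only)."""

    max_iterations: int
    token_budget: int
    max_sources: int


# (max_iterations, token_budget, max_sources) per depth, from the commit message.
DEPTH_PRESETS = {
    "shallow": (2, 5_000, 5),
    "balanced": (5, 20_000, 10),
    "deep": (8, 60_000, 20),
}


def constraints_for_depth(
    depth: str,
    max_iterations: Optional[int] = None,
    token_budget: Optional[int] = None,
) -> ResearchConstraints:
    """Build constraints from a depth preset; explicit values win."""
    # Assumption: an unknown depth falls back to the balanced preset.
    iters, budget, sources = DEPTH_PRESETS.get(depth, DEPTH_PRESETS["balanced"])
    return ResearchConstraints(
        max_iterations=iters if max_iterations is None else max_iterations,
        token_budget=budget if token_budget is None else token_budget,
        max_sources=sources,
    )
```

Under these assumptions, `constraints_for_depth("shallow", token_budget=8_000)` keeps the shallow preset's 2 iterations and 5 sources and only raises the token budget, which is the partial-override case the commit message mentions.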
from researchers.web.models import constraints_for_depth

log = get_logger("marchwarden.mcp")

mcp = FastMCP(
    name="marchwarden-web-researcher",
    instructions=(
        "A Marchwarden web research specialist. "
        "Call the research tool with a question to get a grounded, "
        "evidence-based answer with citations, gaps, open questions, "
        "and confidence scoring."
    ),
)


def _read_secret(key: str) -> str:
    """Read a secret from ~/secrets file."""
    secrets_path = os.path.expanduser("~/secrets")
    with open(secrets_path) as f:
        for line in f:
            if line.startswith(f"{key}="):
                return line.split("=", 1)[1].strip()
    raise ValueError(f"Key {key} not found in {secrets_path}")


def _get_researcher() -> WebResearcher:
    """Create a WebResearcher with keys from ~/secrets."""
    return WebResearcher(
        anthropic_api_key=_read_secret("ANTHROPIC_API_KEY"),
        tavily_api_key=_read_secret("TAVILY_API_KEY"),
        model_id=os.environ.get("MARCHWARDEN_MODEL", "claude-sonnet-4-6"),
    )


@mcp.tool()
async def research(
    question: str,
    context: Optional[str] = None,
    depth: str = "balanced",
    max_iterations: Optional[int] = None,
    token_budget: Optional[int] = None,
) -> str:
    """Research a question using web search and return a structured answer.

    Args:
        question: The question to investigate.
        context: What the caller already knows (optional).
        depth: Research depth, one of "shallow", "balanced", or "deep". Each
            depth picks default max_iterations / token_budget / max_sources.
        max_iterations: Override the depth preset for iterations (1-20).
        token_budget: Override the depth preset for token budget.

    Returns:
        JSON string containing the full ResearchResult with answer,
        citations, gaps, discovery_events, open_questions, confidence,
        and cost_metadata.
    """
    researcher = _get_researcher()
    constraints = constraints_for_depth(
        depth,
        max_iterations=max_iterations,
        token_budget=token_budget,
    )

    result = await researcher.research(
        question=question,
        context=context,
        depth=depth,
        constraints=constraints,
    )

    return result.model_dump_json(indent=2)


def main():
    """Run the MCP server on stdio."""
    configure_logging()
    log.info("mcp_server_starting", transport="stdio", server="marchwarden-web-researcher")
    mcp.run(transport="stdio")


if __name__ == "__main__":
    main()