marchwarden/researchers/web/server.py

97 lines
2.7 KiB
Python

"""MCP server for the web researcher.

Exposes a single tool `research` that delegates to WebResearcher.
Run with: python -m researchers.web.server
"""
import asyncio
import os
import sys
from typing import Optional
from mcp.server.fastmcp import FastMCP
M2.5.1: Structured application logger via structlog (#24) Adds an operational logging layer separate from the JSONL trace audit logs. Operational logs cover system events (startup, errors, MCP transport, research lifecycle); JSONL traces remain the researcher provenance audit trail. Backend: structlog with two renderers selectable via MARCHWARDEN_LOG_FORMAT (json|console). Defaults to console when stderr is a TTY, json otherwise — so dev runs are human-readable and shipped runs (containers, automation) emit OpenSearch-ready JSON without configuration. Key features: - Named loggers per component: marchwarden.cli, marchwarden.mcp, marchwarden.researcher.web - MARCHWARDEN_LOG_LEVEL controls global level (default INFO) - MARCHWARDEN_LOG_FILE=1 enables a 10MB-rotating file at ~/.marchwarden/logs/marchwarden.log - structlog contextvars bind trace_id + researcher at the start of each research() call so every downstream log line carries them automatically; cleared on completion - stdlib logging is funneled through the same pipeline so noisy third-party loggers (httpx, anthropic) get the same formatting and quieted to WARN unless DEBUG is requested - Logs to stderr to keep MCP stdio stdout clean Wired into: - cli.main.cli — configures logging on startup, logs ask_started/ ask_completed/ask_failed - researchers.web.server.main — configures logging on startup, logs mcp_server_starting - researchers.web.agent.research — binds trace context, logs research_started/research_completed Tests verify JSON and console formats, contextvar propagation, level filtering, idempotency, and auto-configure-on-first-use. 94/94 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:46:51 +00:00
from obs import configure_logging, get_logger
from researchers.web.agent import WebResearcher
from researchers.web.models import ResearchConstraints
log = get_logger("marchwarden.mcp")
mcp = FastMCP(
    name="marchwarden-web-researcher",
    instructions=(
        "A Marchwarden web research specialist. "
        "Call the research tool with a question to get a grounded, "
        "evidence-based answer with citations, gaps, open questions, "
        "and confidence scoring."
    ),
)


def _read_secret(key: str) -> str:
    """Read a secret from the ~/secrets file (one KEY=value pair per line)."""
    secrets_path = os.path.expanduser("~/secrets")
    with open(secrets_path) as f:
        for line in f:
            if line.startswith(f"{key}="):
                # Split on the first "=" only, so values may contain "=".
                return line.split("=", 1)[1].strip()
    raise ValueError(f"Key {key} not found in {secrets_path}")


def _get_researcher() -> WebResearcher:
    """Create a WebResearcher with keys from ~/secrets."""
    return WebResearcher(
        anthropic_api_key=_read_secret("ANTHROPIC_API_KEY"),
        tavily_api_key=_read_secret("TAVILY_API_KEY"),
        model_id=os.environ.get("MARCHWARDEN_MODEL", "claude-sonnet-4-6"),
    )


@mcp.tool()
async def research(
    question: str,
    context: Optional[str] = None,
    depth: str = "balanced",
    max_iterations: int = 5,
    token_budget: int = 20000,
) -> str:
    """Research a question using web search and return a structured answer.

    Args:
        question: The question to investigate.
        context: What the caller already knows (optional).
        depth: Research depth: "shallow", "balanced", or "deep".
        max_iterations: Maximum number of search/fetch iterations (1-20).
        token_budget: Maximum tokens to spend (1000-100000).

    Returns:
        JSON string containing the full ResearchResult with answer,
        citations, gaps, discovery_events, open_questions, confidence,
        and cost_metadata.
    """
    # A fresh researcher is built per call, so ~/secrets is re-read each time.
    researcher = _get_researcher()
    constraints = ResearchConstraints(
        max_iterations=max_iterations,
        token_budget=token_budget,
    )
    result = await researcher.research(
        question=question,
        context=context,
        depth=depth,
        constraints=constraints,
    )
    return result.model_dump_json(indent=2)


def main():
    """Run the MCP server on stdio."""
    configure_logging()
    log.info("mcp_server_starting", transport="stdio", server="marchwarden-web-researcher")
    mcp.run(transport="stdio")


if __name__ == "__main__":
    main()
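
The lookup in `_read_secret` implies that `~/secrets` holds one `KEY=value` pair per line. The sketch below illustrates that format against a temporary file instead of the real one. The helper `read_secret_from` is hypothetical (it is not part of this module) and takes the path as a parameter so the example is self-contained. The key values are placeholders.

```python
import os
import tempfile


def read_secret_from(path: str, key: str) -> str:
    """Hypothetical variant of _read_secret that takes the file path explicitly."""
    with open(path) as f:
        for line in f:
            if line.startswith(f"{key}="):
                # Split on the first "=" only, so values may contain "=".
                return line.split("=", 1)[1].strip()
    raise ValueError(f"Key {key} not found in {path}")


# Write a throwaway secrets file with placeholder values (not real keys).
with tempfile.NamedTemporaryFile("w", suffix=".secrets", delete=False) as tmp:
    tmp.write("ANTHROPIC_API_KEY=sk-placeholder==\n")
    tmp.write("TAVILY_API_KEY=tvly-placeholder\n")
    secrets_file = tmp.name

anthropic_key = read_secret_from(secrets_file, "ANTHROPIC_API_KEY")
tavily_key = read_secret_from(secrets_file, "TAVILY_API_KEY")
os.remove(secrets_file)

print(anthropic_key)
print(tavily_key)
```

Because the match is a plain prefix test, a line with leading whitespace or a `export KEY=...` form would not be found under this logic, and a missing key surfaces as a `ValueError`.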