marchwarden

Author	SHA1	Message	Date
Jeff Smith	7956bf4873	Fix synthesis truncation and trace masking (#16 , #19 ) The synthesis step was passing max_tokens=4096 to Claude, which was not enough for a full ResearchResult JSON over a real evidence set (28 sources). The model's output got cut mid-string, json.loads failed, and the agent fell back to a stub answer with zero citations. The trace logger then truncated the raw_response to 1000 chars before recording it, hiding the actual reason for the parse failure (the truncated JSON suffix) and making the bug invisible from traces. Fixes: - Bump synthesis max_tokens to 16384 - Capture and log Claude's stop_reason on synthesis_error so future truncation cases are diagnosable from the trace alone - Log the parser exception text alongside the raw_response - Stop slicing raw_response — record the full string Verified end-to-end against the Utah crops question: - Before: 0 citations, confidence 0.10, fallback stub - After: 9 citations, confidence 0.88, real synthesized answer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:23:03 -06:00
Jeff Smith	5d894d9e10	M1.4: MCP server wrapping web researcher FastMCP server exposing a single 'research' tool: - Delegates to WebResearcher with keys from ~/secrets - Accepts question, context, depth, max_iterations, token_budget - Returns full ResearchResult as JSON - Configurable model via MARCHWARDEN_MODEL env var - Runnable as: python -m researchers.web 4 tests: secret reading, JSON response validation, default parameters. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:41:13 -06:00
Jeff Smith	ae9c11a79b	Add OpenQuestion to research contract New field on ResearchResult: open_questions — follow-up questions that emerged from the research itself. Distinct from gaps (backward: what failed) and discovery_events (sideways: what's lateral). Open questions look forward: 'based on what I found, this needs deeper investigation.' - OpenQuestion model: question, context, priority (high/medium/low), source_locator - Updated agent synthesis prompt to produce open_questions - Updated agent result builder to parse open_questions from JSON - 3 new tests for OpenQuestion model - Updated existing tests for new field 77 tests passing. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:37:30 -06:00
Jeff Smith	7cb3fde90e	M1.3: Inner agent loop with tests WebResearcher — the core agentic research loop: - Tool-use loop: Claude decides when to search (Tavily) and fetch (httpx) - Budget enforcement: stops at max_iterations or token_budget - Synthesis step: separate LLM call produces structured ResearchResult JSON - Fallback: valid ResearchResult even when synthesis JSON is unparseable - Full trace logging at every step (start, search, fetch, synthesis, complete) - Populates all contract fields: raw_excerpt, categorized gaps, discovery_events, confidence_factors, cost_metadata with model_id 9 tests: complete research loop, budget exhaustion, synthesis failure fallback, trace file creation, fetch_url tool integration, search result formatting. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:29:27 -06:00
Jeff Smith	cef08c8984	M1.2: Trace logger with tests TraceLogger produces JSONL audit logs per research() call: - One file per trace_id at ~/.marchwarden/traces/{trace_id}.jsonl - Each line is a self-contained JSON object (step, action, timestamp, decision) - Supports arbitrary kwargs (url, content_hash, query, etc.) - Lazy file handle, flush after each write, context manager support - read_entries() for replay and testing 15 tests: file creation, step counting, JSONL validity, kwargs, timestamps, flush behavior, multiple independent traces. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:21:10 -06:00
Jeff Smith	a5bc93e275	M1.1: Search and fetch tools with tests - tavily_search(): Tavily API wrapper returning SearchResult dataclasses with content hashing (raw_content preferred, falls back to summary) - fetch_url(): async URL fetch with HTML text extraction, content hashing, and graceful error handling (timeout, HTTP errors, connection errors) - _extract_text(): simple HTML → clean text (strip scripts/styles/tags, decode entities, collapse whitespace) - _sha256(): SHA-256 content hashing with 'sha256:' prefix for traces 18 tests: hashing, HTML extraction, mocked Tavily search, mocked async fetch (success, timeout, HTTP error, hash consistency). Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:17:18 -06:00
Jeff Smith	1b0f86399a	M0.3: Implement contract v1 Pydantic models with tests All Research Contract types as Pydantic models: - ResearchConstraints (input) - Citation with raw_excerpt (output) - GapCategory enum (5 categories) - Gap with structured category (output) - DiscoveryEvent (lateral findings) - ConfidenceFactors (auditable scoring inputs) - CostMetadata with model_id (resource tracking) - ResearchResult (top-level contract) 32 tests: validation, bounds checking, serialization roundtrips, JSON structure verification against contract spec. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:00:45 -06:00
Jeff Smith	deb124ed29	Initial project structure and scaffolding - Directory layout: researchers/web/, orchestrator/, cli/, docs/wiki/ - README with quick start and vision - CONTRIBUTING with workflow and testing guidelines - pyproject.toml with dependencies and build config - .gitignore for Python projects Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 11:57:15 -06:00

8 commits