marchwarden

Author	SHA1	Message	Date
Jeff Smith	7956bf4873	Fix synthesis truncation and trace masking (#16, #19) The synthesis step was passing max_tokens=4096 to Claude, which was not enough for a full ResearchResult JSON over a real evidence set (28 sources). The model's output got cut mid-string, json.loads failed, and the agent fell back to a stub answer with zero citations. The trace logger then truncated the raw_response to 1000 chars before recording it, hiding the actual reason for the parse failure (the truncated JSON suffix) and making the bug invisible from traces. Fixes: - Bump synthesis max_tokens to 16384 - Capture and log Claude's stop_reason on synthesis_error so future truncation cases are diagnosable from the trace alone - Log the parser exception text alongside the raw_response - Stop slicing raw_response; record the full string Verified end-to-end against the Utah crops question: - Before: 0 citations, confidence 0.10, fallback stub - After: 9 citations, confidence 0.88, real synthesized answer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:23:03 -06:00
archeious	16d88e951b	Merge pull request 'chore: docker-based test environment' (#14) from chore/docker-test-env into main Reviewed-on: #14 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:08:27 +00:00
Jeff Smith	40d0725497	chore: add docker-based test environment (#13) Reproducible Python 3.12-slim container that installs the project editable with dev deps. Adds pytest-asyncio to dev deps so async tests run cleanly inside the container (host had it installed out-of-band). scripts/docker-test.sh provides build, test, ask, replay, and shell subcommands. The ask/replay/shell commands mount ~/secrets read-only and ~/.marchwarden read-write so end-to-end runs persist traces back to the host. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:06:12 -06:00
archeious	bca7294ec8	Merge pull request 'M2.2: marchwarden replay CLI command' (#12) from feat/cli-replay into main Reviewed-on: #12 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 20:59:12 +00:00
Jeff Smith	273d144381	M2.2: marchwarden replay CLI command (#9) Adds `marchwarden replay <trace_id>` to pretty-print a prior research run from its JSONL trace file. Resolves the trace under ~/.marchwarden/traces/ by default; --trace-dir overrides for tests and custom locations. Renders each step as a row with action, decision, extra fields, and content_hash. Friendly errors for unknown trace_id and malformed JSON lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:57:37 -06:00
archeious	b2b7026eb2	Merge pull request 'M2.1: marchwarden ask CLI command' (#11) from feat/cli-ask into main Reviewed-on: #11 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 20:54:59 +00:00
Jeff Smith	87a34c60d1	M2.1: marchwarden ask CLI command (#8) Click app with `ask` subcommand that spawns the web researcher MCP server over stdio, calls the research tool, and pretty-prints the ResearchResult contract using rich (panels for answer/confidence/cost, tables for citations, gaps, discovery events, and open questions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:51:40 -06:00
Jeff Smith	166d86e190	chore: add CLAUDE.md for session 1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:44:16 -06:00
archeious	7088f45f06	Merge pull request 'M1.4: MCP server' (#7) from feat/mcp-server into main	2026-04-08 20:41:28 +00:00
Jeff Smith	5d894d9e10	M1.4: MCP server wrapping web researcher FastMCP server exposing a single 'research' tool: - Delegates to WebResearcher with keys from ~/secrets - Accepts question, context, depth, max_iterations, token_budget - Returns full ResearchResult as JSON - Configurable model via MARCHWARDEN_MODEL env var - Runnable as: python -m researchers.web 4 tests: secret reading, JSON response validation, default parameters. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:41:13 -06:00
archeious	f593dd060b	Merge pull request 'Add OpenQuestion to research contract' (#6) from feat/open-questions into main	2026-04-08 20:37:54 +00:00
Jeff Smith	ae9c11a79b	Add OpenQuestion to research contract New field on ResearchResult: open_questions — follow-up questions that emerged from the research itself. Distinct from gaps (backward: what failed) and discovery_events (sideways: what's lateral). Open questions look forward: 'based on what I found, this needs deeper investigation.' - OpenQuestion model: question, context, priority (high/medium/low), source_locator - Updated agent synthesis prompt to produce open_questions - Updated agent result builder to parse open_questions from JSON - 3 new tests for OpenQuestion model - Updated existing tests for new field 77 tests passing. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:37:30 -06:00
archeious	ece2455415	Merge pull request 'M1.3: Inner agent loop' (#5) from feat/agent-loop into main	2026-04-08 20:29:41 +00:00
Jeff Smith	7cb3fde90e	M1.3: Inner agent loop with tests WebResearcher — the core agentic research loop: - Tool-use loop: Claude decides when to search (Tavily) and fetch (httpx) - Budget enforcement: stops at max_iterations or token_budget - Synthesis step: separate LLM call produces structured ResearchResult JSON - Fallback: valid ResearchResult even when synthesis JSON is unparseable - Full trace logging at every step (start, search, fetch, synthesis, complete) - Populates all contract fields: raw_excerpt, categorized gaps, discovery_events, confidence_factors, cost_metadata with model_id 9 tests: complete research loop, budget exhaustion, synthesis failure fallback, trace file creation, fetch_url tool integration, search result formatting. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:29:27 -06:00
archeious	21c8191b81	Merge pull request 'M1.2: Trace logger' (#4) from feat/trace-logger into main	2026-04-08 20:25:58 +00:00
Jeff Smith	cef08c8984	M1.2: Trace logger with tests TraceLogger produces JSONL audit logs per research() call: - One file per trace_id at ~/.marchwarden/traces/{trace_id}.jsonl - Each line is a self-contained JSON object (step, action, timestamp, decision) - Supports arbitrary kwargs (url, content_hash, query, etc.) - Lazy file handle, flush after each write, context manager support - read_entries() for replay and testing 15 tests: file creation, step counting, JSONL validity, kwargs, timestamps, flush behavior, multiple independent traces. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:21:10 -06:00
archeious	851fed6a5f	Merge pull request 'M1.1: Search and fetch tools' (#3) from feat/search-fetch-tools into main	2026-04-08 20:19:21 +00:00
Jeff Smith	a5bc93e275	M1.1: Search and fetch tools with tests - tavily_search(): Tavily API wrapper returning SearchResult dataclasses with content hashing (raw_content preferred, falls back to summary) - fetch_url(): async URL fetch with HTML text extraction, content hashing, and graceful error handling (timeout, HTTP errors, connection errors) - _extract_text(): simple HTML → clean text (strip scripts/styles/tags, decode entities, collapse whitespace) - _sha256(): SHA-256 content hashing with 'sha256:' prefix for traces 18 tests: hashing, HTML extraction, mocked Tavily search, mocked async fetch (success, timeout, HTTP error, hash consistency). Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:17:18 -06:00
archeious	8930f4486a	Merge pull request 'M0.3: Contract v1 Pydantic models' (#2) from feat/contract-models into main	2026-04-08 20:14:45 +00:00
Jeff Smith	1b0f86399a	M0.3: Implement contract v1 Pydantic models with tests All Research Contract types as Pydantic models: - ResearchConstraints (input) - Citation with raw_excerpt (output) - GapCategory enum (5 categories) - Gap with structured category (output) - DiscoveryEvent (lateral findings) - ConfidenceFactors (auditable scoring inputs) - CostMetadata with model_id (resource tracking) - ResearchResult (top-level contract) 32 tests: validation, bounds checking, serialization roundtrips, JSON structure verification against contract spec. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:00:45 -06:00
Jeff Smith	6a8445ed13	Fix README wiki links to use absolute URLs Relative /wiki/ paths resolve against the Forgejo root, not the repo. Use full URLs so links work from the repo README page. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 13:41:03 -06:00
Jeff Smith	79becb21ec	Fix README: correct clone URL and wiki link paths - Update clone URL to archeious/marchwarden (was claude-code) - Fix wiki links to use /wiki/ routes instead of docs/wiki/ source paths - Fix issue link to correct repo Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 12:07:29 -06:00
Jeff Smith	deb124ed29	Initial project structure and scaffolding - Directory layout: researchers/web/, orchestrator/, cli/, docs/wiki/ - README with quick start and vision - CONTRIBUTING with workflow and testing guidelines - pyproject.toml with dependencies and build config - .gitignore for Python projects Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 11:57:15 -06:00
claude-code	f1e27e35f0	Initial commit	2026-04-08 17:56:21 +00:00

24 commits
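
The M1.1 and M1.2 commit messages in the log describe two small mechanisms: SHA-256 content hashing with a 'sha256:' prefix, and a JSONL trace where each line is a self-contained JSON object carrying step, action, timestamp, and decision plus arbitrary extra fields. The sketch below illustrates both ideas using only the standard library. It is not the repository's code: the function names `sha256_prefixed`, `append_trace_entry`, and `read_trace_entries` are hypothetical, and the timestamp format and file handling are assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_prefixed(content: str) -> str:
    """Hash text with SHA-256 and prepend 'sha256:' (hypothetical helper name)."""
    return "sha256:" + hashlib.sha256(content.encode("utf-8")).hexdigest()


def append_trace_entry(path: Path, step: int, action: str, decision: str, **extra) -> dict:
    """Append one self-contained JSON object as a single line of a JSONL file.

    The four fixed keys come from the commit message; the ISO-8601 UTC
    timestamp format is an assumption.
    """
    entry = {
        "step": step,
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        **extra,  # arbitrary kwargs such as url, query, content_hash
    }
    # Opening in append mode and closing after each write means every entry
    # reaches disk before the next step runs.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_trace_entries(path: Path) -> list[dict]:
    """Read a JSONL trace back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A replay tool in this style would call `read_trace_entries` on a trace file and render one row per entry. Because each line parses on its own, a malformed line can be reported without losing the rest of the trace.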