marchwarden

Author	SHA1	Message	Date
Jeff Smith	6ff1a6af3d	Enforce token_budget before each iteration (#17) The loop previously checked the token budget at the bottom of each iteration, after the LLM call and tool work had already happened. By the time the cap was caught, the budget had already been exceeded, and the overshoot was unbounded by the iteration's cost. Move the check to the top of the loop so a new iteration is never started past the budget. Document the policy explicitly: token_budget is a soft cap on the tool-use loop only; the synthesis call is always allowed to complete so callers get a structured ResearchResult rather than a fallback stub. Capping synthesis is a separate, larger design question (it would require splitting the budget between loop and synthesis up front). Verified: with token_budget=5000 and max_iterations=10, the loop now stops after 2 iterations with budget_exhausted=True and a complete answer with 10 citations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:29:22 -06:00
archeious	50d59abf52	Merge pull request 'Fix invalid default model id' (#21) from fix/model-default-id into main Reviewed-on: #21 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:26:05 +00:00
Jeff Smith	eb2e71835c	Fix invalid default model id (#15) Both the MCP server and WebResearcher defaulted to claude-sonnet-4-5-20250514, which 404s against the Anthropic API. Update both defaults to claude-sonnet-4-6, which is current as of 2026-04. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:25:19 -06:00
archeious	c19a161a62	Merge pull request 'Fix synthesis truncation and trace masking' (#20) from fix/synthesis-truncation into main Reviewed-on: #20 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:24:41 +00:00
Jeff Smith	7956bf4873	Fix synthesis truncation and trace masking (#16, #19) The synthesis step was passing max_tokens=4096 to Claude, which was not enough for a full ResearchResult JSON over a real evidence set (28 sources). The model's output got cut mid-string, json.loads failed, and the agent fell back to a stub answer with zero citations. The trace logger then truncated the raw_response to 1000 chars before recording it, hiding the actual reason for the parse failure (the truncated JSON suffix) and making the bug invisible from traces. Fixes: - Bump synthesis max_tokens to 16384 - Capture and log Claude's stop_reason on synthesis_error so future truncation cases are diagnosable from the trace alone - Log the parser exception text alongside the raw_response - Stop slicing raw_response; record the full string Verified end-to-end against the Utah crops question: - Before: 0 citations, confidence 0.10, fallback stub - After: 9 citations, confidence 0.88, real synthesized answer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:23:03 -06:00
archeious	16d88e951b	Merge pull request 'chore: docker-based test environment' (#14) from chore/docker-test-env into main Reviewed-on: #14 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:08:27 +00:00
Jeff Smith	40d0725497	chore: add docker-based test environment (#13) Reproducible Python 3.12-slim container that installs the project in editable mode with dev deps. Adds pytest-asyncio to dev deps so async tests run cleanly inside the container (the host had it installed out-of-band). scripts/docker-test.sh provides build, test, ask, replay, and shell subcommands. The ask/replay/shell commands mount ~/secrets read-only and ~/.marchwarden read-write so end-to-end runs persist traces back to the host. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:06:12 -06:00
archeious	bca7294ec8	Merge pull request 'M2.2: marchwarden replay CLI command' (#12) from feat/cli-replay into main Reviewed-on: #12 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 20:59:12 +00:00
Jeff Smith	273d144381	M2.2: marchwarden replay CLI command (#9) Adds `marchwarden replay <trace_id>` to pretty-print a prior research run from its JSONL trace file. Resolves the trace under ~/.marchwarden/traces/ by default; --trace-dir overrides for tests and custom locations. Renders each step as a row with action, decision, extra fields, and content_hash. Friendly errors for unknown trace_id and malformed JSON lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:57:37 -06:00
archeious	b2b7026eb2	Merge pull request 'M2.1: marchwarden ask CLI command' (#11) from feat/cli-ask into main Reviewed-on: #11 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 20:54:59 +00:00
Jeff Smith	87a34c60d1	M2.1: marchwarden ask CLI command (#8) Click app with `ask` subcommand that spawns the web researcher MCP server over stdio, calls the research tool, and pretty-prints the ResearchResult contract using rich (panels for answer/confidence/cost, tables for citations, gaps, discovery events, and open questions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:51:40 -06:00
Jeff Smith	166d86e190	chore: add CLAUDE.md for session 1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:44:16 -06:00
archeious	7088f45f06	Merge pull request 'M1.4: MCP server' (#7) from feat/mcp-server into main	2026-04-08 20:41:28 +00:00
Jeff Smith	5d894d9e10	M1.4: MCP server wrapping web researcher FastMCP server exposing a single 'research' tool: - Delegates to WebResearcher with keys from ~/secrets - Accepts question, context, depth, max_iterations, token_budget - Returns full ResearchResult as JSON - Configurable model via MARCHWARDEN_MODEL env var - Runnable as: python -m researchers.web 4 tests: secret reading, JSON response validation, default parameters. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:41:13 -06:00
archeious	f593dd060b	Merge pull request 'Add OpenQuestion to research contract' (#6) from feat/open-questions into main	2026-04-08 20:37:54 +00:00
Jeff Smith	ae9c11a79b	Add OpenQuestion to research contract New field on ResearchResult: open_questions — follow-up questions that emerged from the research itself. Distinct from gaps (backward: what failed) and discovery_events (sideways: what's lateral). Open questions look forward: 'based on what I found, this needs deeper investigation.' - OpenQuestion model: question, context, priority (high/medium/low), source_locator - Updated agent synthesis prompt to produce open_questions - Updated agent result builder to parse open_questions from JSON - 3 new tests for OpenQuestion model - Updated existing tests for new field 77 tests passing. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:37:30 -06:00
archeious	ece2455415	Merge pull request 'M1.3: Inner agent loop' (#5) from feat/agent-loop into main	2026-04-08 20:29:41 +00:00
Jeff Smith	7cb3fde90e	M1.3: Inner agent loop with tests WebResearcher — the core agentic research loop: - Tool-use loop: Claude decides when to search (Tavily) and fetch (httpx) - Budget enforcement: stops at max_iterations or token_budget - Synthesis step: separate LLM call produces structured ResearchResult JSON - Fallback: valid ResearchResult even when synthesis JSON is unparseable - Full trace logging at every step (start, search, fetch, synthesis, complete) - Populates all contract fields: raw_excerpt, categorized gaps, discovery_events, confidence_factors, cost_metadata with model_id 9 tests: complete research loop, budget exhaustion, synthesis failure fallback, trace file creation, fetch_url tool integration, search result formatting. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:29:27 -06:00
archeious	21c8191b81	Merge pull request 'M1.2: Trace logger' (#4) from feat/trace-logger into main	2026-04-08 20:25:58 +00:00
Jeff Smith	cef08c8984	M1.2: Trace logger with tests TraceLogger produces JSONL audit logs per research() call: - One file per trace_id at ~/.marchwarden/traces/{trace_id}.jsonl - Each line is a self-contained JSON object (step, action, timestamp, decision) - Supports arbitrary kwargs (url, content_hash, query, etc.) - Lazy file handle, flush after each write, context manager support - read_entries() for replay and testing 15 tests: file creation, step counting, JSONL validity, kwargs, timestamps, flush behavior, multiple independent traces. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:21:10 -06:00
archeious	851fed6a5f	Merge pull request 'M1.1: Search and fetch tools' (#3) from feat/search-fetch-tools into main	2026-04-08 20:19:21 +00:00
Jeff Smith	a5bc93e275	M1.1: Search and fetch tools with tests - tavily_search(): Tavily API wrapper returning SearchResult dataclasses with content hashing (raw_content preferred, falls back to summary) - fetch_url(): async URL fetch with HTML text extraction, content hashing, and graceful error handling (timeout, HTTP errors, connection errors) - _extract_text(): simple HTML → clean text (strip scripts/styles/tags, decode entities, collapse whitespace) - _sha256(): SHA-256 content hashing with 'sha256:' prefix for traces 18 tests: hashing, HTML extraction, mocked Tavily search, mocked async fetch (success, timeout, HTTP error, hash consistency). Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:17:18 -06:00
archeious	8930f4486a	Merge pull request 'M0.3: Contract v1 Pydantic models' (#2) from feat/contract-models into main	2026-04-08 20:14:45 +00:00
Jeff Smith	1b0f86399a	M0.3: Implement contract v1 Pydantic models with tests All Research Contract types as Pydantic models: - ResearchConstraints (input) - Citation with raw_excerpt (output) - GapCategory enum (5 categories) - Gap with structured category (output) - DiscoveryEvent (lateral findings) - ConfidenceFactors (auditable scoring inputs) - CostMetadata with model_id (resource tracking) - ResearchResult (top-level contract) 32 tests: validation, bounds checking, serialization roundtrips, JSON structure verification against contract spec. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:00:45 -06:00
Jeff Smith	6a8445ed13	Fix README wiki links to use absolute URLs Relative /wiki/ paths resolve against the Forgejo root, not the repo. Use full URLs so links work from the repo README page. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 13:41:03 -06:00
Jeff Smith	79becb21ec	Fix README: correct clone URL and wiki link paths - Update clone URL to archeious/marchwarden (was claude-code) - Fix wiki links to use /wiki/ routes instead of docs/wiki/ source paths - Fix issue link to correct repo Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 12:07:29 -06:00
Jeff Smith	deb124ed29	Initial project structure and scaffolding - Directory layout: researchers/web/, orchestrator/, cli/, docs/wiki/ - README with quick start and vision - CONTRIBUTING with workflow and testing guidelines - pyproject.toml with dependencies and build config - .gitignore for Python projects Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 11:57:15 -06:00
claude-code	f1e27e35f0	Initial commit	2026-04-08 17:56:21 +00:00

28 commits
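The budget policy in 6ff1a6af3d (#17) checks token_budget at the top of the tool-use loop, so no new iteration starts once the budget is spent, while synthesis always runs afterward. Below is a minimal sketch of that check under stated assumptions. `run_tool_loop`, `LoopOutcome`, and the `step` callback are hypothetical names, not the project's WebResearcher API. `step` stands in for one LLM call plus tool work and returns the tokens that call consumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopOutcome:
    iterations: int
    tokens_used: int
    budget_exhausted: bool


def run_tool_loop(
    step: Callable[[int], int], token_budget: int, max_iterations: int
) -> LoopOutcome:
    """Hypothetical sketch: step(i) runs one iteration and returns tokens spent."""
    tokens_used = 0
    iterations = 0
    budget_exhausted = False
    while iterations < max_iterations:
        # Soft cap checked at the TOP of the loop: once the budget is spent,
        # no new iteration (LLM call + tool work) is started.
        if tokens_used >= token_budget:
            budget_exhausted = True
            break
        tokens_used += step(iterations)
        iterations += 1
    # A synthesis call would run here unconditionally, outside the budget check.
    return LoopOutcome(iterations, tokens_used, budget_exhausted)
```

Assume a flat cost of 3,000 tokens per iteration. With token_budget=5000, the sketch runs two iterations and finishes at 6,000 tokens. The cap prevents new iterations but cannot trim one that is already in flight, which is why the commit calls it a soft cap.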
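Two commits describe the trace machinery. cef08c8984 specifies one JSONL file per trace_id, one self-contained JSON object per step, a lazy file handle, and a flush after each write. a5bc93e275 adds "sha256:"-prefixed content hashes. Both can be sketched with the standard library alone. `MiniTraceLogger` and `sha256_tag` are hypothetical stand-ins, not the project's TraceLogger or _sha256(). The trace directory is passed in explicitly here instead of defaulting to ~/.marchwarden/traces/.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional, TextIO


def sha256_tag(text: str) -> str:
    """Hash text and prefix the algorithm name: 'sha256:<64 hex chars>'."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


class MiniTraceLogger:
    """Hypothetical sketch: appends one JSON object per step to <trace_dir>/<trace_id>.jsonl."""

    def __init__(self, trace_id: str, trace_dir: Path) -> None:
        self.path = Path(trace_dir) / f"{trace_id}.jsonl"
        self._fh: Optional[TextIO] = None  # opened lazily on the first write
        self._step = 0

    def log_step(self, action: str, decision: str = "", **extra: Any) -> dict[str, Any]:
        if self._fh is None:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self._fh = self.path.open("a", encoding="utf-8")
        self._step += 1
        entry = {
            "step": self._step,
            "action": action,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "decision": decision,
            **extra,  # arbitrary fields such as url, query, content_hash
        }
        self._fh.write(json.dumps(entry) + "\n")
        self._fh.flush()  # readers (e.g. a replay command) see each line immediately
        return entry

    def read_entries(self) -> list[dict[str, Any]]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def close(self) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None

    def __enter__(self) -> "MiniTraceLogger":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()
```

Because every line is a complete JSON object, a replay tool can parse the file line by line. It can also report a single malformed line without discarding the rest of the trace.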
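In a5bc93e275, _extract_text() is described as stripping scripts, styles, and tags, decoding entities, and collapsing whitespace. The following is a regex-based sketch of that pipeline, assuming simple pages. `extract_text` is a hypothetical name, and regex stripping is a rough approximation, not a substitute for a real HTML parser.

```python
import html
import re

# Whole <script>/<style> elements, including their contents.
_SCRIPT_STYLE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_TAG = re.compile(r"<[^>]+>")  # any remaining tag
_WHITESPACE = re.compile(r"\s+")


def extract_text(raw_html: str) -> str:
    """Hypothetical sketch: crude HTML -> plain text."""
    text = _SCRIPT_STYLE.sub(" ", raw_html)
    text = _TAG.sub(" ", text)  # replace tags with spaces so adjacent words don't fuse
    text = html.unescape(text)  # &amp; -> &, &lt; -> <, and so on
    return _WHITESPACE.sub(" ", text).strip()
```

Entities are decoded after tags are stripped. As a result, escaped text such as `&lt;b&gt;` survives as literal characters instead of being removed as if it were a tag.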
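The fix in 7956bf4873 (#16, #19) makes synthesis failures diagnosable from the trace alone. The trace now records Claude's stop_reason, the parser's exception text, and the full raw_response. In the Anthropic Messages API, a stop_reason of "max_tokens" signals that output hit the token limit. Below is a minimal sketch of that error path. `parse_synthesis` and the `parse_error` key are hypothetical; only the fields the commit names are taken from the source.

```python
import json
from typing import Any, Optional


def parse_synthesis(
    raw_response: str, stop_reason: Optional[str]
) -> tuple[Optional[Any], Optional[dict[str, Any]]]:
    """Hypothetical sketch: return (parsed_json, error_trace_entry); exactly one is None."""
    try:
        return json.loads(raw_response), None
    except json.JSONDecodeError as exc:
        # Keep everything needed to diagnose the failure from the trace alone:
        # why generation stopped, what the parser said, and the FULL raw text.
        error_entry = {
            "action": "synthesis_error",
            "stop_reason": stop_reason,  # "max_tokens" points at truncation
            "parse_error": str(exc),
            "raw_response": raw_response,  # deliberately not sliced
        }
        return None, error_entry
```

When stop_reason is "max_tokens", the parse error is a symptom of truncation rather than of malformed model output. Keeping the unsliced raw_response makes the cut-off JSON suffix visible in the trace.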