marchwarden

Author	SHA1	Message	Date
Jeff Smith	c0d4f391b6	Display budget as spend status, not exhaustion alarm Replace 'Budget exhausted: True/False' with 'Budget status: spent / under cap' in the Confidence panel. The previous wording read as a failure indicator when in practice 'exhausted' just means the agent spent its tool-use cap before voluntarily stopping — the normal, expected outcome on real questions with the default 20k budget. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 16:12:39 -06:00
archeious	4816b9386e	Merge pull request 'M2.5.3: marchwarden costs CLI command' (#29) from feat/costs-command into main Reviewed-on: #29 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:59:07 +00:00
Jeff Smith	6fdf0e338a	M2.5.3: marchwarden costs CLI command (#26) Adds operator-facing `marchwarden costs` subcommand that reads the JSONL ledger from M2.5.2 and pretty-prints a rich summary: - Cost Summary panel: total calls, total spend, total tokens (input/output split), Tavily search count, warning for any calls with unknown model prices - Per-Day table sorted by date - Per-Model table sorted by model id - Highest-Cost Call panel with trace_id and question Flags: --since ISO date or relative shorthand (7d, 24h, 2w, 1m) --until same --model filter to a specific model_id --json emit raw filtered ledger entries instead of the table --ledger override default path (mostly for tests) Also fixes a Dockerfile gap: the obs/ package added in M2.5.1 was not being COPYed into the image, so the installed `marchwarden` entry point couldn't import it. Tests had been passing because they mounted /app over the install. Adding `COPY obs ./obs` restores parity. Tests cover summary rendering, model filter, since-date filter, JSON output, and the empty-ledger friendly path. 110/110 passing. End-to-end verified against the real cost ledger. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:57:39 -06:00
archeious	5a0ca73e2a	Merge pull request 'M2.5.2: Cost ledger with price table' (#28) from feat/cost-ledger into main Reviewed-on: #28	2026-04-08 21:54:23 +00:00
Jeff Smith	0d957336f5	M2.5.2: Cost ledger with price table (#25) Adds an append-only JSONL ledger of every research() call at ~/.marchwarden/costs.jsonl, supplementing (not replacing) the per-call cost_metadata field returned to callers. The ledger is the operator-facing source of truth for spend tracking, queryable via the upcoming `marchwarden costs` command (M2.5.3). Fields per entry: timestamp, trace_id, question (truncated 200ch), model_id, tokens_used, tokens_input, tokens_output, iterations_run, wall_time_sec, tavily_searches, estimated_cost_usd, budget_exhausted, confidence. Cost estimation reads ~/.marchwarden/prices.toml, which is auto-created with seed values for current Anthropic + Tavily rates on first run. Operators are expected to update prices.toml manually when upstream rates change — there is no automatic fetching. Existing files are never overwritten. Unknown models log a WARN and record estimated_cost_usd: null instead of crashing. Each ledger write also emits a structured `cost_recorded` log line via the M2.5.1 logger, so cost data ships to OpenSearch alongside the ledger file with no extra plumbing. Tracking changes in agent.py: - Track tokens_input / tokens_output split (not just total) - Count tavily_searches across iterations - _synthesize now returns (result, synth_in, synth_out) so the caller can attribute synthesis tokens to the running counters - Ledger.record() called after research_completed log; failures are caught and warn-logged so a ledger write can never poison a successful research call Tests cover: price table seeding, no-overwrite of existing files, cost estimation for known/unknown models, tavily-only cost, ledger appends, question truncation, env var override. End-to-end verified with a real Anthropic+Tavily call: 9107 input + 1140 output tokens, 1 tavily search, $0.049 estimated. 104/104 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:52:25 -06:00
archeious	d25c8865ea	Merge pull request 'M2.5.1: Structured application logger' (#27) from feat/structured-logging into main Reviewed-on: #27 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:48:10 +00:00
Jeff Smith	8a62f6b014	M2.5.1: Structured application logger via structlog (#24) Adds an operational logging layer separate from the JSONL trace audit logs. Operational logs cover system events (startup, errors, MCP transport, research lifecycle); JSONL traces remain the researcher provenance audit trail. Backend: structlog with two renderers selectable via MARCHWARDEN_LOG_FORMAT (json|console). Defaults to console when stderr is a TTY, json otherwise — so dev runs are human-readable and shipped runs (containers, automation) emit OpenSearch-ready JSON without configuration. Key features: - Named loggers per component: marchwarden.cli, marchwarden.mcp, marchwarden.researcher.web - MARCHWARDEN_LOG_LEVEL controls global level (default INFO) - MARCHWARDEN_LOG_FILE=1 enables a 10MB-rotating file at ~/.marchwarden/logs/marchwarden.log - structlog contextvars bind trace_id + researcher at the start of each research() call so every downstream log line carries them automatically; cleared on completion - stdlib logging is funneled through the same pipeline so noisy third-party loggers (httpx, anthropic) get the same formatting and quieted to WARN unless DEBUG is requested - Logs to stderr to keep MCP stdio stdout clean Wired into: - cli.main.cli — configures logging on startup, logs ask_started/ask_completed/ask_failed - researchers.web.server.main — configures logging on startup, logs mcp_server_starting - researchers.web.agent.research — binds trace context, logs research_started/research_completed Tests verify JSON and console formats, contextvar propagation, level filtering, idempotency, and auto-configure-on-first-use. 94/94 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:46:51 -06:00
archeious	8293cbfb68	Merge pull request 'Propagate parent env to MCP server subprocess' (#23) from fix/mcp-env-propagation into main Reviewed-on: #23 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:32:10 +00:00
Jeff Smith	d0a732735e	Propagate parent env to MCP server subprocess (#18) The mcp SDK's StdioServerParameters does not pass the parent process's environment to the spawned server by default, so env vars set on the CLI process (notably MARCHWARDEN_MODEL) were silently dropped on the way to the researcher. Pass env=os.environ.copy() to StdioServerParameters so the server sees the same environment as the CLI. Also update scripts/docker-test.sh to forward MARCHWARDEN_MODEL into the container and to detect a non-TTY parent so non-interactive `ask` invocations don't fail with "the input device is not a TTY". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:31:14 -06:00
archeious	712638fe8c	Merge pull request 'Enforce token_budget before each iteration' (#22) from fix/budget-enforcement into main Reviewed-on: #22 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:30:26 +00:00
Jeff Smith	6ff1a6af3d	Enforce token_budget before each iteration (#17) The loop previously checked the token budget at the bottom of each iteration, after the LLM call and tool work had already happened. By the time the cap was caught the budget had been exceeded and the overshoot was unbounded by the iteration's cost. Move the check to the top of the loop so a new iteration is never started past the budget. Document the policy explicitly: token_budget is a soft cap on the tool-use loop only; the synthesis call is always allowed to complete so callers get a structured ResearchResult rather than a fallback stub. Capping synthesis is a separate, larger design question (would require splitting the budget between loop and synthesis up-front). Verified: token_budget=5000, max_iterations=10 now stops after 2 iterations with budget_exhausted=True and a complete answer with 10 citations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:29:22 -06:00
archeious	50d59abf52	Merge pull request 'Fix invalid default model id' (#21) from fix/model-default-id into main Reviewed-on: #21 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:26:05 +00:00
Jeff Smith	eb2e71835c	Fix invalid default model id (#15) Both the MCP server and WebResearcher defaulted to claude-sonnet-4-5-20250514, which 404s against the Anthropic API. Update both defaults to claude-sonnet-4-6, which is current as of 2026-04. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:25:19 -06:00
archeious	c19a161a62	Merge pull request 'Fix synthesis truncation and trace masking' (#20) from fix/synthesis-truncation into main Reviewed-on: #20 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:24:41 +00:00
Jeff Smith	7956bf4873	Fix synthesis truncation and trace masking (#16, #19) The synthesis step was passing max_tokens=4096 to Claude, which was not enough for a full ResearchResult JSON over a real evidence set (28 sources). The model's output got cut mid-string, json.loads failed, and the agent fell back to a stub answer with zero citations. The trace logger then truncated the raw_response to 1000 chars before recording it, hiding the actual reason for the parse failure (the truncated JSON suffix) and making the bug invisible from traces. Fixes: - Bump synthesis max_tokens to 16384 - Capture and log Claude's stop_reason on synthesis_error so future truncation cases are diagnosable from the trace alone - Log the parser exception text alongside the raw_response - Stop slicing raw_response — record the full string Verified end-to-end against the Utah crops question: - Before: 0 citations, confidence 0.10, fallback stub - After: 9 citations, confidence 0.88, real synthesized answer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:23:03 -06:00
archeious	16d88e951b	Merge pull request 'chore: docker-based test environment' (#14) from chore/docker-test-env into main Reviewed-on: #14 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 21:08:27 +00:00
Jeff Smith	40d0725497	chore: add docker-based test environment (#13) Reproducible Python 3.12-slim container that installs the project editable with dev deps. Adds pytest-asyncio to dev deps so async tests run cleanly inside the container (host had it installed out-of-band). scripts/docker-test.sh provides build, test, ask, replay, and shell subcommands. The ask/replay/shell commands mount ~/secrets read-only and ~/.marchwarden read-write so end-to-end runs persist traces back to the host. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:06:12 -06:00
archeious	bca7294ec8	Merge pull request 'M2.2: marchwarden replay CLI command' (#12) from feat/cli-replay into main Reviewed-on: #12 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 20:59:12 +00:00
Jeff Smith	273d144381	M2.2: marchwarden replay CLI command (#9) Adds `marchwarden replay <trace_id>` to pretty-print a prior research run from its JSONL trace file. Resolves the trace under ~/.marchwarden/traces/ by default; --trace-dir overrides for tests and custom locations. Renders each step as a row with action, decision, extra fields, and content_hash. Friendly errors for unknown trace_id and malformed JSON lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:57:37 -06:00
archeious	b2b7026eb2	Merge pull request 'M2.1: marchwarden ask CLI command' (#11) from feat/cli-ask into main Reviewed-on: #11 Reviewed-by: archeious <archeious@unbiasedgeek.com>	2026-04-08 20:54:59 +00:00
Jeff Smith	87a34c60d1	M2.1: marchwarden ask CLI command (#8) Click app with `ask` subcommand that spawns the web researcher MCP server over stdio, calls the research tool, and pretty-prints the ResearchResult contract using rich (panels for answer/confidence/cost, tables for citations, gaps, discovery events, and open questions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:51:40 -06:00
Jeff Smith	166d86e190	chore: add CLAUDE.md for session 1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:44:16 -06:00
archeious	7088f45f06	Merge pull request 'M1.4: MCP server' (#7) from feat/mcp-server into main	2026-04-08 20:41:28 +00:00
Jeff Smith	5d894d9e10	M1.4: MCP server wrapping web researcher FastMCP server exposing a single 'research' tool: - Delegates to WebResearcher with keys from ~/secrets - Accepts question, context, depth, max_iterations, token_budget - Returns full ResearchResult as JSON - Configurable model via MARCHWARDEN_MODEL env var - Runnable as: python -m researchers.web 4 tests: secret reading, JSON response validation, default parameters. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:41:13 -06:00
archeious	f593dd060b	Merge pull request 'Add OpenQuestion to research contract' (#6) from feat/open-questions into main	2026-04-08 20:37:54 +00:00
Jeff Smith	ae9c11a79b	Add OpenQuestion to research contract New field on ResearchResult: open_questions — follow-up questions that emerged from the research itself. Distinct from gaps (backward: what failed) and discovery_events (sideways: what's lateral). Open questions look forward: 'based on what I found, this needs deeper investigation.' - OpenQuestion model: question, context, priority (high/medium/low), source_locator - Updated agent synthesis prompt to produce open_questions - Updated agent result builder to parse open_questions from JSON - 3 new tests for OpenQuestion model - Updated existing tests for new field 77 tests passing. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:37:30 -06:00
archeious	ece2455415	Merge pull request 'M1.3: Inner agent loop' (#5) from feat/agent-loop into main	2026-04-08 20:29:41 +00:00
Jeff Smith	7cb3fde90e	M1.3: Inner agent loop with tests WebResearcher — the core agentic research loop: - Tool-use loop: Claude decides when to search (Tavily) and fetch (httpx) - Budget enforcement: stops at max_iterations or token_budget - Synthesis step: separate LLM call produces structured ResearchResult JSON - Fallback: valid ResearchResult even when synthesis JSON is unparseable - Full trace logging at every step (start, search, fetch, synthesis, complete) - Populates all contract fields: raw_excerpt, categorized gaps, discovery_events, confidence_factors, cost_metadata with model_id 9 tests: complete research loop, budget exhaustion, synthesis failure fallback, trace file creation, fetch_url tool integration, search result formatting. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:29:27 -06:00
archeious	21c8191b81	Merge pull request 'M1.2: Trace logger' (#4) from feat/trace-logger into main	2026-04-08 20:25:58 +00:00
Jeff Smith	cef08c8984	M1.2: Trace logger with tests TraceLogger produces JSONL audit logs per research() call: - One file per trace_id at ~/.marchwarden/traces/{trace_id}.jsonl - Each line is a self-contained JSON object (step, action, timestamp, decision) - Supports arbitrary kwargs (url, content_hash, query, etc.) - Lazy file handle, flush after each write, context manager support - read_entries() for replay and testing 15 tests: file creation, step counting, JSONL validity, kwargs, timestamps, flush behavior, multiple independent traces. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:21:10 -06:00
archeious	851fed6a5f	Merge pull request 'M1.1: Search and fetch tools' (#3) from feat/search-fetch-tools into main	2026-04-08 20:19:21 +00:00
Jeff Smith	a5bc93e275	M1.1: Search and fetch tools with tests - tavily_search(): Tavily API wrapper returning SearchResult dataclasses with content hashing (raw_content preferred, falls back to summary) - fetch_url(): async URL fetch with HTML text extraction, content hashing, and graceful error handling (timeout, HTTP errors, connection errors) - _extract_text(): simple HTML → clean text (strip scripts/styles/tags, decode entities, collapse whitespace) - _sha256(): SHA-256 content hashing with 'sha256:' prefix for traces 18 tests: hashing, HTML extraction, mocked Tavily search, mocked async fetch (success, timeout, HTTP error, hash consistency). Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:17:18 -06:00
archeious	8930f4486a	Merge pull request 'M0.3: Contract v1 Pydantic models' (#2) from feat/contract-models into main	2026-04-08 20:14:45 +00:00
Jeff Smith	1b0f86399a	M0.3: Implement contract v1 Pydantic models with tests All Research Contract types as Pydantic models: - ResearchConstraints (input) - Citation with raw_excerpt (output) - GapCategory enum (5 categories) - Gap with structured category (output) - DiscoveryEvent (lateral findings) - ConfidenceFactors (auditable scoring inputs) - CostMetadata with model_id (resource tracking) - ResearchResult (top-level contract) 32 tests: validation, bounds checking, serialization roundtrips, JSON structure verification against contract spec. Refs: archeious/marchwarden#1 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 14:00:45 -06:00
Jeff Smith	6a8445ed13	Fix README wiki links to use absolute URLs Relative /wiki/ paths resolve against the Forgejo root, not the repo. Use full URLs so links work from the repo README page. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 13:41:03 -06:00
Jeff Smith	79becb21ec	Fix README: correct clone URL and wiki link paths - Update clone URL to archeious/marchwarden (was claude-code) - Fix wiki links to use /wiki/ routes instead of docs/wiki/ source paths - Fix issue link to correct repo Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 12:07:29 -06:00
Jeff Smith	deb124ed29	Initial project structure and scaffolding - Directory layout: researchers/web/, orchestrator/, cli/, docs/wiki/ - README with quick start and vision - CONTRIBUTING with workflow and testing guidelines - pyproject.toml with dependencies and build config - .gitignore for Python projects Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-08 11:57:15 -06:00
claude-code	f1e27e35f0	Initial commit	2026-04-08 17:56:21 +00:00
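
Commit 6fdf0e338a says `--since` and `--until` accept an ISO date or a relative shorthand (7d, 24h, 2w, 1m). The sketch below shows one way such a value might be resolved to a timestamp. It is not the project's code: `parse_since` and `_UNITS` are hypothetical names, and reading `m` as a 30-day month and treating naive ISO dates as UTC are assumptions the log does not confirm.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical unit table; "m" as a 30-day month is an assumption.
_UNITS = {
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "m": timedelta(days=30),
}


def parse_since(value: str, now: datetime) -> datetime:
    """Resolve '7d'-style shorthand or an ISO date to an aware datetime."""
    match = re.fullmatch(r"(\d+)([hdwm])", value)
    if match:
        # Relative shorthand: subtract N units from the reference time.
        return now - int(match.group(1)) * _UNITS[match.group(2)]
    # Otherwise expect an ISO date or datetime; raises ValueError if malformed.
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive inputs are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Passing `now` explicitly keeps the function deterministic, which makes a since-date filter straightforward to test.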

38 commits
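
Commit 6ff1a6af3d moves the token-budget check from the bottom of the tool-use loop to the top, so that no new iteration starts once the cap has been reached. A minimal sketch of that control flow follows. It is an illustration only, not the code in agent.py: `run_loop` and `step_costs` are hypothetical, and the per-iteration token costs are made-up inputs standing in for real LLM and tool usage.

```python
from typing import Iterable, Tuple


def run_loop(
    step_costs: Iterable[int], token_budget: int, max_iterations: int
) -> Tuple[int, int, bool]:
    """Return (iterations_run, tokens_used, budget_exhausted)."""
    tokens_used = 0
    iterations_run = 0
    budget_exhausted = False
    costs = iter(step_costs)

    while iterations_run < max_iterations:
        # Check at the top: never start an iteration past the budget.
        if tokens_used >= token_budget:
            budget_exhausted = True
            break
        cost = next(costs, None)
        if cost is None:
            break  # hypothetical stand-in for the agent stopping on its own
        tokens_used += cost
        iterations_run += 1

    # A synthesis call would run here regardless of the budget state,
    # matching the "soft cap on the tool-use loop only" policy in the log.
    return iterations_run, tokens_used, budget_exhausted
```

With hypothetical costs of 3000 tokens per iteration and `token_budget=5000`, the loop runs two iterations and then stops with `budget_exhausted=True`. The cap is soft: the last iteration that starts under the budget is allowed to finish over it.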