28 commits

Author SHA1 Message Date
Jeff Smith
6ff1a6af3d Enforce token_budget before each iteration (#17)
The loop previously checked the token budget at the *bottom* of each
iteration, after the LLM call and tool work had already happened. By
the time the cap was caught, the budget had already been exceeded, and
the overshoot could be as large as that iteration's full cost.

Move the check to the *top* of the loop so a new iteration is never
started past the budget. Document the policy explicitly: token_budget
is a soft cap on the tool-use loop only; the synthesis call is always
allowed to complete so callers get a structured ResearchResult rather
than a fallback stub. Capping synthesis is a separate, larger design
question (it would require splitting the budget between the loop and
synthesis up front).

Verified: token_budget=5000, max_iterations=10 now stops after 2
iterations with budget_exhausted=True and a complete answer with
10 citations.
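
A minimal sketch of the top-of-loop check described above. This is not the project's actual loop: `run_loop`, `LoopOutcome`, and the `step_costs` list (made-up per-iteration token costs standing in for real LLM calls and tool work) are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoopOutcome:
    iterations: int
    tokens_used: int
    budget_exhausted: bool


def run_loop(step_costs, token_budget, max_iterations):
    """Run a simulated tool-use loop, checking the budget before each iteration.

    step_costs is a hypothetical list of per-iteration token costs.
    """
    tokens_used = 0
    iterations = 0
    budget_exhausted = False
    while iterations < max_iterations and iterations < len(step_costs):
        # Check at the top: never start an iteration once the budget is spent.
        if tokens_used >= token_budget:
            budget_exhausted = True
            break
        tokens_used += step_costs[iterations]
        iterations += 1
    return LoopOutcome(iterations, tokens_used, budget_exhausted)


# Illustrative costs only; real iterations vary in size.
outcome = run_loop([3000, 3000, 3000, 3000], token_budget=5000, max_iterations=10)
print(outcome)
```

Because the check runs before the work, the last iteration that starts may still finish above the cap, which is what makes this a soft cap rather than a hard one.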

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:29:22 -06:00
50d59abf52 Merge pull request 'Fix invalid default model id' (#21) from fix/model-default-id into main
Reviewed-on: #21
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:26:05 +00:00
Jeff Smith
eb2e71835c Fix invalid default model id (#15)
Both the MCP server and WebResearcher defaulted to
claude-sonnet-4-5-20250514, which 404s against the Anthropic API.
Update both defaults to claude-sonnet-4-6, which is current as of
2026-04.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:25:19 -06:00
c19a161a62 Merge pull request 'Fix synthesis truncation and trace masking' (#20) from fix/synthesis-truncation into main
Reviewed-on: #20
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:24:41 +00:00
Jeff Smith
7956bf4873 Fix synthesis truncation and trace masking (#16, #19)
The synthesis step was passing max_tokens=4096 to Claude, which was
not enough for a full ResearchResult JSON over a real evidence set
(28 sources). The model's output got cut mid-string, json.loads
failed, and the agent fell back to a stub answer with zero citations.

The trace logger then truncated the raw_response to 1000 chars before
recording it, hiding the actual reason for the parse failure (the
truncated JSON suffix) and making the bug invisible from traces.

Fixes:
- Bump synthesis max_tokens to 16384
- Capture and log Claude's stop_reason on synthesis_error so future
  truncation cases are diagnosable from the trace alone
- Log the parser exception text alongside the raw_response
- Stop slicing raw_response — record the full string

Verified end-to-end against the Utah crops question:
- Before: 0 citations, confidence 0.10, fallback stub
- After:  9 citations, confidence 0.88, real synthesized answer
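
A rough sketch of the diagnostics listed under "Fixes", assuming the synthesis output arrives as a string together with the API's `stop_reason`. The function name `parse_synthesis` and the `parse_error` key are hypothetical; only `synthesis_error`, `stop_reason`, and `raw_response` come from the message above.

```python
import json


def parse_synthesis(raw_response, stop_reason):
    """Parse synthesis output; on failure, build a diagnosable trace entry.

    Returns (result, trace_entry). Exactly one of the two is None.
    """
    try:
        return json.loads(raw_response), None
    except json.JSONDecodeError as exc:
        entry = {
            "action": "synthesis_error",
            "stop_reason": stop_reason,    # "max_tokens" signals cut-off output
            "parse_error": str(exc),       # parser exception text (hypothetical key)
            "raw_response": raw_response,  # full string, not sliced
        }
        return None, entry


# A response cut off mid-string, as happens when max_tokens is too small.
truncated = '{"answer": "partial text that was cut o'
result, entry = parse_synthesis(truncated, stop_reason="max_tokens")
print(entry["stop_reason"], "|", entry["parse_error"])
```

Keeping the unsliced `raw_response` next to `stop_reason` means a truncation shows up in the trace itself, without rerunning the query.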

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:23:03 -06:00
16d88e951b Merge pull request 'chore: docker-based test environment' (#14) from chore/docker-test-env into main
Reviewed-on: #14
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:08:27 +00:00
Jeff Smith
40d0725497 chore: add docker-based test environment (#13)
Reproducible Python 3.12-slim container that installs the project
editable with dev deps. Adds pytest-asyncio to dev deps so async tests
run cleanly inside the container (host had it installed out-of-band).

scripts/docker-test.sh provides build, test, ask, replay, and shell
subcommands. The ask/replay/shell commands mount ~/secrets read-only
and ~/.marchwarden read-write so end-to-end runs persist traces back
to the host.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:06:12 -06:00
bca7294ec8 Merge pull request 'M2.2: marchwarden replay CLI command' (#12) from feat/cli-replay into main
Reviewed-on: #12
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 20:59:12 +00:00
Jeff Smith
273d144381 M2.2: marchwarden replay CLI command (#9)
Adds `marchwarden replay <trace_id>` to pretty-print a prior research
run from its JSONL trace file. Resolves the trace under
~/.marchwarden/traces/ by default; --trace-dir overrides for tests and
custom locations. Renders each step as a row with action, decision,
extra fields, and content_hash. Friendly errors for unknown trace_id
and malformed JSON lines.
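
One way the trace lookup and the friendly error handling could look, as a sketch only. `load_trace` is a hypothetical helper, not the command's real implementation; the `{trace_id}.jsonl` file naming follows the trace logger commit (M1.2).

```python
import json
from pathlib import Path


def load_trace(trace_id, trace_dir):
    """Return (entries, problems); problems are human-readable strings."""
    path = Path(trace_dir) / f"{trace_id}.jsonl"
    if not path.exists():
        return [], [f"no trace found for id {trace_id!r} in {trace_dir}"]
    entries, problems = [], []
    with path.open(encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the bad line and keep going instead of aborting.
                problems.append(f"line {lineno}: malformed JSON ({exc.msg})")
    return entries, problems
```

Collecting problems rather than raising lets a replay still show the readable steps of a partially corrupted trace.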

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:57:37 -06:00
b2b7026eb2 Merge pull request 'M2.1: marchwarden ask CLI command' (#11) from feat/cli-ask into main
Reviewed-on: #11
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 20:54:59 +00:00
Jeff Smith
87a34c60d1 M2.1: marchwarden ask CLI command (#8)
Click app with `ask` subcommand that spawns the web researcher MCP
server over stdio, calls the research tool, and pretty-prints the
ResearchResult contract using rich (panels for answer/confidence/cost,
tables for citations, gaps, discovery events, and open questions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:51:40 -06:00
Jeff Smith
166d86e190 chore: add CLAUDE.md for session 1
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:44:16 -06:00
7088f45f06 Merge pull request 'M1.4: MCP server' (#7) from feat/mcp-server into main 2026-04-08 20:41:28 +00:00
Jeff Smith
5d894d9e10 M1.4: MCP server wrapping web researcher
FastMCP server exposing a single 'research' tool:
- Delegates to WebResearcher with keys from ~/secrets
- Accepts question, context, depth, max_iterations, token_budget
- Returns full ResearchResult as JSON
- Configurable model via MARCHWARDEN_MODEL env var
- Runnable as: python -m researchers.web

4 tests: secret reading, JSON response validation, default parameters.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:41:13 -06:00
f593dd060b Merge pull request 'Add OpenQuestion to research contract' (#6) from feat/open-questions into main 2026-04-08 20:37:54 +00:00
Jeff Smith
ae9c11a79b Add OpenQuestion to research contract
New field on ResearchResult: open_questions — follow-up questions that
emerged from the research itself. Distinct from gaps (backward: what
failed) and discovery_events (sideways: what's lateral). Open questions
look forward: 'based on what I found, this needs deeper investigation.'

- OpenQuestion model: question, context, priority (high/medium/low),
  source_locator
- Updated agent synthesis prompt to produce open_questions
- Updated agent result builder to parse open_questions from JSON
- 3 new tests for OpenQuestion model
- Updated existing tests for new field

77 tests passing.
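
For illustration, the shape of the OpenQuestion model sketched with a standard-library dataclass instead of Pydantic, so the example runs without extra dependencies. The field types and the optional `source_locator` are assumptions; only the field names and the three priority values come from the message above.

```python
from dataclasses import dataclass
from typing import Optional

PRIORITIES = ("high", "medium", "low")


@dataclass
class OpenQuestion:
    question: str
    context: str
    priority: str
    source_locator: Optional[str] = None  # assumed optional

    def __post_init__(self):
        # Stand-in for the validation a Pydantic model would perform.
        if self.priority not in PRIORITIES:
            raise ValueError(
                f"priority must be one of {PRIORITIES}, got {self.priority!r}"
            )


q = OpenQuestion(
    question="Does this need deeper investigation?",
    context="Emerged while reading the fetched sources.",
    priority="high",
)
print(q.priority)
```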

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:37:30 -06:00
ece2455415 Merge pull request 'M1.3: Inner agent loop' (#5) from feat/agent-loop into main 2026-04-08 20:29:41 +00:00
Jeff Smith
7cb3fde90e M1.3: Inner agent loop with tests
WebResearcher — the core agentic research loop:
- Tool-use loop: Claude decides when to search (Tavily) and fetch (httpx)
- Budget enforcement: stops at max_iterations or token_budget
- Synthesis step: separate LLM call produces structured ResearchResult JSON
- Fallback: valid ResearchResult even when synthesis JSON is unparseable
- Full trace logging at every step (start, search, fetch, synthesis, complete)
- Populates all contract fields: raw_excerpt, categorized gaps,
  discovery_events, confidence_factors, cost_metadata with model_id

9 tests: complete research loop, budget exhaustion, synthesis failure
fallback, trace file creation, fetch_url tool integration, search
result formatting.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:29:27 -06:00
21c8191b81 Merge pull request 'M1.2: Trace logger' (#4) from feat/trace-logger into main 2026-04-08 20:25:58 +00:00
Jeff Smith
cef08c8984 M1.2: Trace logger with tests
TraceLogger produces JSONL audit logs per research() call:
- One file per trace_id at ~/.marchwarden/traces/{trace_id}.jsonl
- Each line is a self-contained JSON object (step, action, timestamp, decision)
- Supports arbitrary kwargs (url, content_hash, query, etc.)
- Lazy file handle, flush after each write, context manager support
- read_entries() for replay and testing

15 tests: file creation, step counting, JSONL validity, kwargs,
timestamps, flush behavior, multiple independent traces.
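
A compact sketch of a logger with the properties listed above (one JSONL file per trace, lazy file handle, flush after each write, context manager support). The method name `log_step`, the constructor arguments, and the timestamp format are assumptions; `read_entries()` is the only method named in the message.

```python
import json
import time
from pathlib import Path


class TraceLogger:
    def __init__(self, trace_id, trace_dir):
        self.path = Path(trace_dir) / f"{trace_id}.jsonl"
        self._handle = None  # opened lazily on the first write
        self._step = 0

    def log_step(self, action, decision="", **kwargs):
        if self._handle is None:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self._handle = self.path.open("a", encoding="utf-8")
        self._step += 1
        entry = {"step": self._step, "action": action,
                 "timestamp": time.time(), "decision": decision, **kwargs}
        self._handle.write(json.dumps(entry) + "\n")
        self._handle.flush()  # each line reaches the file before returning
        return entry

    def read_entries(self):
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Flushing after every write trades a little throughput for traces that stay readable even if the process dies mid-run.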

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:21:10 -06:00
851fed6a5f Merge pull request 'M1.1: Search and fetch tools' (#3) from feat/search-fetch-tools into main 2026-04-08 20:19:21 +00:00
Jeff Smith
a5bc93e275 M1.1: Search and fetch tools with tests
- tavily_search(): Tavily API wrapper returning SearchResult dataclasses
  with content hashing (raw_content preferred, falls back to summary)
- fetch_url(): async URL fetch with HTML text extraction, content hashing,
  and graceful error handling (timeout, HTTP errors, connection errors)
- _extract_text(): simple HTML → clean text (strip scripts/styles/tags,
  decode entities, collapse whitespace)
- _sha256(): SHA-256 content hashing with 'sha256:' prefix for traces

18 tests: hashing, HTML extraction, mocked Tavily search, mocked async
fetch (success, timeout, HTTP error, hash consistency).

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:17:18 -06:00
8930f4486a Merge pull request 'M0.3: Contract v1 Pydantic models' (#2) from feat/contract-models into main 2026-04-08 20:14:45 +00:00
Jeff Smith
1b0f86399a M0.3: Implement contract v1 Pydantic models with tests
All Research Contract types as Pydantic models:
- ResearchConstraints (input)
- Citation with raw_excerpt (output)
- GapCategory enum (5 categories)
- Gap with structured category (output)
- DiscoveryEvent (lateral findings)
- ConfidenceFactors (auditable scoring inputs)
- CostMetadata with model_id (resource tracking)
- ResearchResult (top-level contract)

32 tests: validation, bounds checking, serialization roundtrips,
JSON structure verification against contract spec.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:00:45 -06:00
Jeff Smith
6a8445ed13 Fix README wiki links to use absolute URLs
Relative /wiki/ paths resolve against the Forgejo root, not the repo.
Use full URLs so links work from the repo README page.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 13:41:03 -06:00
Jeff Smith
79becb21ec Fix README: correct clone URL and wiki link paths
- Update clone URL to archeious/marchwarden (was claude-code)
- Fix wiki links to use /wiki/ routes instead of docs/wiki/ source paths
- Fix issue link to correct repo

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 12:07:29 -06:00
Jeff Smith
deb124ed29 Initial project structure and scaffolding
- Directory layout: researchers/web/, orchestrator/, cli/, docs/wiki/
- README with quick start and vision
- CONTRIBUTING with workflow and testing guidelines
- pyproject.toml with dependencies and build config
- .gitignore for Python projects

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 11:57:15 -06:00
f1e27e35f0 Initial commit 2026-04-08 17:56:21 +00:00