Commit graph

52 commits

Author SHA1 Message Date
f68bbb1052 Merge pull request 'fix(observability): persist full ResearchResult and per-item trace events' (#56) from feat/trace-full-result into main 2026-04-09 01:27:47 +00:00
Jeff Smith
1203b07248 fix(observability): persist full ResearchResult and per-item trace events
Closes #54.

The JSONL trace previously stored only counts on the `complete` event
(gap_count, citation_count, discovery_count). Replay could re-render the
step log but could not recover which gaps fired or which sources were
cited, blocking M3.2/M3.3 stress-testing and calibration work.

Two complementary fixes:

1. (a) TraceLogger.write_result() dumps the pydantic ResearchResult to
   `<trace_id>.result.json` next to the JSONL trace. The agent calls it
   right before emitting the `complete` step. `cli replay` now loads the
   sibling result file when present and renders the structured tables
   under the trace step log.

2. (b) The agent emits one `gap_recorded`, `citation_recorded`, or
   `discovery_recorded` trace event per item from the final result. This
   gives the JSONL stream a queryable timeline of what was kept, with
   categories and topics in-band, without needing to load the result
   sibling.

Tests: 4 added (127 total passing). Smoke-tested live with a real ask;
both files written and replay rendering verified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:27:33 -06:00
5ea6879ba0 Merge pull request 'docs(stress-tests): archive M3.1 results' (#55) from feat/m3.1-stress-tests into main 2026-04-09 01:21:43 +00:00
Jeff Smith
a39407f03e docs(stress-tests): archive M3.1 results
Single-axis stress test results from Issue #44. 1 of 4 query targets
cleanly hit (Q3); Q1/Q2 missed because queries weren't adversarial
enough; Q4 missed due to budget cap lag bug filed as #53. Trace
observability gap blocking M3.2/M3.3 filed as #54.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:21:34 -06:00
Jeff Smith
d279c4c20e chore: update CLAUDE.md for session 2
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 17:30:59 -06:00
af7935819f Merge pull request 'Record per-step durations in trace and operational logs' (#36) from feat/step-durations into main
Reviewed-on: #36
2026-04-08 22:58:12 +00:00
Jeff Smith
ddaf7e85c3 Record per-step durations in trace and operational logs (#35)
TraceLogger now tracks monotonic start times for starter actions
(web_search, fetch_url, synthesis_start, start) and attaches a
duration_ms field to the matching completer (web_search_complete,
fetch_url_complete, synthesis_complete, synthesis_error). The
terminal 'complete' step gets total_duration_sec instead.

Pairings are tightly sequential in the agent code (each
_execute_tool call runs start→end before returning), so a simple
dict keyed by starter name suffices — no queueing needed. An
unpaired completer leaves duration unset and does not crash.

Durations flow into both the JSONL trace and the structlog
operational log, so OpenSearch queries can filter / aggregate
by step latency without cross-row joins.

Verified end-to-end on a real shallow query:
  web_search       5,233 ms
  web_search       3,006 ms
  synthesis_complete 27,658 ms
  complete         47.547 s total

Synthesis is by far the slowest step — visible at a glance
for the first time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:49:52 -06:00
226c1c8660 Merge pull request 'depth flag now drives constraint defaults' (#33) from feat/depth-presets into main
Reviewed-on: #33
2026-04-08 22:33:48 +00:00
6e7d3bc98a Merge pull request 'chore: Makefile with venv-based dev workflow' (#34) from chore/makefile into main
Reviewed-on: #34
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 22:32:28 +00:00
Jeff Smith
9ecc1db38d chore: add Makefile with venv-based dev workflow
Targets:
  make install      create .venv and pip install -e ".[dev]"
  make test         pytest inside the venv
  make test-cov     pytest with coverage
  make lint         ruff + black --check
  make ask          run a sample research call
  make costs        show the cost ledger
  make clean        remove venv and caches
  make docker-build / docker-test   parity wrappers for the docker flow

Lets contributors get from clone to running CLI in one command
without depending on docker. README points at make install as
the recommended path; manual venv steps documented as fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:31:00 -06:00
Jeff Smith
ae48acd421 depth flag now drives constraint defaults (#30)
Previously the depth parameter (shallow/balanced/deep) was passed
only as a text hint inside the agent's user message, with no
mechanical effect on iterations, token budget, or source count.
The flag was effectively cosmetic — the LLM was expected to
"interpret" it.

Add DEPTH_PRESETS table and constraints_for_depth() helper in
researchers.web.models:

  shallow:  2 iters,  5,000 tokens,  5 sources
  balanced: 5 iters, 20,000 tokens, 10 sources  (= historical defaults)
  deep:     8 iters, 60,000 tokens, 20 sources

Wired through the stack:

- WebResearcher.research(): when constraints is None, builds from
  the depth preset instead of bare ResearchConstraints()
- MCP server `research` tool: max_iterations and token_budget now
  default to None; constraints are built via constraints_for_depth
  with explicit values overriding the preset
- CLI `ask` command: --max-iterations and --budget default to None;
  the CLI only forwards them to the MCP tool when set, so unset
  flags fall through to the depth preset

balanced is unchanged from the historical defaults so existing
callers see no behavior difference. Explicit --max-iterations /
--budget always win over the preset.

Tests cover each preset's values, balanced backward-compat,
unknown depth fallback, full override, and partial override.
116/116 tests passing. Live-verified: --depth shallow on a simple
question now caps at 2 iterations and stays under budget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:27:38 -06:00
d51f16d33e Merge pull request 'Mirror trace steps to the operational logger' (#32) from feat/per-step-logging into main
Reviewed-on: #32
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 22:23:01 +00:00
Jeff Smith
b510902af3 Mirror trace steps to operational logger
The trace JSONL captures every step of a research call (search,
fetch, iteration boundaries, synthesis), but the structured
operational log only fired at research_started / research_completed,
giving administrators no real-time visibility into agent progress.

Have TraceLogger.log_step also emit a structlog event using the
same action name, fields, and step counter. trace_id and researcher
are already bound in contextvars by WebResearcher.research, so
every line carries them automatically — no plumbing needed.

Volume control: a curated set of milestone actions logs at INFO
(start, iteration_start, synthesis_start/complete/error, budget_-
exhausted, complete). Chatty per-tool actions (web_search,
fetch_url and their *_complete pairs) log at DEBUG. Default
MARCHWARDEN_LOG_LEVEL=INFO shows ~9 lines per call;
MARCHWARDEN_LOG_LEVEL=DEBUG shows everything.

This keeps dev stderr readable while making full step visibility
one env var away — and OpenSearch can ingest at DEBUG always.

Verified end-to-end: Utah peak query at INFO produces 9 milestone
log lines, at DEBUG produces 13.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:22:13 -06:00
bbb08b7789 Merge pull request 'Display budget as spend status, not exhaustion alarm' (#31) from fix/budget-display-clarity into main
Reviewed-on: #31
2026-04-08 22:20:01 +00:00
Jeff Smith
c0d4f391b6 Display budget as spend status, not exhaustion alarm
Replace 'Budget exhausted: True/False' with 'Budget status: spent /
under cap' in the Confidence panel. The previous wording read as a
failure indicator when in practice 'exhausted' just means the agent
spent its tool-use cap before voluntarily stopping — the normal,
expected outcome on real questions with the default 20k budget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:12:39 -06:00
4816b9386e Merge pull request 'M2.5.3: marchwarden costs CLI command' (#29) from feat/costs-command into main
Reviewed-on: #29
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:59:07 +00:00
Jeff Smith
6fdf0e338a M2.5.3: marchwarden costs CLI command (#26)
Adds operator-facing `marchwarden costs` subcommand that reads the
JSONL ledger from M2.5.2 and pretty-prints a rich summary:

- Cost Summary panel: total calls, total spend, total tokens (input/
  output split), Tavily search count, warning for any calls with
  unknown model prices
- Per-Day table sorted by date
- Per-Model table sorted by model id
- Highest-Cost Call panel with trace_id and question

Flags:
  --since   ISO date or relative shorthand (7d, 24h, 2w, 1m)
  --until   same
  --model   filter to a specific model_id
  --json    emit raw filtered ledger entries instead of the table
  --ledger  override default path (mostly for tests)

Also fixes a Dockerfile gap: the obs/ package added in M2.5.1 was
not being COPYed into the image, so the installed `marchwarden`
entry point couldn't import it. Tests had been passing because
they mounted /app over the install. Adding `COPY obs ./obs`
restores parity.

Tests cover summary rendering, model filter, since-date filter,
JSON output, and the empty-ledger friendly path. 110/110 passing.
End-to-end verified against the real cost ledger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:57:39 -06:00
5a0ca73e2a Merge pull request 'M2.5.2: Cost ledger with price table' (#28) from feat/cost-ledger into main
Reviewed-on: #28
2026-04-08 21:54:23 +00:00
Jeff Smith
0d957336f5 M2.5.2: Cost ledger with price table (#25)
Adds an append-only JSONL ledger of every research() call at
~/.marchwarden/costs.jsonl, supplementing (not replacing) the
per-call cost_metadata field returned to callers. The ledger is
the operator-facing source of truth for spend tracking, queryable
via the upcoming `marchwarden costs` command (M2.5.3).

Fields per entry: timestamp, trace_id, question (truncated 200ch),
model_id, tokens_used, tokens_input, tokens_output, iterations_run,
wall_time_sec, tavily_searches, estimated_cost_usd, budget_exhausted,
confidence.

Cost estimation reads ~/.marchwarden/prices.toml, which is
auto-created with seed values for current Anthropic + Tavily rates
on first run. Operators are expected to update prices.toml
manually when upstream rates change — there is no automatic
fetching. Existing files are never overwritten. Unknown models
log a WARN and record estimated_cost_usd: null instead of
crashing.

Each ledger write also emits a structured `cost_recorded` log line
via the M2.5.1 logger, so cost data ships to OpenSearch alongside
the ledger file with no extra plumbing.

Tracking changes in agent.py:
- Track tokens_input / tokens_output split (not just total)
- Count tavily_searches across iterations
- _synthesize now returns (result, synth_in, synth_out) so the
  caller can attribute synthesis tokens to the running counters
- Ledger.record() called after research_completed log; failures
  are caught and warn-logged so a ledger write can never poison
  a successful research call

Tests cover: price table seeding, no-overwrite of existing files,
cost estimation for known/unknown models, tavily-only cost,
ledger appends, question truncation, env var override.
End-to-end verified with a real Anthropic+Tavily call:
9107 input + 1140 output tokens, 1 tavily search, $0.049 estimated.

104/104 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:52:25 -06:00
d25c8865ea Merge pull request 'M2.5.1: Structured application logger' (#27) from feat/structured-logging into main
Reviewed-on: #27
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:48:10 +00:00
Jeff Smith
8a62f6b014 M2.5.1: Structured application logger via structlog (#24)
Adds an operational logging layer separate from the JSONL trace
audit logs. Operational logs cover system events (startup, errors,
MCP transport, research lifecycle); JSONL traces remain the
researcher provenance audit trail.

Backend: structlog with two renderers selectable via
MARCHWARDEN_LOG_FORMAT (json|console). Defaults to console when
stderr is a TTY, json otherwise — so dev runs are human-readable
and shipped runs (containers, automation) emit OpenSearch-ready
JSON without configuration.

Key features:
- Named loggers per component: marchwarden.cli,
  marchwarden.mcp, marchwarden.researcher.web
- MARCHWARDEN_LOG_LEVEL controls global level (default INFO)
- MARCHWARDEN_LOG_FILE=1 enables a 10MB-rotating file at
  ~/.marchwarden/logs/marchwarden.log
- structlog contextvars bind trace_id + researcher at the start
  of each research() call so every downstream log line carries
  them automatically; cleared on completion
- stdlib logging is funneled through the same pipeline so noisy
  third-party loggers (httpx, anthropic) get the same formatting
  and quieted to WARN unless DEBUG is requested
- Logs to stderr to keep MCP stdio stdout clean

Wired into:
- cli.main.cli — configures logging on startup, logs ask_started/
  ask_completed/ask_failed
- researchers.web.server.main — configures logging on startup,
  logs mcp_server_starting
- researchers.web.agent.research — binds trace context, logs
  research_started/research_completed

Tests verify JSON and console formats, contextvar propagation,
level filtering, idempotency, and auto-configure-on-first-use.
94/94 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:46:51 -06:00
8293cbfb68 Merge pull request 'Propagate parent env to MCP server subprocess' (#23) from fix/mcp-env-propagation into main
Reviewed-on: #23
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:32:10 +00:00
Jeff Smith
d0a732735e Propagate parent env to MCP server subprocess (#18)
The mcp SDK's StdioServerParameters does not pass the parent
process's environment to the spawned server by default, so env
vars set on the CLI process (notably MARCHWARDEN_MODEL) were
silently dropped on the way to the researcher.

Pass env=os.environ.copy() to StdioServerParameters so the server
sees the same environment as the CLI. Also update scripts/docker-test.sh
to forward MARCHWARDEN_MODEL into the container and to detect a
non-TTY parent so non-interactive `ask` invocations don't fail with
"the input device is not a TTY".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:31:14 -06:00
712638fe8c Merge pull request 'Enforce token_budget before each iteration' (#22) from fix/budget-enforcement into main
Reviewed-on: #22
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:30:26 +00:00
Jeff Smith
6ff1a6af3d Enforce token_budget before each iteration (#17)
The loop previously checked the token budget at the *bottom* of each
iteration, after the LLM call and tool work had already happened. By
the time the cap was caught the budget had been exceeded and the
overshoot was unbounded by the iteration's cost.

Move the check to the *top* of the loop so a new iteration is never
started past the budget. Document the policy explicitly: token_budget
is a soft cap on the tool-use loop only; the synthesis call is always
allowed to complete so callers get a structured ResearchResult rather
than a fallback stub. Capping synthesis is a separate, larger design
question (would require splitting the budget between loop and
synthesis up-front).

Verified: token_budget=5000, max_iterations=10 now stops after 2
iterations with budget_exhausted=True and a complete answer with
10 citations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:29:22 -06:00
50d59abf52 Merge pull request 'Fix invalid default model id' (#21) from fix/model-default-id into main
Reviewed-on: #21
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:26:05 +00:00
Jeff Smith
eb2e71835c Fix invalid default model id (#15)
Both the MCP server and WebResearcher defaulted to
claude-sonnet-4-5-20250514, which 404s against the Anthropic API.
Update both defaults to claude-sonnet-4-6, which is current as of
2026-04.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:25:19 -06:00
c19a161a62 Merge pull request 'Fix synthesis truncation and trace masking' (#20) from fix/synthesis-truncation into main
Reviewed-on: #20
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:24:41 +00:00
Jeff Smith
7956bf4873 Fix synthesis truncation and trace masking (#16, #19)
The synthesis step was passing max_tokens=4096 to Claude, which was
not enough for a full ResearchResult JSON over a real evidence set
(28 sources). The model's output got cut mid-string, json.loads
failed, and the agent fell back to a stub answer with zero citations.

The trace logger then truncated the raw_response to 1000 chars before
recording it, hiding the actual reason for the parse failure (the
truncated JSON suffix) and making the bug invisible from traces.

Fixes:
- Bump synthesis max_tokens to 16384
- Capture and log Claude's stop_reason on synthesis_error so future
  truncation cases are diagnosable from the trace alone
- Log the parser exception text alongside the raw_response
- Stop slicing raw_response — record the full string

Verified end-to-end against the Utah crops question:
- Before: 0 citations, confidence 0.10, fallback stub
- After:  9 citations, confidence 0.88, real synthesized answer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:23:03 -06:00
16d88e951b Merge pull request 'chore: docker-based test environment' (#14) from chore/docker-test-env into main
Reviewed-on: #14
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 21:08:27 +00:00
Jeff Smith
40d0725497 chore: add docker-based test environment (#13)
Reproducible Python 3.12-slim container that installs the project
editable with dev deps. Adds pytest-asyncio to dev deps so async tests
run cleanly inside the container (host had it installed out-of-band).

scripts/docker-test.sh provides build, test, ask, replay, and shell
subcommands. The ask/replay/shell commands mount ~/secrets read-only
and ~/.marchwarden read-write so end-to-end runs persist traces back
to the host.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:06:12 -06:00
bca7294ec8 Merge pull request 'M2.2: marchwarden replay CLI command' (#12) from feat/cli-replay into main
Reviewed-on: #12
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 20:59:12 +00:00
Jeff Smith
273d144381 M2.2: marchwarden replay CLI command (#9)
Adds `marchwarden replay <trace_id>` to pretty-print a prior research
run from its JSONL trace file. Resolves the trace under
~/.marchwarden/traces/ by default; --trace-dir overrides for tests and
custom locations. Renders each step as a row with action, decision,
extra fields, and content_hash. Friendly errors for unknown trace_id
and malformed JSON lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:57:37 -06:00
b2b7026eb2 Merge pull request 'M2.1: marchwarden ask CLI command' (#11) from feat/cli-ask into main
Reviewed-on: #11
Reviewed-by: archeious <archeious@unbiasedgeek.com>
2026-04-08 20:54:59 +00:00
Jeff Smith
87a34c60d1 M2.1: marchwarden ask CLI command (#8)
Click app with `ask` subcommand that spawns the web researcher MCP
server over stdio, calls the research tool, and pretty-prints the
ResearchResult contract using rich (panels for answer/confidence/cost,
tables for citations, gaps, discovery events, and open questions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:51:40 -06:00
Jeff Smith
166d86e190 chore: add CLAUDE.md for session 1
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:44:16 -06:00
7088f45f06 Merge pull request 'M1.4: MCP server' (#7) from feat/mcp-server into main 2026-04-08 20:41:28 +00:00
Jeff Smith
5d894d9e10 M1.4: MCP server wrapping web researcher
FastMCP server exposing a single 'research' tool:
- Delegates to WebResearcher with keys from ~/secrets
- Accepts question, context, depth, max_iterations, token_budget
- Returns full ResearchResult as JSON
- Configurable model via MARCHWARDEN_MODEL env var
- Runnable as: python -m researchers.web

4 tests: secret reading, JSON response validation, default parameters.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:41:13 -06:00
f593dd060b Merge pull request 'Add OpenQuestion to research contract' (#6) from feat/open-questions into main 2026-04-08 20:37:54 +00:00
Jeff Smith
ae9c11a79b Add OpenQuestion to research contract
New field on ResearchResult: open_questions — follow-up questions that
emerged from the research itself. Distinct from gaps (backward: what
failed) and discovery_events (sideways: what's lateral). Open questions
look forward: 'based on what I found, this needs deeper investigation.'

- OpenQuestion model: question, context, priority (high/medium/low),
  source_locator
- Updated agent synthesis prompt to produce open_questions
- Updated agent result builder to parse open_questions from JSON
- 3 new tests for OpenQuestion model
- Updated existing tests for new field

77 tests passing.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:37:30 -06:00
ece2455415 Merge pull request 'M1.3: Inner agent loop' (#5) from feat/agent-loop into main 2026-04-08 20:29:41 +00:00
Jeff Smith
7cb3fde90e M1.3: Inner agent loop with tests
WebResearcher — the core agentic research loop:
- Tool-use loop: Claude decides when to search (Tavily) and fetch (httpx)
- Budget enforcement: stops at max_iterations or token_budget
- Synthesis step: separate LLM call produces structured ResearchResult JSON
- Fallback: valid ResearchResult even when synthesis JSON is unparseable
- Full trace logging at every step (start, search, fetch, synthesis, complete)
- Populates all contract fields: raw_excerpt, categorized gaps,
  discovery_events, confidence_factors, cost_metadata with model_id

9 tests: complete research loop, budget exhaustion, synthesis failure
fallback, trace file creation, fetch_url tool integration, search
result formatting.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:29:27 -06:00
21c8191b81 Merge pull request 'M1.2: Trace logger' (#4) from feat/trace-logger into main 2026-04-08 20:25:58 +00:00
Jeff Smith
cef08c8984 M1.2: Trace logger with tests
TraceLogger produces JSONL audit logs per research() call:
- One file per trace_id at ~/.marchwarden/traces/{trace_id}.jsonl
- Each line is a self-contained JSON object (step, action, timestamp, decision)
- Supports arbitrary kwargs (url, content_hash, query, etc.)
- Lazy file handle, flush after each write, context manager support
- read_entries() for replay and testing

15 tests: file creation, step counting, JSONL validity, kwargs,
timestamps, flush behavior, multiple independent traces.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:21:10 -06:00
851fed6a5f Merge pull request 'M1.1: Search and fetch tools' (#3) from feat/search-fetch-tools into main 2026-04-08 20:19:21 +00:00
Jeff Smith
a5bc93e275 M1.1: Search and fetch tools with tests
- tavily_search(): Tavily API wrapper returning SearchResult dataclasses
  with content hashing (raw_content preferred, falls back to summary)
- fetch_url(): async URL fetch with HTML text extraction, content hashing,
  and graceful error handling (timeout, HTTP errors, connection errors)
- _extract_text(): simple HTML → clean text (strip scripts/styles/tags,
  decode entities, collapse whitespace)
- _sha256(): SHA-256 content hashing with 'sha256:' prefix for traces

18 tests: hashing, HTML extraction, mocked Tavily search, mocked async
fetch (success, timeout, HTTP error, hash consistency).

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:17:18 -06:00
8930f4486a Merge pull request 'M0.3: Contract v1 Pydantic models' (#2) from feat/contract-models into main 2026-04-08 20:14:45 +00:00
Jeff Smith
1b0f86399a M0.3: Implement contract v1 Pydantic models with tests
All Research Contract types as Pydantic models:
- ResearchConstraints (input)
- Citation with raw_excerpt (output)
- GapCategory enum (5 categories)
- Gap with structured category (output)
- DiscoveryEvent (lateral findings)
- ConfidenceFactors (auditable scoring inputs)
- CostMetadata with model_id (resource tracking)
- ResearchResult (top-level contract)

32 tests: validation, bounds checking, serialization roundtrips,
JSON structure verification against contract spec.

Refs: archeious/marchwarden#1

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 14:00:45 -06:00
Jeff Smith
6a8445ed13 Fix README wiki links to use absolute URLs
Relative /wiki/ paths resolve against the Forgejo root, not the repo.
Use full URLs so links work from the repo README page.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 13:41:03 -06:00
Jeff Smith
79becb21ec Fix README: correct clone URL and wiki link paths
- Update clone URL to archeious/marchwarden (was claude-code)
- Fix wiki links to use /wiki/ routes instead of docs/wiki/ source paths
- Fix issue link to correct repo

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 12:07:29 -06:00