Initial wiki: Architecture, ResearchContract, DevelopmentGuide

- Architecture: system overview, component design, data flow
- ResearchContract: complete tool specification with examples
- DevelopmentGuide: setup, testing, workflow, debugging

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

commit
a349d6f970
3 changed files with 786 additions and 0 deletions
175
Architecture.md
Normal file
@@ -0,0 +1,175 @@
# Architecture

## Overview

Marchwarden is a network of agentic researchers coordinated by a principal investigator (PI). Each researcher is specialized, autonomous, and fault-tolerant. The PI dispatches researchers to answer questions, waits for results, and synthesizes across responses.

```
┌─────────────┐
│  PI Agent   │  Orchestrates, synthesizes, decides what to research
└──────┬──────┘
       │ dispatch research(question)
       │
  ┌────┴──────────────────────────┐
  │                               │
┌─┴────────────────────┐  ┌───────┴─────────────────┐
│ Web Researcher (MCP) │  │ Future: DB, Arxiv, etc. │
│ - Search (Tavily)    │  │ (V2+)                   │
│ - Fetch URLs         │  │                         │
│ - Internal loop      │  │                         │
│ - Return citations   │  │                         │
└──────────────────────┘  └─────────────────────────┘
```

## Components

### Researchers (MCP servers)

Each researcher is a **standalone MCP server** that:
- Exposes a single tool: `research(question, context, depth, constraints)`
- Runs an internal agentic loop (plan → search → fetch → iterate → synthesize)
- Returns structured data: `answer`, `citations`, `gaps`, `confidence`, `cost_metadata`, `trace_id`
- Enforces budgets: iteration cap and token limit
- Logs all internal steps to JSONL trace files

**V1 researcher**: Web search + fetch
- Uses Tavily for searching
- Fetches full text from URLs
- Iterates up to 5 times or until its budget is exhausted
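
Here is a rough illustration of how the iteration cap and token limit could interact: a minimal sketch of a budgeted loop. `run_budgeted_loop` and `LoopOutcome` are hypothetical names, as is the `step` callback, which stands in for one plan → search → fetch round. None of these are the researcher's actual code. The defaults mirror those in ResearchContract.md.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LoopOutcome:
    iterations_run: int
    tokens_used: int
    budget_exhausted: bool


def run_budgeted_loop(
    step: Callable[[int], Tuple[int, bool]],
    max_iterations: int = 5,
    token_budget: int = 20000,
) -> LoopOutcome:
    """Run `step` until it reports confidence or a cap is hit.

    `step(i)` stands in for one plan -> search -> fetch round (hypothetical)
    and returns (tokens_spent, confident).
    """
    tokens_used = 0
    for i in range(1, max_iterations + 1):
        spent, confident = step(i)
        tokens_used += spent
        if confident:
            return LoopOutcome(i, tokens_used, budget_exhausted=False)
        if tokens_used >= token_budget:
            # Token cap reached before the researcher was confident.
            return LoopOutcome(i, tokens_used, budget_exhausted=True)
    # Iteration cap reached without becoming confident.
    return LoopOutcome(max_iterations, tokens_used, budget_exhausted=True)


# Example: a fake step that becomes confident on its third round.
print(run_budgeted_loop(lambda i: (3000, i == 3)))
# LoopOutcome(iterations_run=3, tokens_used=9000, budget_exhausted=False)
```

The token check runs after each round, so a run can overshoot `token_budget` by up to one round's worth of tokens. That behavior fits treating the budget as a soft limit.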

**Future researchers** (V2+): Database, Arxiv, internal documents, etc.

### MCP Protocol

Marchwarden uses the **Model Context Protocol (MCP)** as the boundary between researchers and their callers. This gives us:

- **Language-agnostic** — researchers can be Python, Node, Go, etc.
- **Process isolation** — a researcher crash doesn't crash the PI
- **Clean contract** — one tool signature, versioned independently
- **Parallel dispatch** — the PI can await multiple researchers simultaneously

### CLI Shim

For V1, the CLI is the test harness that stands in for the PI:

```bash
marchwarden ask "what are ideal crops for Utah?"
marchwarden replay <trace_id>
```

In V2, the CLI is replaced by a full PI orchestrator agent.

### Trace Logging

Every research call produces a **JSONL trace log**:

```
~/.marchwarden/traces/{trace_id}.jsonl
```

Each line is a JSON object:
```json
{
  "step": 1,
  "action": "search",
  "query": "Utah climate gardening",
  "result": {...},
  "timestamp": "2026-04-08T12:00:00Z",
  "decision": "query was relevant, fetching top 3 URLs"
}
```
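
The following minimal sketch shows how a researcher might append events in this format. It assumes one file per `trace_id` under the default trace directory. `TraceLogger` is a hypothetical helper, and it does not read `MARCHWARDEN_TRACE_DIR`. The field names simply follow the example above.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


class TraceLogger:
    """Append one JSON object per step to <trace_dir>/<trace_id>.jsonl."""

    def __init__(self, trace_dir: str = "~/.marchwarden/traces") -> None:
        self.trace_id = str(uuid.uuid4())
        directory = Path(trace_dir).expanduser()
        directory.mkdir(parents=True, exist_ok=True)
        self.path = directory / f"{self.trace_id}.jsonl"
        self.step = 0

    def log(self, action: str, **fields) -> None:
        self.step += 1
        event = {
            "step": self.step,
            "action": action,
            **fields,
            # UTC timestamp in the same shape as the example above.
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        # json.dumps emits a single line by default, keeping the file valid JSONL.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
```

For example, `TraceLogger().log("search", query="Utah climate gardening", decision="query was relevant, fetching top 3 URLs")` would append one line shaped like the example. The step number and timestamp are filled in automatically.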

Traces support:
- **Debugging** — see exactly what the researcher did
- **Replay** — re-run a past session, same results
- **Eval** — audit decision-making

## Data Flow

### One research call (simplified)

```
CLI: ask "What are ideal crops for Utah?"
  ↓
MCP: research(question="What are ideal crops for Utah?", ...)
  ↓
Researcher agent loop:
  1. Plan: "I need climate data for Utah + crop requirements"
  2. Search: Tavily query for "Utah climate zones crops"
  3. Fetch: Read top 3 URLs
  4. Parse: Extract relevant info
  5. Synthesize: "Based on X sources, ideal crops are Y"
  6. Check gaps: "Couldn't find pest info"
  7. Return if confident, else iterate
  ↓
Response:
  {
    "answer": "...",
    "citations": [
      {"source": "web", "locator": "https://...", "snippet": "...", "confidence": 0.95},
      ...
    ],
    "gaps": [
      {"topic": "pest resistance", "reason": "no sources found"},
    ],
    "cost_metadata": {
      "tokens_used": 8452,
      "iterations_run": 3,
      "wall_time_sec": 42.5
    },
    "trace_id": "uuid-1234"
  }
  ↓
CLI: Print answer + citations, save trace
```

## Contract Versioning

The `research()` tool signature is the stable contract. Changes to the contract require explicit versioning so that:
- Multiple researchers with different versions can coexist
- The PI knows what version it's calling
- Backwards compatibility (or breaking changes) is explicit

See [ResearchContract.md](ResearchContract.md) for the full spec.

## Future: The PI Agent

V2 will introduce the orchestrator:

```python
import asyncio


class PIAgent:
    async def research_topic(self, question: str) -> Answer:
        # Dispatch multiple researchers in parallel and wait for both
        web_results, arxiv_results = await asyncio.gather(
            self.web_researcher.research(question),
            self.arxiv_researcher.research(question),
        )

        # Synthesize
        return self.synthesize([web_results, arxiv_results])
```

The PI:
- Decides which researchers to dispatch
- Waits for all responses
- Checks for conflicts, gaps, consensus
- Synthesizes into a final answer
- Can re-dispatch if gaps are critical

## Assumptions & Constraints

- **Researchers are honest** — they don't hallucinate citations. If they cite something, it exists in the source.
- **Tavily API is available** — for V1 web search. Degradation strategy TBD.
- **Token budgets are enforced** — the researcher respects its budget; the MCP server enforces it at the process level.
- **Traces are ephemeral** — stored locally for debugging, not synced to a database yet.
- **No multi-user** — single-user CLI for V1.

## Terminology

- **Researcher**: An agentic system specialized in a domain or source type
- **Marchwarden**: The researcher metaphor — stationed at the frontier, reporting back
- **Rihla**: (V2+) A unit of research work dispatched by the PI; one researcher's journey to answer a question
- **Trace**: A JSONL log of all decisions made during one research call
- **Gap**: An unresolved aspect of the question; the researcher couldn't find an answer

---

See also: [ResearchContract.md](ResearchContract.md), [DevelopmentGuide.md](DevelopmentGuide.md)

259
DevelopmentGuide.md
Normal file
@@ -0,0 +1,259 @@
# Development Guide

## Setup

### Prerequisites

- Python 3.10+
- pip (with venv)
- Tavily API key (free tier available at https://tavily.com)

### Installation

```bash
git clone https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden.git
cd marchwarden

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in dev mode
pip install -e ".[dev]"
```

### Environment Setup

Create a `.env` file in the project root:

```env
TAVILY_API_KEY=<your-tavily-api-key>
ANTHROPIC_API_KEY=<your-claude-api-key>
MARCHWARDEN_TRACE_DIR=~/.marchwarden/traces
```
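
Python does not read `.env` files on its own. Also, a `~` in `MARCHWARDEN_TRACE_DIR` stays literal when the value comes from a `.env` file. The following sketch loads settings and handles both the missing-key and `~` cases. It assumes the `.env` values have already been exported into the process environment (for example, by a dotenv loader). `load_settings` is a hypothetical name, not part of the codebase.

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("TAVILY_API_KEY", "ANTHROPIC_API_KEY")


def load_settings() -> dict:
    """Read Marchwarden settings from the process environment.

    Assumes the .env values were already exported (e.g. by a dotenv
    loader); this function does not parse the .env file itself.
    """
    missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Values loaded from a .env file keep a literal "~", so expand it here.
    trace_dir = Path(os.environ.get("MARCHWARDEN_TRACE_DIR", "~/.marchwarden/traces"))
    return {
        "tavily_api_key": os.environ["TAVILY_API_KEY"],
        "anthropic_api_key": os.environ["ANTHROPIC_API_KEY"],
        "trace_dir": trace_dir.expanduser(),
    }
```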

Check that the API client libraries are installed (this does not validate your API keys):

```bash
python -c "from anthropic import Anthropic; print('OK')"
python -c "from tavily import TavilyClient; print('OK')"
```

## Project Structure

```
marchwarden/
├── researchers/
│   ├── __init__.py
│   └── web/                  # V1: Web search researcher
│       ├── __init__.py
│       ├── server.py         # MCP server entry point
│       ├── agent.py          # Inner research agent
│       ├── models.py         # Pydantic models (ResearchResult, Citation, etc)
│       └── tools.py          # Tavily integration, URL fetch
├── orchestrator/             # (V2+) PI agent
│   ├── __init__.py
│   └── pi.py
├── cli/                      # CLI shim (ask, replay)
│   ├── __init__.py
│   ├── main.py               # Entry point (@click decorators)
│   └── formatter.py          # Pretty-print results
├── tests/
│   ├── __init__.py
│   ├── test_web_researcher.py
│   └── fixtures/
├── docs/
│   └── wiki/                 # You are here
├── README.md
├── CONTRIBUTING.md
├── pyproject.toml
└── .gitignore
```

## Running Tests

```bash
# Run all tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# Run a specific test file
pytest tests/test_web_researcher.py

# Run with coverage
pytest --cov=. tests/
```

The suite contains both unit and integration tests. We do **not** mock the database or major external services; the one exception is Tavily, which may be mocked where needed to avoid API costs.

## Running the CLI

```bash
# Ask a question
marchwarden ask "What are ideal crops for a garden in Utah?"

# With options
marchwarden ask "What is X?" --depth deep --budget 25000

# Replay a trace
marchwarden replay <trace_id>

# Show help
marchwarden --help
```

A research call is not instant, because the agent plans, searches, and fetches before answering. The sample responses in [ResearchContract.md](ResearchContract.md) report wall times from 3.2 to 67.8 seconds.

## Development Workflow

### 1. Create a branch

```bash
git checkout -b feat/your-feature-name
```

Branch naming: `feat/`, `fix/`, `refactor/`, or `chore/`, followed by a short description.

### 2. Make changes

Edit code, add tests:

```bash
# Run tests as you go
pytest tests/test_your_feature.py -v

# Check formatting and lint
black --check .
ruff check .

# Type checking (optional, informational)
mypy . --ignore-missing-imports
```

### 3. Commit

```bash
git add <files>
git commit -m "Brief imperative description

- What changed
- Why it changed
"
```

Commits should be atomic (one logical change per commit).

### 4. Test before pushing

```bash
pytest tests/
black .
ruff check . --fix
```

### 5. Push and create PR

```bash
git push origin feat/your-feature-name
```

Then on Forgejo: open a PR, request review, and wait for CI/tests to pass.

Once approved:
- Merge via Forgejo UI (not locally)
- Delete remote branch via Forgejo
- Locally: `git checkout main && git pull --ff-only && git branch -d feat/your-feature-name`

## Debugging

### Viewing trace logs

```bash
# Human-readable trace
marchwarden replay <trace_id>

# Pretty-print each event (the file holds one JSON object per line)
cat ~/.marchwarden/traces/<trace_id>.jsonl | jq .

# Slurp all events into a single JSON array
cat ~/.marchwarden/traces/<trace_id>.jsonl | jq . -s
```

### Debug logging

Set `MARCHWARDEN_DEBUG=1` for verbose logs:

```bash
MARCHWARDEN_DEBUG=1 marchwarden ask "What is X?"
```
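
One way the flag could be wired up uses the standard `logging` module. `configure_logging` and the `"marchwarden"` logger name are assumptions for illustration, not the project's actual code.

```python
import logging
import os


def configure_logging() -> logging.Logger:
    """Log at DEBUG when MARCHWARDEN_DEBUG=1, otherwise at INFO."""
    debug = os.environ.get("MARCHWARDEN_DEBUG") == "1"
    level = logging.DEBUG if debug else logging.INFO
    # basicConfig only installs a root handler the first time it is called.
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    logger = logging.getLogger("marchwarden")  # hypothetical logger name
    logger.setLevel(level)
    return logger
```

The level is set on the named logger rather than the root logger, so debug output from third-party libraries stays quiet while Marchwarden's own messages come through.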

### Interactive testing

Use the Python REPL. The contract declares `research()` as `async`, so if `WebResearcher.research` follows it, run the coroutine with `asyncio.run()`:

```bash
python
>>> import asyncio
>>> from researchers.web import WebResearcher
>>> researcher = WebResearcher()
>>> result = asyncio.run(researcher.research("What is X?"))
>>> print(result.answer)
```

## Common Tasks

### Adding a new tool to the researcher

1. Define the tool in `researchers/web/tools.py`
2. Register it in the agent's tool list (`researchers/web/agent.py`)
3. Add test coverage in `tests/test_web_researcher.py`
4. Update docs if it changes the contract
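
The Anthropic Messages API describes each tool as a dict with `name`, `description`, and `input_schema` (a JSON Schema), passed in the request's `tools` list. The following minimal sketch shows a registry that keeps those definitions next to their Python handlers. `register_tool`, `TOOLS`, `dispatch`, and the `fetch_url` tool are all hypothetical; none of them come from `tools.py` or `agent.py`.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: tool name -> definition sent to Claude + Python handler.
TOOLS: Dict[str, Dict[str, Any]] = {}


def register_tool(name: str, description: str, input_schema: Dict[str, Any]):
    """Decorator recording a handler together with its tool definition."""

    def wrap(handler: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = {
            "definition": {
                "name": name,
                "description": description,
                "input_schema": input_schema,
            },
            "handler": handler,
        }
        return handler

    return wrap


@register_tool(
    name="fetch_url",
    description="Fetch a web page and return its text content.",
    input_schema={
        "type": "object",
        "properties": {"url": {"type": "string", "description": "Absolute URL to fetch"}},
        "required": ["url"],
    },
)
def fetch_url(url: str) -> str:
    # Placeholder: a real implementation would make an HTTP request here.
    return f"<contents of {url}>"


def tool_definitions() -> List[Dict[str, Any]]:
    """Definitions to pass as the `tools` list of a Messages API request."""
    return [entry["definition"] for entry in TOOLS.values()]


def dispatch(name: str, tool_input: Dict[str, Any]) -> Any:
    """Run the handler named in a `tool_use` block with its `input` dict."""
    return TOOLS[name]["handler"](**tool_input)
```

Keeping the definition and handler in one place means step 2 (registering the tool) happens as a side effect of step 1 (defining it). This is a design choice for the sketch, not a requirement of the API.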

### Changing the research contract

If you need to modify the `research()` signature:

1. Update `researchers/web/models.py` (ResearchResult, Citation, etc)
2. Update `researchers/web/agent.py` to produce the new fields
3. Update `docs/wiki/ResearchContract.md`
4. Add a migration guide if breaking
5. Tests must pass with new signature

### Running cost analysis

See how much a research call costs:

```bash
marchwarden ask "Q" --verbose
# Shows: tokens_used, iterations_run, wall_time_sec
```

For batch analysis:

```python
import glob
import json
import os

# glob() does not expand "~", so expand it first.
for trace_file in glob.glob(os.path.expanduser("~/.marchwarden/traces/*.jsonl")):
    with open(trace_file, encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            # Analyze cost_metadata here, for events that carry it
            if "cost_metadata" in event:
                print(trace_file, event["cost_metadata"])
```

## FAQ

**Q: How do I add a new researcher?**
A: Create `researchers/new_source/` with the same structure as `researchers/web/`. Implement `research()`, expose it as an MCP server. Test with the CLI.

**Q: Do I need to handle Tavily failures?**
A: Yes. Catch `TavilyError`, fall back to the results you already have, and record the failure in `gaps`.
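
The following sketch shows the fall-back-and-record pattern. The answer above names `TavilyError`. The exact exception classes raised by the Tavily client are not confirmed here, so this version catches `Exception`. `search_with_fallback` and the injected `search` callable are hypothetical. `Gap` mirrors the dataclass in [ResearchContract.md](ResearchContract.md).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Gap:
    topic: str
    reason: str


def search_with_fallback(
    search: Callable[[str], List[dict]],
    query: str,
    results_so_far: List[dict],
) -> Tuple[List[dict], List[Gap]]:
    """Run one search; on failure keep earlier results and report a Gap.

    `search` is a stand-in for a Tavily call. `Exception` is caught because
    the exact exception classes of the Tavily client are not assumed here.
    """
    try:
        return results_so_far + search(query), []
    except Exception as exc:
        return results_so_far, [Gap(topic=query, reason=f"search failed: {type(exc).__name__}")]
```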

**Q: What if the Anthropic API goes down?**
A: The agent will fail. Retry logic is TBD; for now, an outage is a blocker.

**Q: How do I deploy this?**
A: V1 is CLI-only and intended for local use. V2 will have a PI orchestrator with real deployment needs.

---

See also: [Architecture.md](Architecture.md), [ResearchContract.md](ResearchContract.md), [../CONTRIBUTING.md](../CONTRIBUTING.md)

352
ResearchContract.md
Normal file
@@ -0,0 +1,352 @@
# Research Contract

This document defines the `research()` tool that all Marchwarden researchers implement. It is the stable contract between a researcher MCP server and its caller (the PI or CLI).

## Tool Signature

```python
from __future__ import annotations  # ResearchConstraints/ResearchResult are defined below

from typing import Literal, Optional

async def research(
    question: str,
    context: Optional[str] = None,
    depth: Literal["shallow", "balanced", "deep"] = "balanced",
    constraints: Optional[ResearchConstraints] = None,
) -> ResearchResult: ...
```

### Input Parameters

#### `question` (required, string)
The question the researcher is asked to investigate. Examples:
- "What are ideal crops for a garden in Utah?"
- "Summarize recent developments in transformer architectures"
- "What is the legal status of AI in France?"

Constraints: 1–500 characters, UTF-8 encoded.
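
The following is a minimal validation sketch for these limits. It assumes "characters" means Unicode code points, which is what Python's `len` counts. `validate_question` is a hypothetical helper. A Python `str` can contain lone surrogates that have no UTF-8 encoding, so the UTF-8 check is an explicit `encode` call.

```python
def validate_question(question: str) -> str:
    """Enforce the `question` limits: 1-500 characters, UTF-8 encodable."""
    if not isinstance(question, str):
        raise TypeError("question must be a string")
    # len() counts Unicode code points, which is assumed to be the unit here.
    if not 1 <= len(question) <= 500:
        raise ValueError(f"question must be 1-500 characters, got {len(question)}")
    # Raises UnicodeEncodeError (a ValueError subclass) for lone surrogates.
    question.encode("utf-8")
    return question
```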

#### `context` (optional, string)
What the PI or caller already knows. The researcher uses this to avoid duplicating effort or to refocus. Examples:
- "I already know Utah is in USDA zones 3-8. Focus on water requirements."
- "I've read the 2024 papers on LoRA. What's new in 2025?"

Constraints: 0–2000 characters.

#### `depth` (optional, enum)
How thoroughly to research:
- `"shallow"` — quick scan, 1–2 iterations, ~5k tokens. For "does this exist?" questions.
- `"balanced"` (default) — moderate depth, 2–4 iterations, ~15k tokens. For typical questions.
- `"deep"` — thorough investigation, up to 5 iterations, ~25k tokens. For important decisions.

The researcher uses this as a *hint*, not a strict constraint. The actual depth depends on how much content is available and how confident the researcher becomes.
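
A researcher could turn the hint into concrete targets with a lookup table. The table takes the upper end of each iteration range and lets the explicit `constraints` values win. `DEPTH_HINTS` and `plan_budget` are hypothetical. The numbers come straight from the list above, and the defaults match the `ResearchConstraints` defaults below.

```python
from typing import Dict, Literal, Tuple

Depth = Literal["shallow", "balanced", "deep"]

# depth -> (iteration target, approximate token target), from the list above.
DEPTH_HINTS: Dict[str, Tuple[int, int]] = {
    "shallow": (2, 5_000),
    "balanced": (4, 15_000),
    "deep": (5, 25_000),
}


def plan_budget(
    depth: Depth = "balanced",
    max_iterations: int = 5,
    token_budget: int = 20_000,
) -> Tuple[int, int]:
    """Turn a depth hint into targets, capped by the explicit constraints."""
    if depth not in DEPTH_HINTS:
        raise ValueError(f"unknown depth: {depth!r}")
    hint_iterations, hint_tokens = DEPTH_HINTS[depth]
    return min(hint_iterations, max_iterations), min(hint_tokens, token_budget)
```

With the default constraints, `"deep"` yields `(5, 20000)`. The ~25k hint is capped by the 20000-token default budget, so a caller wanting the full deep budget has to raise `token_budget` explicitly.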

#### `constraints` (optional, object)
Fine-grained control over researcher behavior:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResearchConstraints:
    max_iterations: int = 5               # Stop after N iterations, regardless
    token_budget: int = 20000             # Soft limit on tokens; researcher respects
    max_sources: int = 10                 # Max number of sources to fetch
    source_filter: Optional[str] = None   # Only search specific domains (V2)
```

If not provided, defaults are:
- `max_iterations`: 5
- `token_budget`: 20000 (Sonnet 3.5 equivalent)
- `max_sources`: 10

The MCP server **enforces** these constraints and will stop the researcher if it exceeds them.

---

### Output: ResearchResult

```python
from __future__ import annotations  # Citation, Gap, CostMetadata are defined below

from dataclasses import dataclass
from typing import List


@dataclass
class ResearchResult:
    answer: str                   # The synthesized answer
    citations: List[Citation]     # Sources used
    gaps: List[Gap]               # What couldn't be resolved
    confidence: float             # 0.0–1.0 overall confidence
    cost_metadata: CostMetadata   # Resource usage
    trace_id: str                 # UUID linking to JSONL trace log
```

#### `answer` (string)
The synthesized answer. It should be:
- **Grounded** — every claim traces back to a citation
- **Humble** — includes caveats and confidence levels
- **Actionable** — structured so the caller can use it

Example:
```
In Utah (USDA zones 3-8), ideal crops depend on elevation and season:

High elevation (>7k ft): Short-season crops dominate. Cool-season vegetables
(peas, lettuce, potatoes) thrive. Fruit: apples, berries. Summer crops
(tomatoes, squash) work in south-facing microclimates.

Lower elevation: Full range possible. Long growing season supports tomatoes,
peppers, squash. Perennials (fruit trees, asparagus) are popular.

Water is critical: Utah averages 10-20" annual precipitation (dry for vegetable
gardening). Most gardeners supplement with irrigation.

Pests: Japanese beetles (south), aphids (statewide). Deer pressure varies by
location.

See sources below for varietal recommendations by specific county.
```

#### `citations` (list of Citation objects)

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    source: str              # "web", "file", "database", etc
    locator: str             # URL, file path, row ID, or unique identifier
    title: Optional[str]     # Human-readable title (for web)
    snippet: Optional[str]   # Relevant excerpt (50–200 chars)
    confidence: float        # 0.0–1.0: researcher's confidence in this source's accuracy
```

Example:
```python
Citation(
    source="web",
    locator="https://extension.oregonstate.edu/ask-expert/featured/what-are-ideal-garden-crops-utah-zone",
    title="Oregon State Extension: Ideal Crops for Utah Gardens",
    snippet="Cool-season crops (peas, lettuce, potatoes) thrive above 7,000 feet. Irrigation essential.",
    confidence=0.9
)
```

Citations must be:
- **Verifiable** — a human can follow the locator and confirm the claim
- **Not hallucinated** — the researcher actually read/fetched the source
- **Attributed** — each claim in `answer` should link to at least one citation
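
The following sketch mechanically checks the first two rules. It assumes the researcher keeps a map from each fetched locator to the text it retrieved. `unverifiable_citations` is hypothetical. Requiring the snippet to appear verbatim (after normalizing whitespace and case) is also an assumption: a paraphrased snippet would be flagged.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Citation:
    source: str
    locator: str
    title: Optional[str]
    snippet: Optional[str]
    confidence: float


def unverifiable_citations(citations: List[Citation], fetched: Dict[str, str]) -> List[Citation]:
    """Return citations that fail a basic "actually read" check.

    `fetched` maps each locator the researcher fetched to the text it got.
    A citation fails if its locator was never fetched, or if its snippet
    does not appear (whitespace-normalized, case-insensitive) in that text.
    """

    def norm(text: str) -> str:
        return " ".join(text.split()).lower()

    bad = []
    for c in citations:
        body = fetched.get(c.locator)
        if body is None:
            bad.append(c)  # cited but never fetched
        elif c.snippet and norm(c.snippet) not in norm(body):
            bad.append(c)  # snippet not found in the fetched text
    return bad
```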

#### `gaps` (list of Gap objects)

```python
from dataclasses import dataclass


@dataclass
class Gap:
    topic: str    # What aspect wasn't resolved
    reason: str   # Why: "no sources found", "contradictory sources", "outside researcher scope"
```

Example:
```python
[
    Gap(topic="pest management by county", reason="no county-specific sources found"),
    Gap(topic="commercial varietals", reason="limited to hobby gardening sources"),
]
```

Gaps are **critical for the PI**. They tell the orchestrator:
- Whether to dispatch a different researcher
- Whether to accept partial answers
- Which questions remain for human input

A researcher that admits gaps is more trustworthy than one that fabricates answers.
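
The following sketch shows how a PI could route gaps to the three follow-ups listed above. `triage_gaps` is hypothetical. Keyword matching on `reason` is a heuristic, because the contract gives example reasons but does not fix their wording.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gap:
    topic: str
    reason: str


def triage_gaps(gaps: List[Gap]) -> Dict[str, List[str]]:
    """Sort gap topics into dispatch / human-input / accept-partial buckets."""
    plan: Dict[str, List[str]] = {
        "dispatch_other_researcher": [],
        "ask_human": [],
        "accept_partial": [],
    }
    for gap in gaps:
        reason = gap.reason.lower()
        if "outside" in reason and "scope" in reason:
            # e.g. "outside researcher scope": another researcher may cover it
            plan["dispatch_other_researcher"].append(gap.topic)
        elif "contradict" in reason:
            # conflicting sources need a human judgment call
            plan["ask_human"].append(gap.topic)
        else:
            # e.g. "no sources found": accept the partial answer
            plan["accept_partial"].append(gap.topic)
    return plan
```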

#### `confidence` (float, 0.0–1.0)

Overall confidence in the answer:
- `0.9–1.0`: High. All claims grounded in multiple strong sources.
- `0.7–0.9`: Moderate. Most claims grounded; some inference; minor contradictions resolved.
- `0.5–0.7`: Low. Few direct sources; lots of synthesis; clear gaps.
- `< 0.5`: Very low. Mainly inference; major gaps; likely needs human review.

The PI uses this to decide whether to act on the answer or seek more sources.
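
The bands share their endpoints; for example, 0.9 appears in both the high and moderate ranges. The following sketch assigns each shared endpoint to the higher band. That rule is an assumption, not part of the contract. `confidence_band` is a hypothetical helper.

```python
def confidence_band(confidence: float) -> str:
    """Map a 0.0-1.0 score to a band; shared endpoints go to the higher band."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence}")
    if confidence >= 0.9:
        return "high"
    if confidence >= 0.7:
        return "moderate"
    if confidence >= 0.5:
        return "low"
    return "very low"
```

Under this rule, the example responses below (0.99, 0.72, 0.55) fall into "high", "moderate", and "low" respectively.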

#### `cost_metadata` (object)

```python
from dataclasses import dataclass


@dataclass
class CostMetadata:
    tokens_used: int          # Total tokens (Claude + Tavily calls)
    iterations_run: int       # Number of inner-loop iterations
    wall_time_sec: float      # Actual elapsed time
    budget_exhausted: bool    # True if researcher hit iteration or token cap
```

Example:
```python
CostMetadata(
    tokens_used=8452,
    iterations_run=3,
    wall_time_sec=42.5,
    budget_exhausted=False
)
```

The PI uses this to:
- Track costs (token budgets, actual spend)
- Detect runaway loops (`budget_exhausted = True`)
- Plan timeouts (`wall_time_sec` tells you whether the latency is acceptable)

#### `trace_id` (string, UUID)

A unique identifier linking to the JSONL trace log:

```
~/.marchwarden/traces/{trace_id}.jsonl
```

The trace contains every decision, search, fetch, and parse step, for debugging and replay.
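
The examples below show results as JSON. The following sketch implements that round trip with `dataclasses.asdict`, which recurses into nested dataclasses and lists of them. The four dataclasses are repeated so the block runs on its own. `to_json` and `from_json` are hypothetical helpers. How the MCP server actually serializes results is not specified here.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Citation:
    source: str
    locator: str
    title: Optional[str]
    snippet: Optional[str]
    confidence: float


@dataclass
class Gap:
    topic: str
    reason: str


@dataclass
class CostMetadata:
    tokens_used: int
    iterations_run: int
    wall_time_sec: float
    budget_exhausted: bool


@dataclass
class ResearchResult:
    answer: str
    citations: List[Citation]
    gaps: List[Gap]
    confidence: float
    cost_metadata: CostMetadata
    trace_id: str


def to_json(result: ResearchResult) -> str:
    # asdict() converts nested dataclasses (and lists of them) to plain dicts.
    return json.dumps(asdict(result), indent=2)


def from_json(text: str) -> ResearchResult:
    # Rebuild the nested dataclasses that asdict() flattened.
    data = json.loads(text)
    return ResearchResult(
        answer=data["answer"],
        citations=[Citation(**c) for c in data["citations"]],
        gaps=[Gap(**g) for g in data["gaps"]],
        confidence=data["confidence"],
        cost_metadata=CostMetadata(**data["cost_metadata"]),
        trace_id=data["trace_id"],
    )
```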

---

## Contract Rules

### The Researcher Must

1. **Never hallucinate citations.** If a claim isn't in a source, don't cite it.
2. **Admit gaps.** If you can't find something, say so. Don't guess.
3. **Respect budgets.** Stop iterating if `max_iterations` or `token_budget` is reached. Reflect in `budget_exhausted`.
4. **Ground claims.** Every factual claim in `answer` must link to at least one citation.
5. **Handle failures gracefully.** If Tavily is down or a URL is broken, note it in `gaps` and continue with what you have.

### The Caller (PI/CLI) Must

1. **Accept partial answers.** A researcher that hits its budget but admits gaps is better than one that spins endlessly.
2. **Use confidence and gaps.** Don't treat a 0.6 confidence answer the same as a 0.95 confidence answer.
3. **Check locators.** For important decisions, verify citations by following the locators.

---

## Examples

### Example 1: High-Confidence Answer

Request:
```json
{
  "question": "What is the capital of France?",
  "depth": "shallow"
}
```

Response:
```json
{
  "answer": "Paris is the capital of France. It is the country's largest city and serves as the political, cultural, and economic center.",
  "citations": [
    {
      "source": "web",
      "locator": "https://en.wikipedia.org/wiki/Paris",
      "title": "Paris - Wikipedia",
      "snippet": "Paris is the capital and largest city of France",
      "confidence": 0.99
    }
  ],
  "gaps": [],
  "confidence": 0.99,
  "cost_metadata": {
    "tokens_used": 450,
    "iterations_run": 1,
    "wall_time_sec": 3.2,
    "budget_exhausted": false
  },
  "trace_id": "550e8400-e29b-41d4-a716-446655440001"
}
```

### Example 2: Partial Answer with Gaps

Request:
```json
{
  "question": "What emerging startups in biotech are working on CRISPR gene therapy?",
  "depth": "deep"
}
```

Response:
```json
{
  "answer": "Several emerging startups are advancing CRISPR gene therapy... [detailed answer]",
  "citations": [
    {
      "source": "web",
      "locator": "https://www.crunchbase.com/...",
      "title": "Crunchbase: CRISPR Startups",
      "snippet": "Editas, Beam Therapeutics, and CRISPR Therapeutics...",
      "confidence": 0.8
    }
  ],
  "gaps": [
    {
      "topic": "funding rounds in 2026",
      "reason": "Web sources only go through Q1 2026; may be stale"
    },
    {
      "topic": "clinical trial status",
      "reason": "Requires access to clinical trials database (outside web search scope)"
    }
  ],
  "confidence": 0.72,
  "cost_metadata": {
    "tokens_used": 19240,
    "iterations_run": 4,
    "wall_time_sec": 67.8,
    "budget_exhausted": false
  },
  "trace_id": "550e8400-e29b-41d4-a716-446655440002"
}
```

### Example 3: Budget Exhausted

Request:
```json
{
  "question": "Comprehensive history of AI from 1950s to 2026",
  "depth": "deep",
  "constraints": {
    "max_iterations": 3,
    "token_budget": 5000
  }
}
```

Response:
```json
{
  "answer": "The history of AI spans multiple eras... [partial answer, cut off mid-synthesis]",
  "citations": [
    { ... 3-4 citations ... }
  ],
  "gaps": [
    {
      "topic": "detailed timeline 2020-2026",
      "reason": "budget exhausted before deep synthesis"
    },
    {
      "topic": "minor research directions",
      "reason": "out of scope due to token limit"
    }
  ],
  "confidence": 0.55,
  "cost_metadata": {
    "tokens_used": 4998,
    "iterations_run": 3,
    "wall_time_sec": 31.2,
    "budget_exhausted": true
  },
  "trace_id": "550e8400-e29b-41d4-a716-446655440003"
}
```

---

## Versioning

The contract is versioned as `v1`. If breaking changes are needed (e.g., new required fields), the next version becomes `v2` and both can coexist in the network for a transition period.

Current version: **v1**
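
The following sketch shows how a caller and a researcher could agree on a version during such a transition. It assumes each side can list the versions it supports. `pick_version` and `parse_version` are hypothetical. Comparing the numeric part keeps `v10` above `v2`, which plain string comparison would not.

```python
from typing import List


def parse_version(tag: str) -> int:
    """Turn a tag such as "v1" into its number."""
    if not (tag.startswith("v") and tag[1:].isdigit()):
        raise ValueError(f"unrecognized contract version: {tag!r}")
    return int(tag[1:])


def pick_version(caller_supports: List[str], researcher_supports: List[str]) -> str:
    """Pick the newest contract version both sides support."""
    common = set(caller_supports) & set(researcher_supports)
    if not common:
        raise RuntimeError(
            f"no shared contract version (caller: {caller_supports}, "
            f"researcher: {researcher_supports})"
        )
    return max(common, key=parse_version)
```

During a v1 → v2 transition, a researcher listing `["v1", "v2"]` would still serve a caller that only speaks `v1`.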

---

See also: [Architecture.md](Architecture.md), [DevelopmentGuide.md](DevelopmentGuide.md)