Initial wiki: Architecture, ResearchContract, DevelopmentGuide
- Architecture: system overview, component design, data flow
- ResearchContract: complete tool specification with examples
- DevelopmentGuide: setup, testing, workflow, debugging

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
commit
a349d6f970
3 changed files with 786 additions and 0 deletions
175
Architecture.md
Normal file
|
|
@ -0,0 +1,175 @@
|
|||
# Architecture
|
||||
|
||||
## Overview
|
||||
|
||||
Marchwarden is a network of agentic researchers coordinated by a principal investigator (PI). Each researcher is specialized, autonomous, and fault-tolerant. The PI dispatches researchers to answer questions, waits for results, and synthesizes across responses.
|
||||
|
||||
```
┌─────────────┐
│  PI Agent   │  Orchestrates, synthesizes, decides what to research
└──────┬──────┘
       │ dispatch research(question)
       │
  ┌────┴──────────────────────────┐
  │                               │
┌─┴────────────────────┐  ┌───────┴─────────────────┐
│ Web Researcher (MCP) │  │ Future: DB, Arxiv, etc. │
│ - Search (Tavily)    │  │ (V2+)                   │
│ - Fetch URLs         │  │                         │
│ - Internal loop      │  │                         │
│ - Return citations   │  │                         │
└──────────────────────┘  └─────────────────────────┘
```
|
||||
|
||||
## Components
|
||||
|
||||
### Researchers (MCP servers)
|
||||
|
||||
Each researcher is a **standalone MCP server** that:
|
||||
- Exposes a single tool: `research(question, context, depth, constraints)`
|
||||
- Runs an internal agentic loop (plan → search → fetch → iterate → synthesize)
|
||||
- Returns structured data: `answer`, `citations`, `gaps`, `confidence`, `cost_metadata`, `trace_id`
|
||||
- Enforces budgets: iteration cap and token limit
|
||||
- Logs all internal steps to JSONL trace files
|
||||
|
||||
**V1 researcher**: Web search + fetch
|
||||
- Uses Tavily for searching
|
||||
- Fetches full text from URLs
|
||||
- Iterates up to 5 times or until budget exhausted
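
The iteration cap and token limit can be sketched as a small bounded loop. This is a minimal illustration under stated assumptions, not the project's implementation: `run_loop` and the `step` callable are hypothetical names, and the per-step token counts are made up. The defaults (5 iterations, 20000 tokens) are the ones given in ResearchContract.md.

```python
from typing import Callable, List, Tuple


def run_loop(
    step: Callable[[int], Tuple[str, int, bool]],
    max_iterations: int = 5,
    token_budget: int = 20000,
) -> dict:
    """Run `step` until it reports it is done or a budget is hit.

    `step(i)` is a hypothetical callable returning
    (finding, tokens_spent, done) for iteration i.
    """
    findings: List[str] = []
    tokens_used = 0
    iterations_run = 0
    done = False

    # The token check happens before each iteration, so the budget is a
    # soft limit: the last iteration may overshoot it.
    while iterations_run < max_iterations and tokens_used < token_budget:
        finding, spent, done = step(iterations_run)
        findings.append(finding)
        tokens_used += spent
        iterations_run += 1
        if done:
            break

    return {
        "findings": findings,
        "tokens_used": tokens_used,
        "iterations_run": iterations_run,
        # True only when a cap stopped the loop before the step reported done
        "budget_exhausted": not done,
    }


# A fake step that never finishes and spends 3000 tokens per iteration
result = run_loop(lambda i: (f"finding {i}", 3000, False))
```

Here the iteration cap is what stops the loop, so `result["budget_exhausted"]` is `True` and the caller can tell the answer is partial.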
|
||||
|
||||
**Future researchers** (V2+): Database, Arxiv, internal documents, etc.
|
||||
|
||||
### MCP Protocol
|
||||
|
||||
Marchwarden uses the **Model Context Protocol (MCP)** as the boundary between researchers and their callers. This gives us:
|
||||
|
||||
- **Language agnostic** — researchers can be Python, Node, Go, etc.
|
||||
- **Process isolation** — researcher crash doesn't crash the PI
|
||||
- **Clean contract** — one tool signature, versioned independently
|
||||
- **Parallel dispatch** — PI can await multiple researchers simultaneously
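
A rough sketch of the parallel-dispatch and isolation points, using plain `asyncio` coroutines as stand-ins for MCP calls. No MCP client is used here; `healthy_researcher`, `crashing_researcher`, and `dispatch_all` are hypothetical names chosen for illustration.

```python
import asyncio


async def healthy_researcher(question: str) -> dict:
    # Stand-in for an MCP call to a working researcher
    await asyncio.sleep(0.01)
    return {"answer": f"stub answer to: {question}"}


async def crashing_researcher(question: str) -> dict:
    # Stand-in for a researcher whose process died mid-call
    await asyncio.sleep(0.01)
    raise ConnectionError("researcher process exited")


async def dispatch_all(question: str) -> list:
    # return_exceptions=True keeps one failure from propagating to the caller
    return await asyncio.gather(
        healthy_researcher(question),
        crashing_researcher(question),
        return_exceptions=True,
    )


results = asyncio.run(dispatch_all("What are ideal crops for Utah?"))
succeeded = [r for r in results if not isinstance(r, Exception)]
failed = [r for r in results if isinstance(r, Exception)]
```

The caller still receives the healthy researcher's result and can record the failure separately instead of aborting the whole request.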
|
||||
|
||||
### CLI Shim
|
||||
|
||||
For V1, the CLI is the test harness that stands in for the PI:
|
||||
|
||||
```bash
|
||||
marchwarden ask "What are ideal crops for Utah?"
|
||||
marchwarden replay <trace_id>
|
||||
```
|
||||
|
||||
In V2, the CLI is replaced by a full PI orchestrator agent.
|
||||
|
||||
### Trace Logging
|
||||
|
||||
Every research call produces a **JSONL trace log**:
|
||||
|
||||
```
|
||||
~/.marchwarden/traces/{trace_id}.jsonl
|
||||
```
|
||||
|
||||
Each line is a JSON object:
|
||||
```json
|
||||
{
|
||||
"step": 1,
|
||||
"action": "search",
|
||||
"query": "Utah climate gardening",
|
||||
"result": {...},
|
||||
"timestamp": "2026-04-08T12:00:00Z",
|
||||
"decision": "query was relevant, fetching top 3 URLs"
|
||||
}
|
||||
```
|
||||
|
||||
Traces support:
|
||||
- **Debugging** — see exactly what the researcher did
|
||||
- **Replay** — re-run a past session, same results
|
||||
- **Eval** — audit decision-making
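
Writing and reading a trace in this format might look like the sketch below. It assumes only the event fields shown in the example above; `append_event` and `read_events` are hypothetical helper names, and a temporary directory stands in for `~/.marchwarden/traces`.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_event(trace_path: Path, step: int, action: str, **fields) -> None:
    """Append one event to the trace as a single JSON line."""
    event = {
        "step": step,
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }
    with trace_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_events(trace_path: Path) -> list:
    """Load every event from a trace file, in the order it was written."""
    with trace_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Temporary location used for the example only
trace_file = Path(tempfile.mkdtemp()) / "example-trace.jsonl"
append_event(trace_file, 1, "search", query="Utah climate gardening")
append_event(trace_file, 2, "fetch", decision="fetching top 3 URLs")
events = read_events(trace_file)
```

Appending one line per step means a trace stays readable even if the researcher stops partway through a call.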
|
||||
|
||||
## Data Flow
|
||||
|
||||
### One research call (simplified)
|
||||
|
||||
```
|
||||
CLI: ask "What are ideal crops for Utah?"
|
||||
↓
|
||||
MCP: research(question="What are ideal crops for Utah?", ...)
|
||||
↓
|
||||
Researcher agent loop:
|
||||
1. Plan: "I need climate data for Utah + crop requirements"
|
||||
2. Search: Tavily query for "Utah climate zones crops"
|
||||
3. Fetch: Read top 3 URLs
|
||||
4. Parse: Extract relevant info
|
||||
5. Synthesize: "Based on X sources, ideal crops are Y"
|
||||
6. Check gaps: "Couldn't find pest info"
|
||||
7. Return if confident, else iterate
|
||||
↓
|
||||
Response:
|
||||
{
|
||||
"answer": "...",
|
||||
"citations": [
|
||||
{"source": "web", "locator": "https://...", "snippet": "...", "confidence": 0.95},
|
||||
...
|
||||
],
|
||||
"gaps": [
|
||||
{"topic": "pest resistance", "reason": "no sources found"},
|
||||
],
|
||||
"cost_metadata": {
|
||||
"tokens_used": 8452,
|
||||
"iterations_run": 3,
|
||||
"wall_time_sec": 42.5
|
||||
},
|
||||
"trace_id": "uuid-1234"
|
||||
}
|
||||
↓
|
||||
CLI: Print answer + citations, save trace
|
||||
```
|
||||
|
||||
## Contract Versioning
|
||||
|
||||
The `research()` tool signature is the stable contract. Changes to the contract require explicit versioning so that:
|
||||
- Multiple researchers with different versions can coexist
|
||||
- The PI knows what version it's calling
|
||||
- Backwards compatibility (or breaking changes) is explicit
|
||||
|
||||
See [ResearchContract.md](ResearchContract.md) for the full spec.
|
||||
|
||||
## Future: The PI Agent
|
||||
|
||||
V2 will introduce the orchestrator:
|
||||
|
||||
```python
import asyncio


class PIAgent:
    async def research_topic(self, question: str) -> Answer:
        # Dispatch multiple researchers in parallel
        web_results, arxiv_results = await asyncio.gather(
            self.web_researcher.research(question),
            self.arxiv_researcher.research(question),
        )

        # Synthesize
        return self.synthesize([web_results, arxiv_results])
```
|
||||
|
||||
The PI:
|
||||
- Decides which researchers to dispatch
|
||||
- Waits for all responses
|
||||
- Checks for conflicts, gaps, consensus
|
||||
- Synthesizes into a final answer
|
||||
- Can re-dispatch if gaps are critical
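
One way the re-dispatch decision could work is to filter reported gaps against topics the PI considers critical. The sketch below is an assumption, not a V2 design: `critical_gaps` and its keyword matching are hypothetical, and the gap dicts follow the shape shown in the data-flow example.

```python
from typing import Iterable, List


def critical_gaps(gaps: List[dict], critical_topics: Iterable[str]) -> List[dict]:
    """Return the gaps whose topic mentions any critical keyword."""
    keywords = [t.lower() for t in critical_topics]
    return [g for g in gaps if any(k in g["topic"].lower() for k in keywords)]


gaps = [
    {"topic": "pest resistance", "reason": "no sources found"},
    {"topic": "commercial varietals", "reason": "limited to hobby gardening sources"},
]
to_retry = critical_gaps(gaps, critical_topics=["pest"])
```

A non-empty `to_retry` would be the PI's cue to dispatch another researcher for those topics.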
|
||||
|
||||
## Assumptions & Constraints
|
||||
|
||||
- **Researchers are honest** — they don't hallucinate citations. If they cite something, it exists in the source.
|
||||
- **Tavily API is available** — for V1 web search. Degradation strategy TBD.
|
||||
- **Token budgets are enforced** — the researcher respects its budget; the MCP server enforces it at the process level.
|
||||
- **Traces are ephemeral** — stored locally for debugging, not synced to a database yet.
|
||||
- **No multi-user** — single-user CLI for V1.
|
||||
|
||||
## Terminology
|
||||
|
||||
- **Researcher**: An agentic system specialized in a domain or source type
|
||||
- **Marchwarden**: The researcher metaphor — stationed at the frontier, reporting back
|
||||
- **Rihla**: (V2+) A unit of research work dispatched by the PI; one researcher's journey to answer a question
|
||||
- **Trace**: A JSONL log of all decisions made during one research call
|
||||
- **Gap**: An unresolved aspect of the question; the researcher couldn't find an answer
|
||||
|
||||
---
|
||||
|
||||
See also: [ResearchContract.md](ResearchContract.md), [DevelopmentGuide.md](DevelopmentGuide.md)
|
||||
259
DevelopmentGuide.md
Normal file
|
|
@ -0,0 +1,259 @@
|
|||
# Development Guide
|
||||
|
||||
## Setup
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Python 3.10+
|
||||
- pip (with venv)
|
||||
- Tavily API key (free tier available at https://tavily.com)
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
git clone https://forgejo.labbity.unbiasedgeek.com/archeious/marchwarden.git
|
||||
cd marchwarden
|
||||
|
||||
# Create virtual environment
|
||||
python3 -m venv venv
|
||||
source venv/bin/activate # On Windows: venv\Scripts\activate
|
||||
|
||||
# Install in dev mode
|
||||
pip install -e ".[dev]"
|
||||
```
|
||||
|
||||
### Environment Setup
|
||||
|
||||
Create a `.env` file in the project root:
|
||||
|
||||
```env
|
||||
TAVILY_API_KEY=<your-tavily-api-key>
|
||||
ANTHROPIC_API_KEY=<your-claude-api-key>
|
||||
MARCHWARDEN_TRACE_DIR=~/.marchwarden/traces
|
||||
```
|
||||
|
||||
Test that everything works:
|
||||
|
||||
```bash
|
||||
python -c "from anthropic import Anthropic; print('OK')"
|
||||
python -c "from tavily import TavilyClient; print('OK')"
|
||||
```
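
The import checks above do not confirm that the API keys are actually set. A small sketch of that check follows, assuming the values from `.env` have already been exported into the process environment (how the project loads `.env` is not specified here); `missing_env_vars` is a hypothetical helper.

```python
import os

REQUIRED_VARS = ("TAVILY_API_KEY", "ANTHROPIC_API_KEY")


def missing_env_vars(environ=os.environ) -> list:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("OK" if not missing else f"Missing: {', '.join(missing)}")
```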
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
marchwarden/
├── researchers/
│   ├── __init__.py
│   └── web/                    # V1: Web search researcher
│       ├── __init__.py
│       ├── server.py           # MCP server entry point
│       ├── agent.py            # Inner research agent
│       ├── models.py           # Pydantic models (ResearchResult, Citation, etc)
│       └── tools.py            # Tavily integration, URL fetch
├── orchestrator/               # (V2+) PI agent
│   ├── __init__.py
│   └── pi.py
├── cli/                        # CLI shim (ask, replay)
│   ├── __init__.py
│   ├── main.py                 # Entry point (@click decorators)
│   └── formatter.py            # Pretty-print results
├── tests/
│   ├── __init__.py
│   ├── test_web_researcher.py
│   └── fixtures/
├── docs/
│   └── wiki/                   # You are here
├── README.md
├── CONTRIBUTING.md
├── pyproject.toml
└── .gitignore
```
|
||||
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all tests
|
||||
pytest tests/
|
||||
|
||||
# Run with verbose output
|
||||
pytest tests/ -v
|
||||
|
||||
# Run a specific test file
|
||||
pytest tests/test_web_researcher.py
|
||||
|
||||
# Run with coverage
|
||||
pytest --cov=. tests/
|
||||
```
|
||||
|
||||
The suite contains both unit and integration tests. We do **not** mock the database or other major external services; the only exception is Tavily, which may be mocked to avoid API costs.
|
||||
|
||||
## Running the CLI
|
||||
|
||||
```bash
|
||||
# Ask a question
|
||||
marchwarden ask "What are ideal crops for a garden in Utah?"
|
||||
|
||||
# With options
|
||||
marchwarden ask "What is X?" --depth deep --budget 25000
|
||||
|
||||
# Replay a trace
|
||||
marchwarden replay <trace_id>
|
||||
|
||||
# Show help
|
||||
marchwarden --help
|
||||
```
|
||||
|
||||
A run typically takes from a few seconds to about a minute (agent planning + searches + fetches).
|
||||
|
||||
## Development Workflow
|
||||
|
||||
### 1. Create a branch
|
||||
|
||||
```bash
|
||||
git checkout -b feat/your-feature-name
|
||||
```
|
||||
|
||||
Branch naming: `feat/`, `fix/`, `refactor/`, `chore/` + short description.
|
||||
|
||||
### 2. Make changes
|
||||
|
||||
Edit code, add tests:
|
||||
|
||||
```bash
|
||||
# Run tests as you go
|
||||
pytest tests/test_your_feature.py -v
|
||||
|
||||
# Check formatting
|
||||
black --check .
|
||||
ruff check .
|
||||
|
||||
# Type checking (optional, informational)
|
||||
mypy . --ignore-missing-imports
|
||||
```
|
||||
|
||||
### 3. Commit
|
||||
|
||||
```bash
|
||||
git add <files>
|
||||
git commit -m "Brief imperative description
|
||||
|
||||
- What changed
|
||||
- Why it changed
|
||||
"
|
||||
```
|
||||
|
||||
Commits should be atomic (one logical change per commit).
|
||||
|
||||
### 4. Test before pushing
|
||||
|
||||
```bash
|
||||
pytest tests/
|
||||
black .
|
||||
ruff check . --fix
|
||||
```
|
||||
|
||||
### 5. Push and create PR
|
||||
|
||||
```bash
|
||||
git push origin feat/your-feature-name
|
||||
```
|
||||
|
||||
Then on Forgejo: open a PR, request review, wait for CI/tests to pass.
|
||||
|
||||
Once approved:
|
||||
- Merge via Forgejo UI (not locally)
|
||||
- Delete remote branch via Forgejo
|
||||
- Locally: `git checkout main && git pull --ff-only && git branch -d feat/your-feature-name`
|
||||
|
||||
## Debugging
|
||||
|
||||
### Viewing trace logs
|
||||
|
||||
```bash
# Human-readable trace
marchwarden replay <trace_id>

# Pretty-print each event (the file holds one JSON object per line)
jq . ~/.marchwarden/traces/<trace_id>.jsonl

# Slurp all events into a single JSON array
jq -s . ~/.marchwarden/traces/<trace_id>.jsonl
```
|
||||
|
||||
### Debug logging
|
||||
|
||||
Set `MARCHWARDEN_DEBUG=1` for verbose logs:
|
||||
|
||||
```bash
|
||||
MARCHWARDEN_DEBUG=1 marchwarden ask "What is X?"
|
||||
```
|
||||
|
||||
### Interactive testing
|
||||
|
||||
Use Python REPL:
|
||||
|
||||
```bash
python
>>> import asyncio
>>> from researchers.web import WebResearcher
>>> researcher = WebResearcher()
>>> # research() is async per the contract, so drive it with asyncio.run
>>> result = asyncio.run(researcher.research("What is X?"))
>>> print(result.answer)
```
|
||||
|
||||
## Common Tasks
|
||||
|
||||
### Adding a new tool to the researcher
|
||||
|
||||
1. Define the tool in `researchers/web/tools.py`
|
||||
2. Register it in the agent's tool list (`researchers/web/agent.py`)
|
||||
3. Add test coverage in `tests/test_web_researcher.py`
|
||||
4. Update docs if it changes the contract
|
||||
|
||||
### Changing the research contract
|
||||
|
||||
If you need to modify the `research()` signature:
|
||||
|
||||
1. Update `researchers/web/models.py` (ResearchResult, Citation, etc)
|
||||
2. Update `researchers/web/agent.py` to produce the new fields
|
||||
3. Update `docs/wiki/ResearchContract.md`
|
||||
4. Add a migration guide if breaking
|
||||
5. Tests must pass with new signature
|
||||
|
||||
### Running cost analysis
|
||||
|
||||
See how much a research call costs:
|
||||
|
||||
```bash
|
||||
marchwarden ask "Q" --verbose
|
||||
# Shows: tokens_used, iterations_run, wall_time_sec
|
||||
```
|
||||
|
||||
For batch analysis:
|
||||
|
||||
```python
import glob
import json
import os

# glob does not expand "~", so expand it explicitly
pattern = os.path.expanduser("~/.marchwarden/traces/*.jsonl")
for trace_file in glob.glob(pattern):
    with open(trace_file) as f:
        for line in f:
            event = json.loads(line)
            # Analyze cost_metadata
```
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: How do I add a new researcher?**
|
||||
A: Create `researchers/new_source/` with the same structure as `researchers/web/`. Implement `research()`, expose it as an MCP server. Test with the CLI.
|
||||
|
||||
**Q: Do I need to handle Tavily failures?**
|
||||
A: Yes. Catch the errors raised by the Tavily client, fall back to the results you already have, and record the failure in `gaps`.
|
||||
|
||||
**Q: What if Anthropic API goes down?**
|
||||
A: The agent will fail. Retry logic TBD. For now, it's a blocker.
|
||||
|
||||
**Q: How do I deploy this?**
|
||||
A: V1 is CLI-only, local use only. V2 will have a PI orchestrator with real deployment needs.
|
||||
|
||||
---
|
||||
|
||||
See also: [Architecture.md](Architecture.md), [ResearchContract.md](ResearchContract.md), [../CONTRIBUTING.md](../CONTRIBUTING.md)
|
||||
352
ResearchContract.md
Normal file
|
|
@ -0,0 +1,352 @@
|
|||
# Research Contract
|
||||
|
||||
This document defines the `research()` tool that all Marchwarden researchers implement. It is the stable contract between a researcher MCP server and its caller (the PI or CLI).
|
||||
|
||||
## Tool Signature
|
||||
|
||||
```python
async def research(
    question: str,
    context: Optional[str] = None,
    depth: Literal["shallow", "balanced", "deep"] = "balanced",
    constraints: Optional[ResearchConstraints] = None,
) -> ResearchResult: ...
```
|
||||
|
||||
### Input Parameters
|
||||
|
||||
#### `question` (required, string)
|
||||
The question the researcher is asked to investigate. Examples:
|
||||
- "What are ideal crops for a garden in Utah?"
|
||||
- "Summarize recent developments in transformer architectures"
|
||||
- "What is the legal status of AI in France?"
|
||||
|
||||
Constraints: 1–500 characters, UTF-8 encoded.
|
||||
|
||||
#### `context` (optional, string)
|
||||
What the PI or caller already knows. The researcher uses this to avoid duplicating effort or to refocus. Examples:
|
||||
- "I already know Utah is in USDA zones 3-8. Focus on water requirements."
|
||||
- "I've read the 2024 papers on LoRA. What's new in 2025?"
|
||||
|
||||
Constraints: 0–2000 characters.
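
The two length limits can be checked before any research starts. A minimal sketch, assuming "characters" means Unicode code points as counted by Python's `len`; `validate_inputs` is a hypothetical helper, not part of the contract.

```python
from typing import Optional

MAX_QUESTION_CHARS = 500
MAX_CONTEXT_CHARS = 2000


def validate_inputs(question: str, context: Optional[str] = None) -> None:
    """Raise ValueError if the inputs break the documented length limits."""
    if not 1 <= len(question) <= MAX_QUESTION_CHARS:
        raise ValueError("question must be 1-500 characters")
    if context is not None and len(context) > MAX_CONTEXT_CHARS:
        raise ValueError("context must be at most 2000 characters")


# Passes silently: the question is within limits and no context is given
validate_inputs("What are ideal crops for a garden in Utah?")
```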
|
||||
|
||||
#### `depth` (optional, enum)
|
||||
How thoroughly to research:
|
||||
- `"shallow"` — quick scan, 1–2 iterations, ~5k tokens. For "does this exist?" questions.
|
||||
- `"balanced"` (default) — moderate depth, 2–4 iterations, ~15k tokens. For typical questions.
|
||||
- `"deep"` — thorough investigation, up to 5 iterations, ~25k tokens. For important decisions.
|
||||
|
||||
The researcher uses this as a *hint*, not a strict constraint. The actual depth depends on how much content is available and how confident the researcher becomes.
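
As an illustration, the three depth levels could be held in a lookup table. The numbers below are the upper iteration bounds and approximate token figures listed above; `DEPTH_PRESETS` and `preset_for` are hypothetical names, and because depth is only a hint, a real researcher may deviate from these values.

```python
DEPTH_PRESETS = {
    "shallow": {"max_iterations": 2, "approx_tokens": 5_000},
    "balanced": {"max_iterations": 4, "approx_tokens": 15_000},
    "deep": {"max_iterations": 5, "approx_tokens": 25_000},
}


def preset_for(depth: str = "balanced") -> dict:
    """Look up the hint values for a depth level."""
    try:
        return DEPTH_PRESETS[depth]
    except KeyError:
        raise ValueError(f"unknown depth: {depth!r}") from None


hint = preset_for("deep")
```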
|
||||
|
||||
#### `constraints` (optional, object)
|
||||
Fine-grained control over researcher behavior:
|
||||
|
||||
```python
@dataclass
class ResearchConstraints:
    max_iterations: int = 5              # Stop after N iterations, regardless
    token_budget: int = 20000            # Soft limit on tokens; researcher respects
    max_sources: int = 10                # Max number of sources to fetch
    source_filter: Optional[str] = None  # Only search specific domains (V2)
```
|
||||
|
||||
If not provided, defaults are:
|
||||
- `max_iterations`: 5
|
||||
- `token_budget`: 20000 (Sonnet 3.5 equivalent)
|
||||
- `max_sources`: 10
|
||||
|
||||
The MCP server **enforces** these constraints and will stop the researcher if they exceed them.
|
||||
|
||||
---
|
||||
|
||||
### Output: ResearchResult
|
||||
|
||||
```python
@dataclass
class ResearchResult:
    answer: str                  # The synthesized answer
    citations: List[Citation]    # Sources used
    gaps: List[Gap]              # What couldn't be resolved
    confidence: float            # 0.0–1.0 overall confidence
    cost_metadata: CostMetadata  # Resource usage
    trace_id: str                # UUID linking to JSONL trace log
```
|
||||
|
||||
#### `answer` (string)
|
||||
The synthesized answer. Should be:
|
||||
- **Grounded** — every claim traces back to a citation
|
||||
- **Humble** — includes caveats and confidence levels
|
||||
- **Actionable** — structured so the caller can use it
|
||||
|
||||
Example:
|
||||
```
|
||||
In Utah (USDA zones 3-8), ideal crops depend on elevation and season:
|
||||
|
||||
High elevation (>7k ft): Short-season crops dominate. Cool-season vegetables
|
||||
(peas, lettuce, potatoes) thrive. Fruit: apples, berries. Summer crops
|
||||
(tomatoes, squash) work in south-facing microclimates.
|
||||
|
||||
Lower elevation: Full range possible. Long growing season supports tomatoes,
|
||||
peppers, squash. Perennials (fruit trees, asparagus) are popular.
|
||||
|
||||
Water is critical: Utah averages 10-20" annual precipitation (dry for vegetable
|
||||
gardening). Most gardeners supplement with irrigation.
|
||||
|
||||
Pests: Japanese beetles (south), aphids (statewide). Deer pressure varies by
|
||||
location.
|
||||
|
||||
See sources below for varietal recommendations by specific county.
|
||||
```
|
||||
|
||||
#### `citations` (list of Citation objects)
|
||||
|
||||
```python
@dataclass
class Citation:
    source: str             # "web", "file", "database", etc
    locator: str            # URL, file path, row ID, or unique identifier
    title: Optional[str]    # Human-readable title (for web)
    snippet: Optional[str]  # Relevant excerpt (50–200 chars)
    confidence: float       # 0.0–1.0: researcher's confidence in this source's accuracy
```
|
||||
|
||||
Example:
|
||||
```python
|
||||
Citation(
|
||||
source="web",
|
||||
locator="https://extension.oregonstate.edu/ask-expert/featured/what-are-ideal-garden-crops-utah-zone",
|
||||
title="Oregon State Extension: Ideal Crops for Utah Gardens",
|
||||
snippet="Cool-season crops (peas, lettuce, potatoes) thrive above 7,000 feet. Irrigation essential.",
|
||||
confidence=0.9
|
||||
)
|
||||
```
|
||||
|
||||
Citations must be:
|
||||
- **Verifiable** — a human can follow the locator and confirm the claim
|
||||
- **Not hallucinated** — the researcher actually read/fetched the source
|
||||
- **Attributed** — each claim in `answer` should link to at least one citation
|
||||
|
||||
#### `gaps` (list of Gap objects)
|
||||
|
||||
```python
@dataclass
class Gap:
    topic: str   # What aspect wasn't resolved
    reason: str  # Why: "no sources found", "contradictory sources", "outside researcher scope"
```
|
||||
|
||||
Example:
|
||||
```python
|
||||
[
|
||||
Gap(topic="pest management by county", reason="no county-specific sources found"),
|
||||
Gap(topic="commercial varietals", reason="limited to hobby gardening sources"),
|
||||
]
|
||||
```
|
||||
|
||||
Gaps are **critical for the PI**. They tell the orchestrator:
|
||||
- Whether to dispatch a different researcher
|
||||
- Whether to accept partial answers
|
||||
- Which questions remain for human input
|
||||
|
||||
A researcher that admits gaps is more trustworthy than one that fabricates answers.
|
||||
|
||||
#### `confidence` (float, 0.0–1.0)
|
||||
|
||||
Overall confidence in the answer:
|
||||
- `0.9–1.0`: High. All claims grounded in multiple strong sources.
|
||||
- `0.7–0.9`: Moderate. Most claims grounded; some inference; minor contradictions resolved.
|
||||
- `0.5–0.7`: Low. Few direct sources; lots of synthesis; clear gaps.
|
||||
- `< 0.5`: Very low. Mainly inference; major gaps; likely needs human review.
|
||||
|
||||
The PI uses this to decide whether to act on the answer or seek more sources.
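
A caller might turn the score into one of the four bands above with a helper like this. The listed ranges share their endpoints (0.9, 0.7, 0.5), so this sketch assumes a boundary value belongs to the higher band; `confidence_band` is a hypothetical name.

```python
def confidence_band(confidence: float) -> str:
    """Map an overall confidence score to one of the four documented bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.9:
        return "high"
    if confidence >= 0.7:
        return "moderate"
    if confidence >= 0.5:
        return "low"
    return "very low"


band = confidence_band(0.72)
```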
|
||||
|
||||
#### `cost_metadata` (object)
|
||||
|
||||
```python
@dataclass
class CostMetadata:
    tokens_used: int        # Total tokens (Claude + Tavily calls)
    iterations_run: int     # Number of inner-loop iterations
    wall_time_sec: float    # Actual elapsed time
    budget_exhausted: bool  # True if researcher hit iteration or token cap
```
|
||||
|
||||
Example:
|
||||
```python
|
||||
CostMetadata(
|
||||
tokens_used=8452,
|
||||
iterations_run=3,
|
||||
wall_time_sec=42.5,
|
||||
budget_exhausted=False
|
||||
)
|
||||
```
|
||||
|
||||
The PI uses this to:
|
||||
- Track costs (token budgets, actual spend)
|
||||
- Detect runaway loops (budget_exhausted = True)
|
||||
- Plan timeouts (wall_time_sec tells you if this is acceptable latency)
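
Aggregating `cost_metadata` across several calls could look like the following sketch. `summarize_costs` is a hypothetical helper, and the three input records reuse the figures from the examples later in this document.

```python
from typing import List


def summarize_costs(runs: List[dict]) -> dict:
    """Aggregate cost_metadata dicts from several research calls."""
    return {
        "total_tokens": sum(r["tokens_used"] for r in runs),
        "exhausted_runs": sum(1 for r in runs if r["budget_exhausted"]),
        "slowest_sec": max((r["wall_time_sec"] for r in runs), default=0.0),
    }


runs = [
    {"tokens_used": 450, "iterations_run": 1, "wall_time_sec": 3.2, "budget_exhausted": False},
    {"tokens_used": 19240, "iterations_run": 4, "wall_time_sec": 67.8, "budget_exhausted": False},
    {"tokens_used": 4998, "iterations_run": 3, "wall_time_sec": 31.2, "budget_exhausted": True},
]
summary = summarize_costs(runs)
```

A rising `exhausted_runs` count is one signal that budgets are too tight for the questions being asked, or that a researcher is looping.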
|
||||
|
||||
#### `trace_id` (string, UUID)
|
||||
|
||||
A unique identifier linking to the JSONL trace log:
|
||||
|
||||
```
|
||||
~/.marchwarden/traces/{trace_id}.jsonl
|
||||
```
|
||||
|
||||
The trace contains every decision, search, fetch, parse step for debugging and replay.
|
||||
|
||||
---
|
||||
|
||||
## Contract Rules
|
||||
|
||||
### The Researcher Must
|
||||
|
||||
1. **Never hallucinate citations.** If a claim isn't in a source, don't cite it.
|
||||
2. **Admit gaps.** If you can't find something, say so. Don't guess.
|
||||
3. **Respect budgets.** Stop iterating if `max_iterations` or `token_budget` is reached. Reflect in `budget_exhausted`.
|
||||
4. **Ground claims.** Every factual claim in `answer` must link to at least one citation.
|
||||
5. **Handle failures gracefully.** If Tavily is down or a URL is broken, note it in `gaps` and continue with what you have.
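
Rule 5 can be sketched as a wrapper that converts a failed search into a gap instead of an exception. This is an illustration only: `search_with_fallback` and `broken_search` are hypothetical, and the broad `except Exception` should be narrowed to whatever exception types the real search client raises.

```python
from typing import Callable, List


def search_with_fallback(
    search: Callable[[str], List[dict]],
    query: str,
    gaps: List[dict],
) -> List[dict]:
    """Call `search`; on failure record a gap and return no results."""
    try:
        return search(query)
    except Exception as exc:  # narrow this to the client's real exception types
        gaps.append({"topic": query, "reason": f"search failed: {exc}"})
        return []


def broken_search(query: str) -> List[dict]:
    # Stand-in for a search backend that is down
    raise TimeoutError("search backend unavailable")


gaps: List[dict] = []
results = search_with_fallback(broken_search, "Utah pest pressure", gaps)
```

The researcher can then continue with whatever it has already gathered, and the caller sees the failure in `gaps` rather than losing the whole response.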
|
||||
|
||||
### The Caller (PI/CLI) Must
|
||||
|
||||
1. **Accept partial answers.** A researcher that hits its budget but admits gaps is better than one that spins endlessly.
|
||||
2. **Use confidence and gaps.** Don't treat a 0.6 confidence answer the same as a 0.95 confidence answer.
|
||||
3. **Check locators.** For important decisions, verify citations by following the locators.
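
Some of these rules can be checked mechanically on the caller side. The sketch below covers structure only (it cannot detect a hallucinated citation, which still requires following the locator); `contract_problems` is a hypothetical helper operating on a dict shaped like the JSON responses in the examples below.

```python
from typing import List


def contract_problems(result: dict) -> List[str]:
    """Return human-readable problems found in a ResearchResult-shaped dict."""
    problems = []
    if result["answer"] and not result["citations"]:
        problems.append("answer has no citations")
    for i, citation in enumerate(result["citations"]):
        if not citation.get("locator"):
            problems.append(f"citation {i} has no locator")
        if not 0.0 <= citation.get("confidence", -1.0) <= 1.0:
            problems.append(f"citation {i} confidence out of range")
    if not 0.0 <= result["confidence"] <= 1.0:
        problems.append("overall confidence out of range")
    return problems


good = {
    "answer": "Paris is the capital of France.",
    "citations": [{"locator": "https://en.wikipedia.org/wiki/Paris", "confidence": 0.99}],
    "confidence": 0.99,
}
problems = contract_problems(good)
```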
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: High-Confidence Answer
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"question": "What is the capital of France?",
|
||||
"depth": "shallow"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"answer": "Paris is the capital of France. It is the country's largest city and serves as the political, cultural, and economic center.",
|
||||
"citations": [
|
||||
{
|
||||
"source": "web",
|
||||
"locator": "https://en.wikipedia.org/wiki/Paris",
|
||||
"title": "Paris - Wikipedia",
|
||||
"snippet": "Paris is the capital and largest city of France",
|
||||
"confidence": 0.99
|
||||
}
|
||||
],
|
||||
"gaps": [],
|
||||
"confidence": 0.99,
|
||||
"cost_metadata": {
|
||||
"tokens_used": 450,
|
||||
"iterations_run": 1,
|
||||
"wall_time_sec": 3.2,
|
||||
"budget_exhausted": false
|
||||
},
|
||||
"trace_id": "550e8400-e29b-41d4-a716-446655440001"
|
||||
}
|
||||
```
|
||||
|
||||
### Example 2: Partial Answer with Gaps
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"question": "What emerging startups in biotech are working on CRISPR gene therapy?",
|
||||
"depth": "deep"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"answer": "Several emerging startups are advancing CRISPR gene therapy... [detailed answer]",
|
||||
"citations": [
|
||||
{
|
||||
"source": "web",
|
||||
"locator": "https://www.crunchbase.com/...",
|
||||
"title": "Crunchbase: CRISPR Startups",
|
||||
"snippet": "Editas, Beam Therapeutics, and CRISPR Therapeutics...",
|
||||
"confidence": 0.8
|
||||
}
|
||||
],
|
||||
"gaps": [
|
||||
{
|
||||
"topic": "funding rounds in 2026",
|
||||
"reason": "Web sources only go through Q1 2026; may be stale"
|
||||
},
|
||||
{
|
||||
"topic": "clinical trial status",
|
||||
"reason": "Requires access to clinical trials database (outside web search scope)"
|
||||
}
|
||||
],
|
||||
"confidence": 0.72,
|
||||
"cost_metadata": {
|
||||
"tokens_used": 19240,
|
||||
"iterations_run": 4,
|
||||
"wall_time_sec": 67.8,
|
||||
"budget_exhausted": false
|
||||
},
|
||||
"trace_id": "550e8400-e29b-41d4-a716-446655440002"
|
||||
}
|
||||
```
|
||||
|
||||
### Example 3: Budget Exhausted
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"question": "Comprehensive history of AI from 1950s to 2026",
|
||||
"depth": "deep",
|
||||
"constraints": {
|
||||
"max_iterations": 3,
|
||||
"token_budget": 5000
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"answer": "The history of AI spans multiple eras... [partial answer, cut off mid-synthesis]",
|
||||
"citations": [
|
||||
{ ... 3-4 citations ... }
|
||||
],
|
||||
"gaps": [
|
||||
{
|
||||
"topic": "detailed timeline 2020-2026",
|
||||
"reason": "budget exhausted before deep synthesis"
|
||||
},
|
||||
{
|
||||
"topic": "minor research directions",
|
||||
"reason": "out of scope due to token limit"
|
||||
}
|
||||
],
|
||||
"confidence": 0.55,
|
||||
"cost_metadata": {
|
||||
"tokens_used": 4998,
|
||||
"iterations_run": 3,
|
||||
"wall_time_sec": 31.2,
|
||||
"budget_exhausted": true
|
||||
},
|
||||
"trace_id": "550e8400-e29b-41d4-a716-446655440003"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Versioning
|
||||
|
||||
The contract is versioned as `v1`. If breaking changes are needed (e.g., new required fields), the next version becomes `v2` and both can coexist in the network for a transition period.
|
||||
|
||||
Current version: **v1**
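
A caller-side version check might look like this sketch. How a researcher advertises its contract version is not specified in this document, so the `researcher_version` argument and the `check_contract_version` helper are assumptions made for illustration.

```python
SUPPORTED_VERSIONS = {"v1"}


def check_contract_version(researcher_version: str) -> None:
    """Raise if the researcher speaks a contract version the caller does not."""
    if researcher_version not in SUPPORTED_VERSIONS:
        raise RuntimeError(f"unsupported contract version: {researcher_version}")


# Passes silently for the current version
check_contract_version("v1")
```

During a transition period the caller would add `"v2"` to `SUPPORTED_VERSIONS` so both versions can coexist.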
|
||||
|
||||
---
|
||||
|
||||
See also: [Architecture.md](Architecture.md), [DevelopmentGuide.md](DevelopmentGuide.md)
|
||||