Issue #46 (Phase A only — Phase B human rating still pending, issue stays open). Adds the data-collection half of the calibration milestone: - scripts/calibration_runner.sh — runs 20 fixed balanced-depth queries across 4 categories (factual, comparative, contradiction-prone, scope-edge), 5 each, capturing per-run logs to docs/stress-tests/M3.3-runs/. - scripts/calibration_collect.py — loads every persisted ResearchResult under ~/.marchwarden/traces/*.result.json and emits a markdown rating worksheet with one row per run. Recovers question text from each trace's start event and category from the run-log filename. - docs/stress-tests/M3.3-rating-worksheet.md — 22 runs (20 calibration + caffeine smoke + M3.2 multi-axis), with empty actual_rating columns for the human-in-the-loop scoring step. - docs/stress-tests/M3.3-runs/*.log — runtime logs from the calibration runner, kept as provenance. Gitignore updated with an exception carving stress-test logs out of the global *.log ignore. Note: M3.1's 4 runs predate #54 (full result persistence) and so are unrecoverable to the worksheet — only post-#54 runs have a result.json sibling. 22 rateable runs is still within the milestone target of 20–30. Phases B (human rating) and C (analysis + rubric + wiki update) follow in a later session. This issue stays open until both are done. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
145 lines
19 KiB
Text
145 lines
19 KiB
Text
Researching: When did the James Webb Space Telescope launch?
|
||
|
||
{"question": "When did the James Webb Space Telescope launch?", "depth": "balanced", "max_iterations": null, "token_budget": null, "event": "ask_started", "logger": "marchwarden.cli", "level": "info", "timestamp": "2026-04-09T01:50:06.289350Z"}
|
||
{"transport": "stdio", "server": "marchwarden-web-researcher", "event": "mcp_server_starting", "logger": "marchwarden.mcp", "level": "info", "timestamp": "2026-04-09T01:50:07.051309Z"}
|
||
{"event": "Processing request of type CallToolRequest", "logger": "mcp.server.lowlevel.server", "level": "info", "timestamp": "2026-04-09T01:50:07.061145Z"}
|
||
{"question": "When did the James Webb Space Telescope launch?", "depth": "balanced", "max_iterations": 5, "token_budget": 20000, "model_id": "claude-sonnet-4-6", "event": "research_started", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "info", "timestamp": "2026-04-09T01:50:07.098980Z"}
|
||
{"step": 1, "decision": "Beginning research: depth=balanced", "question": "When did the James Webb Space Telescope launch?", "context": "", "max_iterations": 5, "token_budget": 20000, "event": "start", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:07.099569Z"}
|
||
{"step": 2, "decision": "Starting iteration 1/5", "tokens_so_far": 0, "event": "iteration_start", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:07.099732Z"}
|
||
{"step": 5, "decision": "Starting iteration 2/5", "tokens_so_far": 1050, "event": "iteration_start", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:15.512242Z"}
|
||
{"step": 8, "decision": "Starting iteration 3/5", "tokens_so_far": 5418, "event": "iteration_start", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:18.749199Z"}
|
||
{"step": 10, "decision": "Beginning synthesis of gathered evidence", "evidence_count": 6, "iterations_run": 3, "tokens_used": 11453, "event": "synthesis_start", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:28.069780Z"}
|
||
{"step": 11, "decision": "Parsed synthesis JSON successfully", "duration_ms": 24998, "event": "synthesis_complete", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:51.942803Z"}
|
||
{"step": 20, "decision": "Research complete", "confidence": 0.99, "citation_count": 5, "gap_count": 1, "discovery_count": 2, "total_duration_sec": 47.037, "event": "complete", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:51.943609Z"}
|
||
{"confidence": 0.99, "citations": 5, "gaps": 1, "discovery_events": 2, "tokens_used": 19708, "iterations_run": 3, "wall_time_sec": 44.843754529953, "budget_exhausted": false, "event": "research_completed", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "info", "timestamp": "2026-04-09T01:50:51.943716Z"}
|
||
{"error": "[Errno 13] Permission denied: '/home/micro/.marchwarden/costs.jsonl'", "event": "cost_ledger_write_failed", "trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "warning", "timestamp": "2026-04-09T01:50:51.944100Z"}
|
||
{"event": "Processing request of type ListToolsRequest", "logger": "mcp.server.lowlevel.server", "level": "info", "timestamp": "2026-04-09T01:50:51.947937Z"}
|
||
{"trace_id": "91e87d05-6d23-4377-af13-270a8cf701e2", "confidence": 0.99, "citations": 5, "tokens_used": 19708, "wall_time_sec": 44.843754529953, "event": "ask_completed", "logger": "marchwarden.cli", "level": "info", "timestamp": "2026-04-09T01:50:52.133972Z"}
|
||
╭─────────────────────────────────── Answer ───────────────────────────────────╮
|
||
│ The James Webb Space Telescope (JWST) launched on December 25, 2021, at │
|
||
│ 12:20 UTC (7:20 AM ET) aboard an Arianespace Ariane 5 ECA+ rocket (Flight │
|
||
│ VA256) from the Guiana Space Centre (ELA-3) in Kourou, French Guiana. It │
|
||
│ entered service on July 12, 2022. │
|
||
╰──────────────────────────────────────────────────────────────────────────────╯
|
||
Citations
|
||
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
|
||
┃ # ┃ Title / Locator ┃ Excerpt ┃ Conf ┃
|
||
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
|
||
│ 1 │ James Webb Space Telescope - │ Launch date: 25 December 2021 │ 0.99 │
|
||
│ │ Wikipedia │ (2021-12-25), 12:20 UTC | │ │
|
||
│ │ https://en.wikipedia.org/wiki │ Rocket: Ariane 5 ECA+ (S/N │ │
|
||
│ │ /James_Webb_Space_Telescope │ 5113, Flight VA256) | Launch │ │
|
||
│ │ │ site: Guiana, ELA-3 | │ │
|
||
│ │ │ Contractor: Arianespace | │ │
|
||
│ │ │ Entered service: 12 July 2022 │ │
|
||
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
|
||
│ 2 │ The Launch of the James Webb │ On December 25, 2021, and 7:20 │ 0.98 │
|
||
│ │ Space Telescope - YouTube │ AM ET (12:20 UTC), the James │ │
|
||
│ │ https://www.youtube.com/watch │ Webb Space Telescope was │ │
|
||
│ │ ?v=9tXlqWldVVk │ launched by an ArianeSpace │ │
|
||
│ │ │ Ariane 5 rocket from │ │
|
||
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
|
||
│ 3 │ James Webb Space Telescope │ The launch date was Saturday, │ 0.97 │
|
||
│ │ (JWST) Mission (Ariane 5) - │ December 25, 2021 at 12:20 PM │ │
|
||
│ │ RocketLaunch.Live │ (UTC). │ │
|
||
│ │ https://www.rocketlaunch.live │ │ │
|
||
│ │ /launch/jwst │ │ │
|
||
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
|
||
│ 4 │ James Webb Space Telescope – │ JWST's launch date was │ 0.95 │
|
||
│ │ College of Science │ December 25 from Europe's │ │
|
||
│ │ https://science.utah.edu/news │ Spaceport in Kourou, French │ │
|
||
│ │ /james-webb-space-telescope/ │ Guiana. Longtime fans of the │ │
|
||
│ │ │ telescope are celebrating it │ │
|
||
│ │ │ as a Christmas miracle. │ │
|
||
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
|
||
│ 5 │ NASA's James Webb Space │ Liftoff is at 7:20 a.m. EST │ 0.90 │
|
||
│ │ Telescope officially set to │ (1220 GMT). │ │
|
||
│ │ launch Dec. 24 | Space │ │ │
|
||
│ │ https://www.space.com/james-w │ │ │
|
||
│ │ ebb-space-telescope-launch-da │ │ │
|
||
│ │ te-confirmed │ │ │
|
||
└─────┴───────────────────────────────┴────────────────────────────────┴───────┘
|
||
Gaps
|
||
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
|
||
┃ Category ┃ Topic ┃ Detail ┃
|
||
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
|
||
│ contradictory_sources │ Space.com headline │ The Space.com article │
|
||
│ │ discrepancy │ headline references Dec. │
|
||
│ │ │ 24, which was the │
|
||
│ │ │ announced/planned launch │
|
||
│ │ │ date at time of │
|
||
│ │ │ publication, while the │
|
||
│ │ │ actual launch occurred on │
|
||
│ │ │ Dec. 25, 2021. This is a │
|
||
│ │ │ pre-launch announcement │
|
||
│ │ │ artifact, not a true │
|
||
│ │ │ contradiction, and all │
|
||
│ │ │ other sources confirm │
|
||
│ │ │ Dec. 25. │
|
||
└───────────────────────┴──────────────────────────┴───────────────────────────┘
|
||
Discovery Events
|
||
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
|
||
┃ ┃ Suggested ┃ ┃ ┃
|
||
┃ Type ┃ Researcher ┃ Query ┃ Reason ┃
|
||
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
|
||
│ related_research │ null │ James Webb Space │ JWST entered │
|
||
│ │ │ Telescope first │ service on July │
|
||
│ │ │ science results │ 12, 2022; │
|
||
│ │ │ July 2022 │ understanding its │
|
||
│ │ │ │ early science │
|
||
│ │ │ │ results provides │
|
||
│ │ │ │ context for its │
|
||
│ │ │ │ operational │
|
||
│ │ │ │ impact. │
|
||
├──────────────────┼───────────────────┼───────────────────┼───────────────────┤
|
||
│ related_research │ null │ JWST launch │ The telescope was │
|
||
│ │ │ delays history │ originally │
|
||
│ │ │ original 2007 │ planned to launch │
|
||
│ │ │ launch plan │ in 2007 but faced │
|
||
│ │ │ │ decades of │
|
||
│ │ │ │ delays, making │
|
||
│ │ │ │ the history of │
|
||
│ │ │ │ its development │
|
||
│ │ │ │ noteworthy. │
|
||
└──────────────────┴───────────────────┴───────────────────┴───────────────────┘
|
||
Open Questions
|
||
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
|
||
┃ Priority ┃ Question ┃ Context ┃
|
||
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
|
||
│ medium │ What were the key milestones │ Wikipedia notes the telescope │
|
||
│ │ after JWST's launch during its │ entered service on July 12, │
|
||
│ │ commissioning phase before │ 2022, approximately six months │
|
||
│ │ entering service on July 12, │ after its December 25, 2021 │
|
||
│ │ 2022? │ launch, suggesting a lengthy │
|
||
│ │ │ commissioning process. │
|
||
├──────────┼─────────────────────────────────┼─────────────────────────────────┤
|
||
│ low │ What caused JWST's launch to │ Space.com's article was titled │
|
||
│ │ slip from December 24 to │ with a Dec. 24 launch date, but │
|
||
│ │ December 25, 2021? │ the actual launch occurred on │
|
||
│ │ │ Dec. 25, suggesting a │
|
||
│ │ │ last-minute slip. │
|
||
├──────────┼─────────────────────────────────┼─────────────────────────────────┤
|
||
│ medium │ How does JWST's actual mission │ Wikipedia lists a 10-year │
|
||
│ │ performance compare to its │ planned and 20-year expected │
|
||
│ │ planned 10-year operational │ life; precise launch trajectory │
|
||
│ │ lifespan given its fuel │ reportedly left more fuel than │
|
||
│ │ efficiency during launch? │ expected, potentially extending │
|
||
│ │ │ the mission. │
|
||
└──────────┴─────────────────────────────────┴─────────────────────────────────┘
|
||
╭───────────────────────────────── Confidence ─────────────────────────────────╮
|
||
│ Overall: 0.99 │
|
||
│ Corroborating sources: 5 │
|
||
│ Source authority: high │
|
||
│ Contradiction detected: False │
|
||
│ Query specificity match: 1.00 │
|
||
│ Budget status: under cap │
|
||
│ Recency: current │
|
||
╰──────────────────────────────────────────────────────────────────────────────╯
|
||
╭──────────────────────────────────── Cost ────────────────────────────────────╮
|
||
│ Tokens: 19708 │
|
||
│ Iterations: 3 │
|
||
│ Wall time: 44.84s │
|
||
│ Model: claude-sonnet-4-6 │
|
||
╰──────────────────────────────────────────────────────────────────────────────╯
|
||
|
||
trace_id: 91e87d05-6d23-4377-af13-270a8cf701e2
|