Issue #46 (Phase A only; Phase B human rating still pending, issue stays open).

Adds the data-collection half of the calibration milestone:

- scripts/calibration_runner.sh: runs 20 fixed balanced-depth queries across 4 categories (factual, comparative, contradiction-prone, scope-edge), 5 each, capturing per-run logs to docs/stress-tests/M3.3-runs/.
- scripts/calibration_collect.py: loads every persisted ResearchResult under ~/.marchwarden/traces/*.result.json and emits a markdown rating worksheet with one row per run. Recovers question text from each trace's start event and category from the run-log filename.
- docs/stress-tests/M3.3-rating-worksheet.md: 22 runs (20 calibration + caffeine smoke + M3.2 multi-axis), with empty actual_rating columns for the human-in-the-loop scoring step.
- docs/stress-tests/M3.3-runs/*.log: runtime logs from the calibration runner, kept as provenance. Gitignore updated with an exception carving stress-test logs out of the global *.log ignore.

Note: M3.1's 4 runs predate #54 (full result persistence) and so are unrecoverable to the worksheet; only post-#54 runs have a result.json sibling. 22 rateable runs is still within the milestone target of 20–30.

Phases B (human rating) and C (analysis + rubric + wiki update) follow in a later session. This issue stays open until both are done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
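A minimal sketch of the collection step described for scripts/calibration_collect.py, not the script itself. It assumes each trace is a JSONL file named <trace_id>.jsonl next to <trace_id>.result.json, that the result file has a top-level confidence field, and that run-log filenames contain the category name. Those three points, the function names, and every worksheet column except actual_rating are illustrative assumptions.

```python
import json
import re
from pathlib import Path

# Category names come from the commit message; the filename convention is assumed.
CATEGORIES = ("contradiction-prone", "scope-edge", "comparative", "factual")
TRACE_ID_LINE = re.compile(r"^trace_id:\s*(\S+)\s*$", re.MULTILINE)


def category_from_log_name(name):
    """Guess a run's category from its log filename (hypothetical naming)."""
    for category in CATEGORIES:
        if category in name:
            return category
    return "unknown"


def categories_by_trace(runs_dir):
    """Map trace_id -> category via the first 'trace_id: ...' line in each log."""
    mapping = {}
    for log_path in sorted(Path(runs_dir).glob("*.log")):
        match = TRACE_ID_LINE.search(log_path.read_text(encoding="utf-8"))
        if match:
            mapping[match.group(1)] = category_from_log_name(log_path.name)
    return mapping


def question_from_trace(trace_lines):
    """Return the question stored on the trace's 'start' event, or ''."""
    for line in trace_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate blank or non-JSON lines
        if isinstance(event, dict) and event.get("event") == "start":
            return event.get("question", "")
    return ""


def build_worksheet(traces_dir, runs_dir):
    """Return a markdown table with one row per *.result.json file."""
    categories = categories_by_trace(runs_dir)
    lines = [
        "| trace_id | category | question | confidence | actual_rating |",
        "| --- | --- | --- | --- | --- |",
    ]
    for result_path in sorted(Path(traces_dir).glob("*.result.json")):
        trace_id = result_path.name[: -len(".result.json")]
        result = json.loads(result_path.read_text(encoding="utf-8"))
        trace_path = result_path.with_name(trace_id + ".jsonl")  # assumed sibling
        trace_lines = (
            trace_path.read_text(encoding="utf-8").splitlines()
            if trace_path.exists()
            else []
        )
        # Escape pipes so a question cannot break the markdown table.
        question = question_from_trace(trace_lines).replace("|", "\\|")
        confidence = result.get("confidence", "")  # assumed top-level field
        category = categories.get(trace_id, "unknown")
        lines.append(f"| {trace_id} | {category} | {question} | {confidence} |  |")
    return "\n".join(lines) + "\n"
```

Under these assumptions, build_worksheet(Path.home() / ".marchwarden" / "traces", "docs/stress-tests/M3.3-runs") would return the worksheet text with the actual_rating cell left empty for the rater.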
128 lines
17 KiB
Text
Researching: What is the boiling point of liquid nitrogen at standard
atmospheric pressure?
{"question": "What is the boiling point of liquid nitrogen at standard atmospheric pressure?", "depth": "balanced", "max_iterations": null, "token_budget": null, "event": "ask_started", "logger": "marchwarden.cli", "level": "info", "timestamp": "2026-04-09T01:49:07.183443Z"}
{"transport": "stdio", "server": "marchwarden-web-researcher", "event": "mcp_server_starting", "logger": "marchwarden.mcp", "level": "info", "timestamp": "2026-04-09T01:49:07.993167Z"}
{"event": "Processing request of type CallToolRequest", "logger": "mcp.server.lowlevel.server", "level": "info", "timestamp": "2026-04-09T01:49:08.002221Z"}
{"question": "What is the boiling point of liquid nitrogen at standard atmospheric pressure?", "depth": "balanced", "max_iterations": 5, "token_budget": 20000, "model_id": "claude-sonnet-4-6", "event": "research_started", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "info", "timestamp": "2026-04-09T01:49:08.036624Z"}
{"step": 1, "decision": "Beginning research: depth=balanced", "question": "What is the boiling point of liquid nitrogen at standard atmospheric pressure?", "context": "", "max_iterations": 5, "token_budget": 20000, "event": "start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:08.037079Z"}
{"step": 2, "decision": "Starting iteration 1/5", "tokens_so_far": 0, "event": "iteration_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:08.037172Z"}
{"step": 7, "decision": "Starting iteration 2/5", "tokens_so_far": 1107, "event": "iteration_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:20.314935Z"}
{"step": 12, "decision": "Starting iteration 3/5", "tokens_so_far": 5768, "event": "iteration_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:25.184914Z"}
{"step": 15, "decision": "Starting iteration 4/5", "tokens_so_far": 16093, "event": "iteration_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:27.276067Z"}
{"step": 17, "decision": "Beginning synthesis of gathered evidence", "evidence_count": 17, "iterations_run": 4, "tokens_used": 29376, "event": "synthesis_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:43.946958Z"}
{"step": 18, "decision": "Parsed synthesis JSON successfully", "duration_ms": 21492, "event": "synthesis_complete", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:05.440080Z"}
{"step": 26, "decision": "Research complete", "confidence": 0.98, "citation_count": 5, "gap_count": 0, "discovery_count": 2, "total_duration_sec": 59.528, "event": "complete", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:05.442761Z"}
{"confidence": 0.98, "citations": 5, "gaps": 0, "discovery_events": 2, "tokens_used": 42473, "iterations_run": 4, "wall_time_sec": 57.403085231781006, "budget_exhausted": false, "event": "research_completed", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "info", "timestamp": "2026-04-09T01:50:05.442894Z"}
{"error": "[Errno 13] Permission denied: '/home/micro/.marchwarden/costs.jsonl'", "event": "cost_ledger_write_failed", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "warning", "timestamp": "2026-04-09T01:50:05.443791Z"}
{"event": "Processing request of type ListToolsRequest", "logger": "mcp.server.lowlevel.server", "level": "info", "timestamp": "2026-04-09T01:50:05.453034Z"}
{"trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "confidence": 0.98, "citations": 5, "tokens_used": 42473, "wall_time_sec": 57.403085231781006, "event": "ask_completed", "logger": "marchwarden.cli", "level": "info", "timestamp": "2026-04-09T01:50:05.720817Z"}
╭─────────────────────────────────── Answer ───────────────────────────────────╮
│ The boiling point of liquid nitrogen at standard atmospheric pressure (1 atm │
│ / 14.7 psia / 760 mmHg) is −195.79 °C (77 K; −320 °F). Some sources round    │
│ this to −195.8 °C or approximately −196 °C. This value represents the        │
│ temperature at which nitrogen transitions from its liquid phase to a gas     │
│ phase under normal atmospheric conditions.                                   │
╰──────────────────────────────────────────────────────────────────────────────╯
Citations
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ #   ┃ Title / Locator               ┃ Excerpt                        ┃ Conf  ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ 1   │ Liquid Nitrogen Temperature   │ The temperature of liquid      │ 0.98  │
│     │ and Facts                     │ nitrogen is −195.79 °C (77 K;  │       │
│     │ https://sciencenotes.org/liqu │ −320 °F). This is the boiling  │       │
│     │ id-nitrogen-temperature-and-f │ point of nitrogen. However,    │       │
│     │ acts/                         │ nitrogen can exist as a liquid │       │
│     │                               │ between 63 K and 77.2 K        │       │
│     │                               │ (-346°F and -320.44°F).        │       │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 2   │ Nitrogen - Thermophysical     │ Boiling Point - at saturation  │ 0.97  │
│     │ Properties                    │ pressure 14.7 psia and 760 mm  │       │
│     │ https://www.engineeringtoolbo │ Hg - ( o F, o C ) -320.4,      │       │
│     │ x.com/nitrogen-d_1421.html    │ -195.8                         │       │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 3   │ What Is the Temperature of    │ The temperature of liquid      │ 0.95  │
│     │ Liquid Nitrogen? - WestAir    │ nitrogen is -196°C (-321°F) at │       │
│     │ https://westairgases.com/blog │ its boiling point. The liquid  │       │
│     │ /liquid-nitrogen-temperature- │ nitrogen temperature range     │       │
│     │ properties/                   │ spans between -210°C (freezing │       │
│     │                               │ point) and -196°C (boiling     │       │
│     │                               │ point).                        │       │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 4   │ What is the boiling point of  │ At 1 atmosphere of pressure,   │ 0.90  │
│     │ liquid nitrogen? Does it      │ nitrogen boils at -195.8       │       │
│     │ change ... - Quora            │ Celsius (-320.4 Fahrenheit).   │       │
│     │ https://www.quora.com/What-is │ Of course, like any substance, │       │
│     │ -the-boiling-point-of-liquid- │ boiling point varies directly  │       │
│     │ nitrogen-Does-it-change-in-a- │ with pressure.                 │       │
│     │ vacuum-or-at-standard-conditi │                                │       │
│     │ ons                           │                                │       │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 5   │ The boiling point for liquid  │ The boiling point for liquid   │ 0.88  │
│     │ nitrogen at atmospheric       │ nitrogen at atmospheric        │       │
│     │ pressure is 77 K.             │ pressure is 77 K. In an open   │       │
│     │ https://brainly.com/question/ │ container, liquid nitrogen's   │       │
│     │ 17018364                      │ temperature is generally       │       │
│     │                               │ around its boiling point of 77 │       │
│     │                               │ K due to continuous            │       │
│     │                               │ vaporization.                  │       │
└─────┴───────────────────────────────┴────────────────────────────────┴───────┘
Discovery Events
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃                  ┃ Suggested         ┃                   ┃                   ┃
┃ Type             ┃ Researcher        ┃ Query             ┃ Reason            ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ related_research │ database          │ liquid nitrogen   │ The boiling point │
│                  │                   │ boiling point     │ of nitrogen       │
│                  │                   │ pressure          │ varies with       │
│                  │                   │ dependence phase  │ pressure;         │
│                  │                   │ diagram           │ understanding     │
│                  │                   │                   │ this relationship │
│                  │                   │                   │ is useful for     │
│                  │                   │                   │ industrial and    │
│                  │                   │                   │ scientific        │
│                  │                   │                   │ applications.     │
├──────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ related_research │ database          │ nitrogen phase    │ Engineering       │
│                  │                   │ diagram triple    │ ToolBox           │
│                  │                   │ point critical    │ references a      │
│                  │                   │ point             │ nitrogen phase    │
│                  │                   │                   │ diagram showing   │
│                  │                   │                   │ conditions for    │
│                  │                   │                   │ solid, liquid,    │
│                  │                   │                   │ and gas phases.   │
└──────────────────┴───────────────────┴───────────────────┴───────────────────┘
Open Questions
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Priority ┃ Question                        ┃ Context                         ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ medium   │ How does the boiling point of   │ Multiple sources note that      │
│          │ liquid nitrogen change as       │ boiling point varies directly   │
│          │ pressure decreases toward a     │ with pressure, suggesting       │
│          │ vacuum?                         │ significant changes under       │
│          │                                 │ reduced pressure conditions.    │
├──────────┼─────────────────────────────────┼─────────────────────────────────┤
│ low      │ What is the exact triple point  │ Sources mention nitrogen exists │
│          │ temperature and pressure for    │ as a liquid between 63 K and    │
│          │ nitrogen?                       │ 77.2 K, implying a triple point │
│          │                                 │ near 63 K, but exact triple     │
│          │                                 │ point data was not provided in  │
│          │                                 │ the gathered evidence.          │
└──────────┴─────────────────────────────────┴─────────────────────────────────┘
╭───────────────────────────────── Confidence ─────────────────────────────────╮
│ Overall: 0.98                                                                │
│ Corroborating sources: 5                                                     │
│ Source authority: high                                                       │
│ Contradiction detected: False                                                │
│ Query specificity match: 1.00                                                │
│ Budget status: under cap                                                     │
│ Recency: current                                                             │
╰──────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────── Cost ────────────────────────────────────╮
│ Tokens: 42473                                                                │
│ Iterations: 4                                                                │
│ Wall time: 57.40s                                                            │
│ Model: claude-sonnet-4-6                                                     │
╰──────────────────────────────────────────────────────────────────────────────╯
trace_id: 6141a021-4a47-45df-aa0c-5acd1db78b79
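
The structured JSON lines in a run log like this one can be separated from the rendered panels mechanically. A minimal sketch, assuming the field names visible in this log (event, level, and the counters on research_completed) hold for the other runs; the function name summarize_run_log and the SUMMARY_KEYS tuple are hypothetical.

```python
import json

# Fields copied from the research_completed event shown in this log.
SUMMARY_KEYS = (
    "trace_id",
    "confidence",
    "citations",
    "gaps",
    "tokens_used",
    "iterations_run",
    "wall_time_sec",
    "budget_exhausted",
)


def summarize_run_log(text):
    """Collect the completion counters and any warning events from one run log."""
    summary = {"warnings": []}
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw.startswith("{"):
            continue  # skip the banner, panels, and tables
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a line that only looks like JSON
        if record.get("level") == "warning":
            summary["warnings"].append(record.get("event"))
        if record.get("event") == "research_completed":
            for key in SUMMARY_KEYS:
                summary[key] = record.get(key)
    return summary
```

Applied to this log, the summary would carry the research_completed counters and list cost_ledger_write_failed under warnings, which makes the ledger permission error easy to spot across the run logs.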