marchwarden/docs/stress-tests/M3.3-runs/01-factual.log
Jeff Smith 13215d7ddb docs(stress-tests): M3.3 Phase A — calibration data collection
Issue #46 (Phase A only — Phase B human rating still pending, issue stays open).

Adds the data-collection half of the calibration milestone:

- scripts/calibration_runner.sh — runs 20 fixed balanced-depth queries
  across 4 categories (factual, comparative, contradiction-prone,
  scope-edge), 5 each, capturing per-run logs to docs/stress-tests/M3.3-runs/.
- scripts/calibration_collect.py — loads every persisted ResearchResult
  under ~/.marchwarden/traces/*.result.json and emits a markdown rating
  worksheet with one row per run. Recovers question text from each
  trace's start event and category from the run-log filename.
- docs/stress-tests/M3.3-rating-worksheet.md — 22 runs (20 calibration
  + caffeine smoke + M3.2 multi-axis), with empty actual_rating columns
  for the human-in-the-loop scoring step.
- docs/stress-tests/M3.3-runs/*.log — runtime logs from the calibration
  runner, kept as provenance. Gitignore updated with an exception
  carving stress-test logs out of the global *.log ignore.
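
As a rough illustration of the collection step described in the list above, the sketch below derives a category from a run-log filename such as `01-factual.log` and renders one worksheet row per run. The function names, the `NN-category.log` naming pattern, and the row columns are assumptions made for this example; the real `calibration_collect.py` may be organised differently.

```python
import re
from pathlib import Path

# Assumed naming scheme: "<index>-<category>.log", e.g. "01-factual.log".
LOG_NAME = re.compile(r"^\d+-(?P<category>[a-z-]+)\.log$")


def category_from_log_name(path):
    """Return the category encoded in a run-log filename, or None if it does not match."""
    match = LOG_NAME.match(Path(path).name)
    return match.group("category") if match else None


def worksheet_row(trace_id, category, question, confidence):
    """Render one markdown table row, leaving the final actual_rating cell empty.

    The column order here is hypothetical.
    """
    safe_question = question.replace("|", "\\|")  # keep the markdown table intact
    return f"| {trace_id} | {category} | {safe_question} | {confidence:.2f} |  |"
```

Under these assumptions, `category_from_log_name("docs/stress-tests/M3.3-runs/01-factual.log")` returns `"factual"`, and a filename that does not follow the pattern returns `None` rather than a guessed category.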

Note: M3.1's 4 runs predate #54 (full result persistence) and so cannot
be recovered into the worksheet; only post-#54 runs have a result.json
sibling. 22 rateable runs is still within the milestone target of 20–30.

Phases B (human rating) and C (analysis + rubric + wiki update) follow
in a later session. This issue stays open until both are done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:21:47 -06:00

Researching: What is the boiling point of liquid nitrogen at standard
atmospheric pressure?
{"question": "What is the boiling point of liquid nitrogen at standard atmospheric pressure?", "depth": "balanced", "max_iterations": null, "token_budget": null, "event": "ask_started", "logger": "marchwarden.cli", "level": "info", "timestamp": "2026-04-09T01:49:07.183443Z"}
{"transport": "stdio", "server": "marchwarden-web-researcher", "event": "mcp_server_starting", "logger": "marchwarden.mcp", "level": "info", "timestamp": "2026-04-09T01:49:07.993167Z"}
{"event": "Processing request of type CallToolRequest", "logger": "mcp.server.lowlevel.server", "level": "info", "timestamp": "2026-04-09T01:49:08.002221Z"}
{"question": "What is the boiling point of liquid nitrogen at standard atmospheric pressure?", "depth": "balanced", "max_iterations": 5, "token_budget": 20000, "model_id": "claude-sonnet-4-6", "event": "research_started", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "info", "timestamp": "2026-04-09T01:49:08.036624Z"}
{"step": 1, "decision": "Beginning research: depth=balanced", "question": "What is the boiling point of liquid nitrogen at standard atmospheric pressure?", "context": "", "max_iterations": 5, "token_budget": 20000, "event": "start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:08.037079Z"}
{"step": 2, "decision": "Starting iteration 1/5", "tokens_so_far": 0, "event": "iteration_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:08.037172Z"}
{"step": 7, "decision": "Starting iteration 2/5", "tokens_so_far": 1107, "event": "iteration_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:20.314935Z"}
{"step": 12, "decision": "Starting iteration 3/5", "tokens_so_far": 5768, "event": "iteration_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:25.184914Z"}
{"step": 15, "decision": "Starting iteration 4/5", "tokens_so_far": 16093, "event": "iteration_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:27.276067Z"}
{"step": 17, "decision": "Beginning synthesis of gathered evidence", "evidence_count": 17, "iterations_run": 4, "tokens_used": 29376, "event": "synthesis_start", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:49:43.946958Z"}
{"step": 18, "decision": "Parsed synthesis JSON successfully", "duration_ms": 21492, "event": "synthesis_complete", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:05.440080Z"}
{"step": 26, "decision": "Research complete", "confidence": 0.98, "citation_count": 5, "gap_count": 0, "discovery_count": 2, "total_duration_sec": 59.528, "event": "complete", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:50:05.442761Z"}
{"confidence": 0.98, "citations": 5, "gaps": 0, "discovery_events": 2, "tokens_used": 42473, "iterations_run": 4, "wall_time_sec": 57.403085231781006, "budget_exhausted": false, "event": "research_completed", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "info", "timestamp": "2026-04-09T01:50:05.442894Z"}
{"error": "[Errno 13] Permission denied: '/home/micro/.marchwarden/costs.jsonl'", "event": "cost_ledger_write_failed", "trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "warning", "timestamp": "2026-04-09T01:50:05.443791Z"}
{"event": "Processing request of type ListToolsRequest", "logger": "mcp.server.lowlevel.server", "level": "info", "timestamp": "2026-04-09T01:50:05.453034Z"}
{"trace_id": "6141a021-4a47-45df-aa0c-5acd1db78b79", "confidence": 0.98, "citations": 5, "tokens_used": 42473, "wall_time_sec": 57.403085231781006, "event": "ask_completed", "logger": "marchwarden.cli", "level": "info", "timestamp": "2026-04-09T01:50:05.720817Z"}
╭─────────────────────────────────── Answer ───────────────────────────────────╮
│ The boiling point of liquid nitrogen at standard atmospheric pressure (1 atm │
│ / 14.7 psia / 760 mmHg) is −195.79 °C (77 K; −320 °F). Some sources round │
│ this to −195.8 °C or approximately −196 °C. This value represents the │
│ temperature at which nitrogen transitions from its liquid phase to a gas │
│ phase under normal atmospheric conditions. │
╰──────────────────────────────────────────────────────────────────────────────╯
Citations
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ # ┃ Title / Locator ┃ Excerpt ┃ Conf ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ 1 │ Liquid Nitrogen Temperature │ The temperature of liquid │ 0.98 │
│ │ and Facts │ nitrogen is −195.79 °C (77 K; │ │
│ │ https://sciencenotes.org/liqu │ −320 °F). This is the boiling │ │
│ │ id-nitrogen-temperature-and-f │ point of nitrogen. However, │ │
│ │ acts/ │ nitrogen can exist as a liquid │ │
│ │ │ between 63 K and 77.2 K │ │
│ │ │ (-346°F and -320.44°F). │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 2 │ Nitrogen - Thermophysical │ Boiling Point - at saturation │ 0.97 │
│ │ Properties │ pressure 14.7 psia and 760 mm │ │
│ │ https://www.engineeringtoolbo │ Hg - ( o F, o C ) -320.4, │ │
│ │ x.com/nitrogen-d_1421.html │ -195.8 │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 3 │ What Is the Temperature of │ The temperature of liquid │ 0.95 │
│ │ Liquid Nitrogen? - WestAir │ nitrogen is -196°C (-321°F) at │ │
│ │ https://westairgases.com/blog │ its boiling point. The liquid │ │
│ │ /liquid-nitrogen-temperature- │ nitrogen temperature range │ │
│ │ properties/ │ spans between -210°C (freezing │ │
│ │ │ point) and -196°C (boiling │ │
│ │ │ point). │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 4 │ What is the boiling point of │ At 1 atmosphere of pressure, │ 0.90 │
│ │ liquid nitrogen? Does it │ nitrogen boils at -195.8 │ │
│ │ change ... - Quora │ Celsius (-320.4 Fahrenheit). │ │
│ │ https://www.quora.com/What-is │ Of course, like any substance, │ │
│ │ -the-boiling-point-of-liquid- │ boiling point varies directly │ │
│ │ nitrogen-Does-it-change-in-a- │ with pressure. │ │
│ │ vacuum-or-at-standard-conditi │ │ │
│ │ ons │ │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 5 │ The boiling point for liquid │ The boiling point for liquid │ 0.88 │
│ │ nitrogen at atmospheric │ nitrogen at atmospheric │ │
│ │ pressure is 77 K. │ pressure is 77 K. In an open │ │
│ │ https://brainly.com/question/ │ container, liquid nitrogen's │ │
│ │ 17018364 │ temperature is generally │ │
│ │ │ around its boiling point of 77 │ │
│ │ │ K due to continuous │ │
│ │ │ vaporization. │ │
└─────┴───────────────────────────────┴────────────────────────────────┴───────┘
Discovery Events
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ ┃ Suggested ┃ ┃ ┃
┃ Type ┃ Researcher ┃ Query ┃ Reason ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ related_research │ database │ liquid nitrogen │ The boiling point │
│ │ │ boiling point │ of nitrogen │
│ │ │ pressure │ varies with │
│ │ │ dependence phase │ pressure; │
│ │ │ diagram │ understanding │
│ │ │ │ this relationship │
│ │ │ │ is useful for │
│ │ │ │ industrial and │
│ │ │ │ scientific │
│ │ │ │ applications. │
├──────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ related_research │ database │ nitrogen phase │ Engineering │
│ │ │ diagram triple │ ToolBox │
│ │ │ point critical │ references a │
│ │ │ point │ nitrogen phase │
│ │ │ │ diagram showing │
│ │ │ │ conditions for │
│ │ │ │ solid, liquid, │
│ │ │ │ and gas phases. │
└──────────────────┴───────────────────┴───────────────────┴───────────────────┘
Open Questions
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Priority ┃ Question ┃ Context ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ medium │ How does the boiling point of │ Multiple sources note that │
│ │ liquid nitrogen change as │ boiling point varies directly │
│ │ pressure decreases toward a │ with pressure, suggesting │
│ │ vacuum? │ significant changes under │
│ │ │ reduced pressure conditions. │
├──────────┼─────────────────────────────────┼─────────────────────────────────┤
│ low │ What is the exact triple point │ Sources mention nitrogen exists │
│ │ temperature and pressure for │ as a liquid between 63 K and │
│ │ nitrogen? │ 77.2 K, implying a triple point │
│ │ │ near 63 K, but exact triple │
│ │ │ point data was not provided in │
│ │ │ the gathered evidence. │
└──────────┴─────────────────────────────────┴─────────────────────────────────┘
╭───────────────────────────────── Confidence ─────────────────────────────────╮
│ Overall: 0.98 │
│ Corroborating sources: 5 │
│ Source authority: high │
│ Contradiction detected: False │
│ Query specificity match: 1.00 │
│ Budget status: under cap │
│ Recency: current │
╰──────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────── Cost ────────────────────────────────────╮
│ Tokens: 42473 │
│ Iterations: 4 │
│ Wall time: 57.40s │
│ Model: claude-sonnet-4-6 │
╰──────────────────────────────────────────────────────────────────────────────╯
trace_id: 6141a021-4a47-45df-aa0c-5acd1db78b79
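
The run log above interleaves one-JSON-object-per-line events with rendered tables. Below is a minimal sketch of pulling the structured events back out and measuring the time between `ask_started` and `ask_completed`, assuming only the `event` and `timestamp` fields visible in this log. The helper names are hypothetical and not part of marchwarden.

```python
import json
from datetime import datetime


def parse_events(lines):
    """Yield dicts for lines that hold a JSON event; skip tables and prose."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a line that only looks like JSON
        if isinstance(record, dict) and "event" in record:
            yield record


def parse_timestamp(value):
    """Parse the log's ISO 8601 UTC timestamps, e.g. 2026-04-09T01:49:07.183443Z."""
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def elapsed_seconds(events, start="ask_started", end="ask_completed"):
    """Seconds between the first `start` event and the first `end` event, or None."""
    stamps = {}
    for record in events:
        name = record["event"]
        if name in (start, end) and name not in stamps:
            stamps[name] = parse_timestamp(record["timestamp"])
    if start in stamps and end in stamps:
        return (stamps[end] - stamps[start]).total_seconds()
    return None
```

Applied to this run, the two CLI timestamps (01:49:07.183443Z and 01:50:05.720817Z) are about 58.5 s apart. Filtering on a leading `{` and tolerating decode errors keeps the parser usable on mixed logs like this one without needing to know the table layout.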