marchwarden/docs/stress-tests/M3.3-runs/06-comparative.log
Jeff Smith 13215d7ddb docs(stress-tests): M3.3 Phase A — calibration data collection
Issue #46 (Phase A only — Phase B human rating still pending, issue stays open).

Adds the data-collection half of the calibration milestone:

- scripts/calibration_runner.sh — runs 20 fixed balanced-depth queries
  across 4 categories (factual, comparative, contradiction-prone,
  scope-edge), 5 each, capturing per-run logs to docs/stress-tests/M3.3-runs/.
- scripts/calibration_collect.py — loads every persisted ResearchResult
  under ~/.marchwarden/traces/*.result.json and emits a markdown rating
  worksheet with one row per run. Recovers question text from each
  trace's start event and category from the run-log filename.
- docs/stress-tests/M3.3-rating-worksheet.md — 22 runs (20 calibration
  + caffeine smoke + M3.2 multi-axis), with empty actual_rating columns
  for the human-in-the-loop scoring step.
- docs/stress-tests/M3.3-runs/*.log — runtime logs from the calibration
  runner, kept as provenance. Gitignore updated with an exception
  carving stress-test logs out of the global *.log ignore.

Note: M3.1's 4 runs predate #54 (full result persistence) and so are
unrecoverable to the worksheet — only post-#54 runs have a result.json
sibling. 22 rateable runs is still within the milestone target of 20–30.

Phases B (human rating) and C (analysis + rubric + wiki update) follow
in a later session. This issue stays open until both are done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:21:47 -06:00
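
The commit message says the collector recovers each run's category from the run-log filename. A minimal sketch of that step follows, assuming filenames follow the `NN-category.log` pattern seen in `06-comparative.log`. The helper name `category_from_log_name` and the regular expression are hypothetical and are not taken from `scripts/calibration_collect.py`; only the four category names come from the commit message.

```python
import re
from pathlib import Path

# The four categories named in the commit message.
CATEGORIES = {"factual", "comparative", "contradiction-prone", "scope-edge"}


def category_from_log_name(path):
    """Return the category encoded in a run-log filename, or None.

    Assumes (hypothetically) a "<digits>-<category>.log" naming scheme,
    inferred from the single visible filename 06-comparative.log.
    """
    stem = Path(path).stem  # e.g. "06-comparative"
    match = re.fullmatch(r"\d+-(.+)", stem)
    if match is None:
        return None
    category = match.group(1)
    # Reject anything that is not one of the known categories.
    return category if category in CATEGORIES else None
```

Returning `None` for unrecognized names, rather than raising, lets a worksheet generator leave the category cell blank for runs such as the smoke tests that fall outside the four calibration categories.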

226 lines
27 KiB
Text

Researching: Compare the energy density of lithium-ion vs sodium-ion batteries.
{"question": "Compare the energy density of lithium-ion vs sodium-ion batteries.", "depth": "balanced", "max_iterations": null, "token_budget": null, "event": "ask_started", "logger": "marchwarden.cli", "level": "info", "timestamp": "2026-04-09T01:54:02.430608Z"}
{"transport": "stdio", "server": "marchwarden-web-researcher", "event": "mcp_server_starting", "logger": "marchwarden.mcp", "level": "info", "timestamp": "2026-04-09T01:54:03.159945Z"}
{"event": "Processing request of type CallToolRequest", "logger": "mcp.server.lowlevel.server", "level": "info", "timestamp": "2026-04-09T01:54:03.167971Z"}
{"question": "Compare the energy density of lithium-ion vs sodium-ion batteries.", "depth": "balanced", "max_iterations": 5, "token_budget": 20000, "model_id": "claude-sonnet-4-6", "event": "research_started", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "info", "timestamp": "2026-04-09T01:54:03.200030Z"}
{"step": 1, "decision": "Beginning research: depth=balanced", "question": "Compare the energy density of lithium-ion vs sodium-ion batteries.", "context": "", "max_iterations": 5, "token_budget": 20000, "event": "start", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:54:03.200318Z"}
{"step": 2, "decision": "Starting iteration 1/5", "tokens_so_far": 0, "event": "iteration_start", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:54:03.200405Z"}
{"step": 7, "decision": "Starting iteration 2/5", "tokens_so_far": 1114, "event": "iteration_start", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:54:14.560598Z"}
{"step": 12, "decision": "Starting iteration 3/5", "tokens_so_far": 7183, "event": "iteration_start", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:54:18.314755Z"}
{"step": 19, "decision": "Starting iteration 4/5", "tokens_so_far": 13977, "event": "iteration_start", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:54:28.528912Z"}
{"step": 24, "decision": "Token budget reached before iteration 5: 28015/20000", "event": "budget_exhausted", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:54:39.027627Z"}
{"step": 25, "decision": "Beginning synthesis of gathered evidence", "evidence_count": 24, "iterations_run": 4, "tokens_used": 28015, "event": "synthesis_start", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:54:39.028531Z"}
{"step": 26, "decision": "Parsed synthesis JSON successfully", "duration_ms": 50955, "event": "synthesis_complete", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:55:27.614289Z"}
{"step": 41, "decision": "Research complete", "confidence": 0.91, "citation_count": 8, "gap_count": 3, "discovery_count": 3, "total_duration_sec": 87.865, "event": "complete", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.trace", "level": "info", "timestamp": "2026-04-09T01:55:27.616834Z"}
{"confidence": 0.91, "citations": 8, "gaps": 3, "discovery_events": 3, "tokens_used": 48087, "iterations_run": 4, "wall_time_sec": 84.41376757621765, "budget_exhausted": true, "event": "research_completed", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "info", "timestamp": "2026-04-09T01:55:27.617014Z"}
{"error": "[Errno 13] Permission denied: '/home/micro/.marchwarden/costs.jsonl'", "event": "cost_ledger_write_failed", "trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "researcher": "web", "logger": "marchwarden.researcher.web", "level": "warning", "timestamp": "2026-04-09T01:55:27.617866Z"}
{"event": "Processing request of type ListToolsRequest", "logger": "mcp.server.lowlevel.server", "level": "info", "timestamp": "2026-04-09T01:55:27.632124Z"}
{"trace_id": "aaf3b9ef-d91a-4d03-8883-b0a906929cb1", "confidence": 0.91, "citations": 8, "tokens_used": 48087, "wall_time_sec": 84.41376757621765, "event": "ask_completed", "logger": "marchwarden.cli", "level": "info", "timestamp": "2026-04-09T01:55:27.873634Z"}
╭─────────────────────────────────── Answer ───────────────────────────────────╮
│ Lithium-ion batteries have significantly higher energy density than │
│ sodium-ion batteries across all commercial chemistries. Lithium-ion cells │
│ achieve 150–300 Wh/kg gravimetrically, depending on chemistry: NMC variants │
│ reach 250–300 Wh/kg in premium automotive applications, while LFP cells │
│ deliver 150–180 Wh/kg [Source 15]. Volumetrically, lithium-ion batteries │
│ reach roughly 250–700 Wh/L [Source 16]. Sodium-ion batteries currently │
│ achieve 90–190 Wh/kg gravimetrically; CATL's first-generation commercial │
│ cells reached ~160 Wh/kg [Source 15], with newer products like CATL's Naxtra │
│ reaching ~175 Wh/kg [Source 22], and ScienceDirect prototypes ranging 90–150 │
│ Wh/kg [Source 7]. The volumetric energy density of sodium-ion is │
│ approximately 20–40% lower than lithium-ion equivalents [Source 8]. This gap │
│ exists fundamentally because sodium ions are heavier and larger than lithium │
│ ions, reducing the energy stored per unit mass or volume [Source 3, Source │
│ 20]. A notable exception is a late-2025 announcement by ZN Energy of an │
│ anode-free solid-state sodium-ion pouch cell achieving 348.5 Wh/kg, verified │
│ by CATARC, using a high-energy layered oxide cathode and anode-free │
│ solid-state architecture—though this is a laboratory/prototype result, not │
│ yet commercial [Source 10]. In practical terms, sodium-ion batteries are │
│ best suited for stationary storage and cost-sensitive low-performance EVs │
│ where energy density is less critical, while lithium-ion dominates portable │
│ electronics, robotics, and long-range EVs [Source 1, Source 8]. │
╰──────────────────────────────────────────────────────────────────────────────╯
Citations
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ # ┃ Title / Locator ┃ Excerpt ┃ Conf ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ 1 │ Battery Energy Density 2025: │ Nickel Manganese Cobalt (NMC) │ 0.95 │
│ │ State of the Art & Next-Gen │ variants deliver the highest │ │
│ │ Tech │ energy densities at the cell │ │
│ │ https://timharper.net/fieldno │ level, reaching 250-300 Wh/kg │ │
│ │ tes/battery-energy-density-20 │ in premium automotive │ │
│ │ 25/ │ applications... Sodium-ion │ │
│ │ │ batteries have emerged from │ │
│ │ │ laboratory curiosity to │ │
│ │ │ commercial reality, with │ │
│ │ │ CATL's first-generation cells │ │
│ │ │ achieving 160 Wh/kg energy │ │
│ │ │ density. │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 2 │ Sodium ion batteries: A │ Current prototypes of SIBs │ 0.95 │
│ │ sustainable alternative to │ have energy densities of │ │
│ │ lithium-ion ... │ 90–150 Wh/kg, which remain │ │
│ │ https://www.sciencedirect.com │ lower than the 130–285 Wh/kg │ │
│ │ /science/article/pii/S2949821 │ typically achieved │ │
│ │ X25002418 │ │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 3 │ Sodium-ion batteries: Should │ Sodium is heavier than │ 0.97 │
│ │ we believe the hype? │ lithium, and its ions are │ │
│ │ https://cen.acs.org/energy/en │ larger, resulting in a │ │
│ │ ergy-storage-/Sodium-ion-batt │ volumetric energy density that │ │
│ │ eries-Should-believe/103/web/ │ is 20–40% less than that of │ │
│ │ 2025/11 │ lithium ion. Consequently, a │ │
│ │ │ sodium-ion battery is bigger │ │
│ │ │ and heavier than an equivalent │ │
│ │ │ one made with lithium. │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 4 │ Energy Density of Lithium-Ion │ Modern lithium-ion batteries │ 0.90 │
│ │ Batteries Explained: Wh/kg vs │ achieve 150-300 Wh/kg and │ │
│ │ Wh/L │ 250-700 Wh/L, depending on │ │
│ │ https://www.longsingtech.com/ │ chemistry and design. │ │
│ │ energy-density-of-lithium-ion │ │ │
│ │ -batteries/ │ │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 5 │ Sodium Ion vs Lithium Ion │ Energy Density (Gravimetric): │ 0.88 │
│ │ Batteries: 2026 Comparison & │ Sodium-ion typically ranges │ │
│ │ Key Advantages │ from 100–175 Wh/kg (e.g., │ │
│ │ https://chargeprotexas.com/so │ CATL's Naxtra at ~175 Wh/kg). │ │
│ │ dium-ion-vs-lithium-ion-batte │ Lithium-ion hits 150–250+ │ │
│ │ ries-2026-comparison/ │ Wh/kg (LFP: 150–210; NMC: │ │
│ │ │ 240–350). │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 6 │ ZN Energy Breaks Sodium-Ion │ Its >25Ah large-format AFSSSIB │ 0.78 │
│ │ Battery Density Record at │ pouch cell achieved a │ │
│ │ 348.5Wh/kg │ gravimetric energy density of │ │
│ │ https://www.linkedin.com/post │ 348.5Wh/kg, verified by CATARC │ │
│ │ s/jerry-wan-069b41105_breakin │ (China Automotive Technology & │ │
│ │ g-the-sodium-ceiling-zhaona-e │ Research Center, Tianjin). │ │
│ │ nergy-activity-74134108276403 │ This is not an incremental │ │
│ │ 20000-NHd_ │ improvement—it directly │ │
│ │ │ challenges the long-held │ │
│ │ │ assumption that sodium │ │
│ │ │ chemistry is structurally │ │
│ │ │ capped at 'low energy │ │
│ │ │ density.' │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 7 │ Sodium as a Green Substitute │ But there are also downsides │ 0.93 │
│ │ for Lithium in Batteries │ to sodium-ion batteries, the │ │
│ │ https://physics.aps.org/artic │ top one being a lower energy │ │
│ │ les/v17/73 │ density than their lithium-ion │ │
│ │ │ counterparts. Energy density │ │
│ │ │ has a direct bearing on the │ │
│ │ │ driving range of an electric │ │
│ │ │ vehicle. │ │
├─────┼───────────────────────────────┼────────────────────────────────┼───────┤
│ 8 │ Sodium-Ion vs Lithium-Ion │ lithium-ion batteries dominate │ 0.85 │
│ │ Batteries Differences and │ high-performance applications │ │
│ │ Applications in 2025 │ like consumer electronics and │ │
│ │ https://www.large-battery.com │ robotics, owing to their │ │
│ │ /blog/na-ion-vs-li-ion-batter │ superior energy density of │ │
│ │ ies-2025/ │ 100–270 Wh/kg. │ │
└─────┴───────────────────────────────┴────────────────────────────────┴───────┘
Gaps
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Category ┃ Topic ┃ Detail ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ source_not_found │ Volumetric energy │ Most sources provide │
│ │ density figures for │ gravimetric (Wh/kg) data │
│ │ sodium-ion batteries │ for sodium-ion; specific │
│ │ │ Wh/L volumetric figures │
│ │ │ for sodium-ion cells at │
│ │ │ the commercial pack level │
│ │ │ were not found in │
│ │ │ evidence. │
├───────────────────────┼──────────────────────────┼───────────────────────────┤
│ contradictory_sources │ Independent verification │ The 348.5 Wh/kg result │
│ │ of ZN Energy 348.5 Wh/kg │ for sodium-ion is from a │
│ │ claim │ LinkedIn post summarizing │
│ │ │ a company announcement. │
│ │ │ No peer-reviewed or │
│ │ │ independent third-party │
│ │ │ publication was found to │
│ │ │ corroborate this figure. │
├───────────────────────┼──────────────────────────┼───────────────────────────┤
│ scope_exceeded │ Cycle life vs energy │ While cycle life is │
│ │ density trade-offs in │ mentioned in some │
│ │ sodium-ion │ sources, a detailed │
│ │ │ quantitative comparison │
│ │ │ of how energy density │
│ │ │ degrades over cycle life │
│ │ │ compared to lithium-ion │
│ │ │ was not covered in the │
│ │ │ evidence. │
└───────────────────────┴──────────────────────────┴───────────────────────────┘
Discovery Events
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ ┃ Suggested ┃ ┃ ┃
┃ Type ┃ Researcher ┃ Query ┃ Reason ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ new_source │ arxiv │ anode-free │ ZN Energy's 348.5 │
│ │ │ solid-state │ Wh/kg claim would │
│ │ │ sodium-ion │ benefit from │
│ │ │ battery energy │ peer-reviewed │
│ │ │ density 2025 │ validation on │
│ │ │ │ arXiv or similar │
│ │ │ │ preprint server. │
├──────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ related_research │ database │ sodium-ion │ Volumetric energy │
│ │ │ battery │ density for │
│ │ │ volumetric energy │ sodium-ion at the │
│ │ │ density Wh/L │ cell and pack │
│ │ │ commercial cells │ level is │
│ │ │ 2025 │ underrepresented │
│ │ │ │ in current │
│ │ │ │ evidence. │
├──────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ related_research │ arxiv │ layered oxide │ Multiple sources │
│ │ │ cathode │ mention cathode │
│ │ │ sodium-ion │ engineering as │
│ │ │ specific capacity │ the key │
│ │ │ cycle stability │ bottleneck for │
│ │ │ 2025 │ sodium-ion energy │
│ │ │ │ density │
│ │ │ │ improvement. │
└──────────────────┴───────────────────┴───────────────────┴───────────────────┘
Open Questions
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Priority ┃ Question ┃ Context ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ high │ Will sodium-ion batteries ever │ ZN Energy's prototype achieved │
│ │ match or exceed LFP lithium-ion │ 348.5 Wh/kg, but commercial │
│ │ in gravimetric energy density │ CATL sodium-ion cells are at │
│ │ at the commercial pack level? │ ~160–175 Wh/kg while LFP cells │
│ │ │ are 150–180 Wh/kg. The gap is │
│ │ │ closing in prototypes but not │
│ │ │ yet in commercial products. │
├──────────┼─────────────────────────────────┼─────────────────────────────────┤
│ medium │ How does energy density change │ Sources mention sodium-ion's │
│ │ over the cycle life of │ lower risk of thermal runaway │
│ │ sodium-ion vs lithium-ion │ and good low-temperature │
│ │ batteries under real-world │ performance, but long-term │
│ │ conditions? │ energy density retention data │
│ │ │ was not found. │
├──────────┼─────────────────────────────────┼─────────────────────────────────┤
│ medium │ What is the volumetric energy │ C&EN states volumetric density │
│ │ density (Wh/L) of current │ is 20–40% lower than │
│ │ commercial sodium-ion battery │ lithium-ion but provides no │
│ │ packs? │ absolute Wh/L figures for │
│ │ │ sodium-ion. │
└──────────┴─────────────────────────────────┴─────────────────────────────────┘
╭───────────────────────────────── Confidence ─────────────────────────────────╮
│ Overall: 0.91 │
│ Corroborating sources: 8 │
│ Source authority: high │
│ Contradiction detected: False │
│ Query specificity match: 0.97 │
│ Budget status: spent │
│ Recency: current │
╰──────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────── Cost ────────────────────────────────────╮
│ Tokens: 48087 │
│ Iterations: 4 │
│ Wall time: 84.41s │
│ Model: claude-sonnet-4-6 │
╰──────────────────────────────────────────────────────────────────────────────╯
trace_id: aaf3b9ef-d91a-4d03-8883-b0a906929cb1
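
The run log above mixes one-JSON-object-per-line records with rendered tables, so a reader who wants the token progression has to separate the two. The following is a minimal sketch of one way to do that, not part of the marchwarden codebase. The function names are hypothetical; the field names (`event`, `tokens_so_far`, `tokens_used`) and event names (`iteration_start`, `research_completed`) are the ones that appear in the log lines.

```python
import json


def parse_events(lines):
    """Return the JSON records found in a mixed log, skipping rendered output."""
    events = []
    for raw in lines:
        text = raw.strip()
        if not text.startswith("{"):
            continue  # banner text, box-drawing tables, trace_id footer
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            continue  # looks like JSON but is truncated or malformed
        if isinstance(record, dict) and "event" in record:
            events.append(record)
    return events


def token_progress(events):
    """Token count reported at the start of each iteration, in log order."""
    return [
        e["tokens_so_far"]
        for e in events
        if e.get("event") == "iteration_start" and "tokens_so_far" in e
    ]


def final_tokens(events):
    """tokens_used from the last research_completed record, or None."""
    for e in reversed(events):
        if e.get("event") == "research_completed":
            return e.get("tokens_used")
    return None
```

Applied to this log, `token_progress` would return the four `tokens_so_far` values from the `iteration_start` records (0, 1114, 7183, 13977), and `final_tokens` would return the 48087 reported in `research_completed`.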