The synthesis step was passing max_tokens=4096 to Claude, which was
not enough for a full ResearchResult JSON over a real evidence set
(28 sources). The model's output got cut mid-string, json.loads
failed, and the agent fell back to a stub answer with zero citations.
The trace logger then truncated the raw_response to 1000 chars before
recording it, hiding the actual reason for the parse failure (the
truncated JSON suffix) and making the bug invisible from traces.
Fixes:
- Bump synthesis max_tokens to 16384
- Capture and log Claude's stop_reason on synthesis_error so future
truncation cases are diagnosable from the trace alone
- Log the parser exception text alongside the raw_response
- Stop slicing raw_response — record the full string
Verified end-to-end against the Utah crops question:
- Before: 0 citations, confidence 0.10, fallback stub
- After: 9 citations, confidence 0.88, real synthesized answer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>