Bug: synthesis output parsing fails on real research runs #16

Closed

opened 2026-04-08 15:12:42 -06:00 by claude-code · 0 comments

Discovered during M2.3 smoke test (trace 1a8711c4-a65b-49fd-853e-50fde79c755f).

The agent successfully searched, fetched 28 sources, and produced a real synthesized answer (visible in the trace at step 23, including markdown sections like "Cool-Season Vegetables", source citations, etc.). But the final ResearchResult contains:

- `answer`: "Research on '...' completed but synthesis failed. 28 sources were gathered."
- `citations`: empty
- `gaps`: `[{category: budget_exhausted, topic: synthesis, detail: "The synthesis step failed to produce structured output."}]`
- `confidence`: 0.10

So the synthesis LLM call works, but parsing its output into the Pydantic contract fails and the agent falls back to an error stub. The trace step labeled "complete" then records confidence: 0.1, citation_count: 0, gap_count: 1, confirming the dropped data.

Repro

MARCHWARDEN_MODEL=claude-sonnet-4-6 \
  scripts/docker-test.sh ask "What are ideal crops for a garden in Utah?"

Likely cause

Schema mismatch between what the synthesis prompt asks the model to emit and what ResearchResult.model_validate* accepts. Worth inspecting WebResearcher._synthesize (or equivalent) and the synthesis trace step to compare expected vs. actual JSON.
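One way to do that comparison, as a rough sketch only: take the raw synthesis text from the trace step and check its top-level shape against the fields listed above. The field names (`answer`, `citations`, `gaps`, `confidence`) come from this issue; the helper names `extract_json_candidate` and `diagnose` are hypothetical and not part of the codebase, and the real `ResearchResult` contract may require more than these four keys. It uses only the standard library, so it says nothing about nested field types that Pydantic would also validate.

```python
import json
import re

# Top-level fields named in this issue; the real contract may define more.
EXPECTED_FIELDS = {"answer", "citations", "gaps", "confidence"}

# Build the fence marker programmatically so this snippet can itself sit in a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_candidate(raw: str) -> str:
    """Return the most likely JSON payload from raw model output (hypothetical helper)."""
    fenced = FENCE_RE.search(raw)
    if fenced:
        # The model wrapped its JSON in a markdown code fence.
        return fenced.group(1).strip()
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        # The model surrounded its JSON with prose; keep the outermost braces.
        return raw[start:end + 1]
    return raw.strip()


def diagnose(raw: str) -> dict:
    """Report how the raw output differs from the expected top-level shape."""
    candidate = extract_json_candidate(raw)
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError as exc:
        return {"parsed": False, "error": str(exc), "missing": [], "unexpected": []}
    if not isinstance(data, dict):
        kind = type(data).__name__
        return {"parsed": False, "error": f"top-level value is {kind}", "missing": [], "unexpected": []}
    keys = set(data)
    return {
        "parsed": True,
        "error": None,
        "missing": sorted(EXPECTED_FIELDS - keys),
        "unexpected": sorted(keys - EXPECTED_FIELDS),
    }
```

Feeding the step 23 output from the trace into `diagnose` should distinguish the two most plausible failure modes: the model returned plain markdown with no JSON at all (`parsed` is false), or it returned JSON whose keys differ from the contract (`missing` and `unexpected` are non-empty).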
