Bug: synthesis output parsing fails on real research runs #16

Closed
opened 2026-04-08 21:12:42 +00:00 by claude-code · 0 comments
Collaborator

Discovered during M2.3 smoke test (trace 1a8711c4-a65b-49fd-853e-50fde79c755f).

The agent successfully searched, fetched 28 sources, and produced a real synthesized answer (visible in the trace at step 23, including markdown sections like "Cool-Season Vegetables", source citations, etc.). But the final ResearchResult contains:

  • answer: "Research on '...' completed but synthesis failed. 28 sources were gathered."
  • citations: empty
  • gaps: [{category: budget_exhausted, topic: synthesis, detail: "The synthesis step failed to produce structured output."}]
  • confidence: 0.10

So the synthesis LLM call works, but parsing its output into the Pydantic contract fails and the agent falls back to an error stub. The trace step labeled "complete" then records confidence: 0.1, citation_count: 0, gap_count: 1, confirming the dropped data.

Repro

MARCHWARDEN_MODEL=claude-sonnet-4-6 \
  scripts/docker-test.sh ask "What are ideal crops for a garden in Utah?"

Likely cause

Probably a schema mismatch between the JSON the synthesis prompt asks the model to emit and what ResearchResult.model_validate* accepts. Inspect WebResearcher._synthesize (or equivalent) and the synthesis trace step to compare expected vs. actual JSON.
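
To make the expected-vs-actual comparison concrete, here is a minimal stdlib-only sketch. It assumes the raw synthesis text from trace step 23 is available as a string, and that the model may have wrapped its JSON in a markdown fence or surrounding prose (not confirmed by the trace). The helper names `extract_json_object` and `diff_fields` are hypothetical and not part of marchwarden; the expected field set is only the four fields listed above, so the real ResearchResult may require more.

```python
import json
import re

# Top-level fields named in this issue. Assumption: the real ResearchResult
# model may define additional or nested required fields.
EXPECTED_FIELDS = {"answer", "citations", "gaps", "confidence"}

# Matches a fenced block, optionally tagged "json". Written as `{3} so the
# pattern does not contain a literal fence.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json_object(raw: str) -> dict:
    """Best-effort extraction of one JSON object from raw model output."""
    match = FENCE_RE.search(raw)
    candidate = match.group(1) if match else raw
    # Take the outermost {...} span; raises if nothing looks like an object.
    start = candidate.find("{")
    end = candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in synthesis output")
    return json.loads(candidate[start : end + 1])


def diff_fields(payload: dict, expected: set = EXPECTED_FIELDS) -> dict:
    """Report top-level keys that are missing from or extra in the payload."""
    actual = set(payload)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }
```

Running `diff_fields(extract_json_object(raw_step_output))` on the step 23 output would show whether the model is emitting differently named keys. If extraction itself raises, the problem is more likely the wrapping (prose or fences) than the schema.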

Reference: archeious/marchwarden#16