Fix synthesis truncation and trace masking #20

Merged
archeious merged 1 commit from fix/synthesis-truncation into main 2026-04-08 21:24:42 +00:00
Collaborator

Closes Issue #16
Closes Issue #19

Root cause

Two intertwined bugs in researchers/web/agent.py:

  1. Synthesis output truncationmax_tokens=4096 on the synthesis Claude call. For a real evidence set (28 sources, full citations + answer + gaps + open questions), the JSON output exceeds that cap and gets cut mid-string. json.loads then fails and the agent falls back to a stub.
  2. Trace maskingraw_response=raw_text[:1000] in the synthesis_error log, which hides the truncated JSON tail (the only thing that would let you diagnose the cause). The bug was effectively invisible from traces alone.

Fix

  • Bump synthesis max_tokens to 16384
  • Capture response.stop_reason and the parser exception text in the trace
  • Stop slicing raw_response — log the full string

Verified

Same Utah crops question as Issue #10:

Before After
Citations 0 9
Confidence 0.10 0.88
Answer "synthesis failed" stub Real multi-paragraph answer with cool/warm-season categories
Trace a5e49df2-4188-4b07-aedc-42a58cb71ef3 clean

88/88 tests still passing.

Out of scope

Token budget enforcement (Issue #17) is now plainly visible — the run consumed 41710 tokens against a 20000 budget. Separate fix.

Closes Issue #16 Closes Issue #19 ## Root cause Two intertwined bugs in `researchers/web/agent.py`: 1. **Synthesis output truncation** — `max_tokens=4096` on the synthesis Claude call. For a real evidence set (28 sources, full citations + answer + gaps + open questions), the JSON output exceeds that cap and gets cut mid-string. `json.loads` then fails and the agent falls back to a stub. 2. **Trace masking** — `raw_response=raw_text[:1000]` in the synthesis_error log, which hides the truncated JSON tail (the only thing that would let you diagnose the cause). The bug was effectively invisible from traces alone. ## Fix - Bump synthesis `max_tokens` to 16384 - Capture `response.stop_reason` and the parser exception text in the trace - Stop slicing `raw_response` — log the full string ## Verified Same Utah crops question as Issue #10: | | Before | After | |---|---|---| | Citations | 0 | 9 | | Confidence | 0.10 | 0.88 | | Answer | "synthesis failed" stub | Real multi-paragraph answer with cool/warm-season categories | | Trace `a5e49df2-4188-4b07-aedc-42a58cb71ef3` | — | clean | 88/88 tests still passing. ## Out of scope Token budget enforcement (Issue #17) is now plainly visible — the run consumed 41710 tokens against a 20000 budget. Separate fix.
claude-code added 1 commit 2026-04-08 21:23:22 +00:00
The synthesis step was passing max_tokens=4096 to Claude, which was
not enough for a full ResearchResult JSON over a real evidence set
(28 sources). The model's output got cut mid-string, json.loads
failed, and the agent fell back to a stub answer with zero citations.

The trace logger then truncated the raw_response to 1000 chars before
recording it, hiding the actual reason for the parse failure (the
truncated JSON suffix) and making the bug invisible from traces.

Fixes:
- Bump synthesis max_tokens to 16384
- Capture and log Claude's stop_reason on synthesis_error so future
  truncation cases are diagnosable from the trace alone
- Log the parser exception text alongside the raw_response
- Stop slicing raw_response — record the full string

Verified end-to-end against the Utah crops question:
- Before: 0 citations, confidence 0.10, fallback stub
- After:  9 citations, confidence 0.88, real synthesized answer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
archeious approved these changes 2026-04-08 21:24:37 +00:00
archeious merged commit c19a161a62 into main 2026-04-08 21:24:42 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: archeious/marchwarden#20
No description provided.