#14328: fix: strip incomplete tool_use blocks from errored/aborted messages to prevent permanent 400 loops
agents
size: S
Cluster: Error Handling in Agent Tools
#### Summary
Fixes a critical session-poisoning bug where an interrupted streaming response permanently breaks the session with a 400 error loop. When a tool call is interrupted mid-stream (network error, timeout, user abort), the assistant message contains incomplete `tool_use` blocks (with `partialJson: true`). The previous fix (#4597) correctly avoided creating synthetic `tool_result` entries for these, but still left the malformed `tool_use` blocks in the transcript. On every subsequent API call, Anthropic rejects the request:
```
400 messages.244.content.1: unexpected tool_use_id found in tool_result blocks:
toolu_01PfLjsziXFMs7pAQCtBLn1f
```
The error is baked into the session history, so every message hits the same 400. Only `/new` or `/reset` recovers.
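Because the rejection always carries the same phrase, it can be recognized by matching on that text. The sketch below is illustrative only: the helper names `isToolPairingError` and `friendlyErrorText` and the wording of the friendly message are assumptions, not the code or message this PR ships. Only the quoted error phrase and the `/new` and `/reset` commands come from the description above.

```typescript
// Hypothetical helpers: only the matched phrase is taken from the PR description.
const TOOL_PAIRING_ERROR = /unexpected tool_use_id found in tool_result blocks/i;

// True when a raw provider error looks like the poisoned-transcript 400.
function isToolPairingError(raw: string): boolean {
  return TOOL_PAIRING_ERROR.test(raw);
}

// Map the raw error to readable text; unrelated errors pass through unchanged.
function friendlyErrorText(raw: string): string {
  if (isToolPairingError(raw)) {
    // Illustrative wording, not the actual message returned by the PR.
    return "This session's history contains a broken tool call. Use /new or /reset to start a fresh session.";
  }
  return raw;
}
```

Matching on the phrase rather than on the message index (`messages.244.content.1`) or the tool ID keeps the check stable, since both of those vary per session.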
lobster-biscuit
#### Repro Steps
1. Start a long session with many tool calls
2. Have a tool call interrupted mid-stream (network issue, timeout, abort)
3. OpenClaw persists the assistant message with `stopReason: "error"` and incomplete `tool_use` blocks
4. Send any new message -> permanent 400 loop
#### Root Cause
Three interconnected issues:
1. **`repairToolUseResultPairing()` passed errored messages through unchanged** (lines 224-227 before this fix): the incomplete `tool_use` blocks remained in the transcript with no matching `tool_result`, causing the 400
2. **`hasToolCallInput()` didn't detect `partialJson: true`** — blocks flagged as partial (interrupted mid-stream) with an empty `input: {}` passed the completeness check
3. **No user-friendly error** for the specific 400 pattern "unexpected tool_use_id found in tool_result blocks"
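To make the second root cause concrete, here is a minimal sketch of a completeness check that treats `partialJson: true` as incomplete. The `ToolCallBlock` shape and the function name `hasCompleteToolCallInput` are assumptions for illustration. The real `hasToolCallInput()` may differ in signature and in how it inspects `input`.

```typescript
// Assumed block shape; only `input` and `partialJson` are named in the PR description.
interface ToolCallBlock {
  type: "tool_use" | "toolCall" | "functionCall";
  id: string;
  name: string;
  input?: unknown;
  partialJson?: unknown;
}

// Hypothetical stand-in for hasToolCallInput(), not the actual implementation.
function hasCompleteToolCallInput(block: ToolCallBlock): boolean {
  // A block flagged as partial was cut off mid-stream, so even a present
  // `input` (for example an empty object) cannot be trusted.
  if (block.partialJson === true) return false;
  // Otherwise the block counts as complete when some input exists.
  return block.input !== undefined && block.input !== null;
}
```

Checking the partial flag before looking at `input` is the point of the sketch: a presence-only check accepts `input: {}`, which is exactly the case described in item 2.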
#### Behavior Changes
- **Errored/aborted assistant messages**: `tool_use`/`toolCall`/`functionCall` blocks are now stripped from the content array. Text and thinking blocks are preserved (partial reasoning is still valuable). If no content remains after stripping, the entire message is dropped.
- **`partialJson: true` tool calls**: Now detected as incomplete by `hasToolCallInput()` and dropped during the `sanitizeToolCallInputs` pass, even when `input` is present.
- **User-facing error**: If the 400 still reaches the user (e.g. from an older session file), `formatAssistantErrorText()` now returns a clear message instead of raw JSON.
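The first behavior change can be sketched as a filter over the content array. This is a simplified illustration, not the code in `repairToolUseResultPairing()`: the `AssistantMessage` shape, the helper name `stripToolCallsIfInterrupted`, and the `"aborted"` stop reason value are assumptions (the description only shows `stopReason: "error"` verbatim).

```typescript
// Assumed, simplified shapes for illustration.
type Block = { type: string; [key: string]: unknown };

interface AssistantMessage {
  role: "assistant";
  stopReason?: string;
  content: Block[];
}

const TOOL_CALL_TYPES = new Set(["tool_use", "toolCall", "functionCall"]);

// Hypothetical helper: returns the message without tool-call blocks,
// or null when nothing remains and the message should be dropped.
function stripToolCallsIfInterrupted(msg: AssistantMessage): AssistantMessage | null {
  // "aborted" is an assumed value; "error" appears in the PR description.
  const interrupted = msg.stopReason === "error" || msg.stopReason === "aborted";
  if (!interrupted) return msg;

  // Keep text and thinking blocks; remove every tool-call variant.
  const kept = msg.content.filter((block) => !TOOL_CALL_TYPES.has(block.type));
  if (kept.length === 0) return null;
  return { ...msg, content: kept };
}
```

Returning `null` for an emptied message lets the caller drop it from the transcript, which matches the "entire message is dropped" rule above, while messages that ended normally are returned untouched.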
#### Codebase and GitHub Search
- Searched for `partialJson`, `unexpected tool_use_id`, `stopReason.*error`, `tool_result blocks` across the codebase
- Found the existing partial fix from #4597 and understood why it was insufficient
- Verified `sanitizeToolCallInputs` runs before `repairToolUseResultPairing` in the sanitization pipeline (`google.ts:352-354`)
- Confirmed compaction already calls `repairToolUseResultPairing` after dropping chunks (`compaction.ts:343`)
#### Tests
All existing tests updated, and the following new tests added:
- `strips tool_use blocks from errored assistant messages to prevent 400 loops` — verifies tool-only errored message is dropped
- `strips tool_use blocks from aborted assistant messages to prevent 400 loops` — same for aborted
- `preserves text content from errored assistant messages while stripping tool_use` — verifies text/thinking blocks survive
- `drops tool calls with partialJson: true even when input is present` — verifies the `hasToolCallInput()` fix
- `full scenario: interrupted stream does not poison session permanently` — end-to-end test reproducing the exact #14322 scenario through both sanitization passes
```
pnpm vitest run src/agents/session-transcript-repair.test.ts # 13/13 pass
pnpm vitest run src/agents/session-tool-result-guard.test.ts # 10/10 pass
pnpm vitest run src/agents/pi-embedded-runner.sanitize-session-history.test.ts # 9/9 pass
pnpm vitest run src/agents/compaction.test.ts # 10/10 pass
pnpm vitest run src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts # 10/10 pass
pnpm check # format + typecheck + lint all pass
pnpm build # clean build
```
**Sign-Off**
- Models used: Claude Opus 4.6
- Submitter effort: Deep codebase analysis, traced the full session history -> context assembly -> API call pipeline, identified three interconnected root causes, implemented multi-layer defense
- Agent notes: AI-assisted PR. The fix is surgical — 3 files, ~160 lines added, all focused on the bug. No unrelated changes.
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
Fixes a session-poisoning bug where interrupted streaming responses (network errors, timeouts, user aborts) permanently break sessions with 400 error loops. The root cause: incomplete `tool_use` blocks with `partialJson: true` were left in the transcript, causing every subsequent API call to be rejected by Anthropic.
- **`hasToolCallInput()`** now detects `partialJson: true` blocks as incomplete, so `sanitizeToolCallInputs` drops them during the first sanitization pass
- **`repairToolUseResultPairing()`** now strips tool call blocks from errored/aborted assistant messages instead of passing them through unchanged; text and thinking content is preserved
- **`formatAssistantErrorText()`** adds a user-friendly error message for the specific "unexpected tool_use_id found in tool_result blocks" 400 pattern
- Tests updated and expanded with 3 new test cases plus an end-to-end scenario reproducing the exact issue
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge — it fixes a critical session-poisoning bug with well-scoped, defensive changes and thorough test coverage.
- The fix is surgical and well-reasoned: 3 files changed with ~160 lines added, all directly targeting the bug. The multi-layer defense (sanitizeToolCallInputs catches partialJson, repairToolUseResultPairing strips remaining tool blocks from errored messages, formatAssistantErrorText provides a fallback user message) is appropriate. Tests cover the key scenarios including an end-to-end reproduction. The only minor gap is the lack of a direct unit test for isCorruptedToolUsePairingError and its integration into formatAssistantErrorText, though the existing test suite for that function passes.
- No files require special attention. All changes are focused and correct.
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#15050: fix: transcript corruption resilience — strip aborted tool_use bloc...
by yashchitneni · 2026-02-12
88.4%
#4844: fix(agents): skip error/aborted assistant messages in transcript re...
by lailoo · 2026-01-30
87.5%
#8345: fix: prevent synthetic error repair from creating tool_result for d...
by vishaltandale00 · 2026-02-03
87.2%
#9416: fix: drop errored/aborted assistant tool pairs in transcript repair
by xandorklein · 2026-02-05
86.9%
#12487: fix(agents): strip orphaned tool_result when tool_use is sanitized ...
by skylarkoo7 · 2026-02-09
86.8%
#6687: fix(session-repair): strip malformed tool_use blocks to prevent per...
by NSEvent · 2026-02-01
86.8%
#8270: fix: support snake_case 'tool_use' in transcript repair (#8264)
by heliosarchitect · 2026-02-03
86.1%
#11825: fix: keep tool_use/tool_result pairs together during session compac...
by C31gordon · 2026-02-08
86.0%
#3647: fix: sanitize tool arguments in session history
by nhangen · 2026-01-29
85.9%
#21195: fix: suppress orphaned tool_use/tool_result errors after session co...
by ruslansychov-git · 2026-02-19
85.8%