#6687: fix(session-repair): strip malformed tool_use blocks to prevent permanent session corruption
## Summary
- Strips malformed `tool_use`/`toolCall`/`functionCall` blocks from assistant messages BEFORE the existing pairing repair runs
- Adds `droppedMalformedToolUseCount` to the repair report for observability
- Prevents creating synthetic error results for blocks that were never valid tool calls
## Problem
When tool calls are interrupted (by error, timeout, content filtering, or process termination), sessions become **permanently corrupted**. Every subsequent API request fails with:
- `unexpected tool_use_id found in tool_result blocks`
- `tool result's tool id not found (2013)`
**Root cause:** The existing `extractToolCallsFromAssistant()` skips malformed blocks (missing id) but **leaves them in the message content**. The blocks remain in the transcript, causing API rejections.
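To make the root cause concrete, here is a minimal sketch of the failure mode. The `ContentBlock` shape and the `extractToolCallIds` helper are hypothetical stand-ins for illustration, not the actual `extractToolCallsFromAssistant()` code: extraction ignores the id-less block, but nothing removes it from the assistant content that is later sent to the API.

```typescript
// Hypothetical block shape, for illustration only.
type ContentBlock = { type: string; id?: unknown; [key: string]: unknown };

// Simplified stand-in for the extraction step: blocks without a usable id
// are skipped, but the input array is left untouched.
function extractToolCallIds(content: ContentBlock[]): string[] {
  const ids: string[] = [];
  for (const block of content) {
    if (block.type !== "tool_use") continue;
    if (typeof block.id !== "string" || block.id === "") continue; // skipped, not removed
    ids.push(block.id);
  }
  return ids;
}

const assistantContent: ContentBlock[] = [
  { type: "text", text: "Reading the file." },
  { type: "tool_use", name: "read" }, // interrupted before an id was assigned
  { type: "tool_use", id: "call_1", name: "read" },
];

const pairedIds = extractToolCallIds(assistantContent);

// Tool blocks that pairing never saw but that are still in the transcript.
const leftoverBlocks = assistantContent.filter(
  (b) =>
    b.type === "tool_use" &&
    (typeof b.id !== "string" || !pairedIds.includes(b.id)),
);
```

In this sketch only `call_1` is paired, while the id-less block stays in `assistantContent`; that is the state the pre-processing step described below is meant to eliminate.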
## Solution
Add a pre-processing step that strips malformed `tool_use` blocks from assistant messages before the pairing repair runs.
**Malformed conditions detected:**
- Missing or empty `id` field (tool call wasn't fully initialized)
- Has a `partialJson` field present (Anthropic SDK streaming artifact). Detected with a property presence check (`"partialJson" in rec`), so it is caught regardless of value
- Has a `partial` field set to `true` (generic streaming indicator)
- Has an `incomplete` field set to `true` (OpenAI-style indicator)
**Type variants supported (via shared `TOOL_BLOCK_TYPES` Set):**
- camelCase: `toolCall`, `toolUse`, `functionCall`
- snake_case: `tool_use`, `function_call`
Both `isValidToolUseBlock` and `extractToolCallsFromAssistant` use the same `TOOL_BLOCK_TYPES` Set to ensure consistent handling across validation and extraction.
The `name` field is intentionally NOT required - `extractToolCallsFromAssistant` already handles missing names gracefully by defaulting to `undefined`.
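The detection rules and type variants above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the `Block` type, the `isToolBlock` and `stripMalformedToolBlocks` helpers, and the choice to treat a non-string `id` as missing are all hypothetical. Only `TOOL_BLOCK_TYPES`, `isValidToolUseBlock`, and `droppedMalformedToolUseCount` are names taken from the description.

```typescript
// Hypothetical block shape; the real transcript types may differ.
type Block = Record<string, unknown>;

// Shared set of tool block type names (camelCase and snake_case variants).
const TOOL_BLOCK_TYPES = new Set<string>([
  "toolCall",
  "toolUse",
  "functionCall",
  "tool_use",
  "function_call",
]);

// Hypothetical helper: is this content entry a tool block at all?
function isToolBlock(block: unknown): block is Block {
  if (typeof block !== "object" || block === null) return false;
  const type = (block as Block).type;
  return typeof type === "string" && TOOL_BLOCK_TYPES.has(type);
}

// A tool block is valid when none of the malformed conditions apply.
function isValidToolUseBlock(rec: Block): boolean {
  // Assumption: a non-string id counts as missing.
  if (typeof rec.id !== "string" || rec.id === "") return false;
  if ("partialJson" in rec) return false; // presence check, any value
  if (rec.partial === true) return false; // strict boolean check
  if (rec.incomplete === true) return false; // strict boolean check
  return true; // `name` is intentionally not required
}

// Hypothetical pre-processing step: drop malformed tool blocks and count them.
function stripMalformedToolBlocks(content: unknown[]): {
  content: unknown[];
  droppedMalformedToolUseCount: number;
} {
  let dropped = 0;
  const kept = content.filter((block) => {
    if (!isToolBlock(block)) return true; // text and other blocks pass through
    if (isValidToolUseBlock(block)) return true;
    dropped += 1;
    return false;
  });
  return { content: kept, droppedMalformedToolUseCount: dropped };
}

const stripped = stripMalformedToolBlocks([
  { type: "text", text: "Reading the file." },
  { type: "tool_use", id: "call_1", name: "read" },
  { type: "tool_use", name: "read" }, // missing id: dropped
  { type: "toolCall", id: "call_2", partialJson: undefined }, // dropped
]);
```

Running the strip step before pairing means a dropped block never reaches the stage that synthesizes error results, which matches the goal stated in the summary.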
### Design decisions
- **Shared constant:** `TOOL_BLOCK_TYPES` is a Set used by both functions to ensure consistency.
- **Property presence vs value check:** For `partialJson`, we use `"partialJson" in rec` rather than `!== undefined` because the mere presence of this field (even if explicitly `undefined`) indicates a streaming artifact.
- **Strict boolean checks for partial/incomplete:** We use `=== true` rather than a presence check or a truthy check. A presence check would flag falsy values like `0`, `""`, or `null`, and a truthy check would flag non-boolean values such as `"false"` or `1`; none of these indicate a partial tool call.
- **Expanded logging:** All non-zero repair counters are now logged (malformed stripped, orphans dropped, duplicates dropped, synthetic results added) for easier debugging.
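The difference between the check styles discussed above is easy to see on a small record. The `rec` and `other` objects below are made-up examples, not data from a real transcript.

```typescript
// Made-up records for illustration.
const rec: Record<string, unknown> = {
  id: "call_1",
  partialJson: undefined, // field is present, value is undefined
  partial: 0, // falsy non-boolean
};
const other: Record<string, unknown> = { id: "call_2", incomplete: "false" };

// Presence check vs value check for partialJson.
const presentByIn = "partialJson" in rec; // true
const presentByValue = rec.partialJson !== undefined; // false

// Strict boolean check vs presence check for partial.
const partialStrict = rec.partial === true; // false
const partialPresent = "partial" in rec; // true

// Strict boolean check vs truthy check for incomplete.
const incompleteStrict = other.incomplete === true; // false
const incompleteTruthy = Boolean(other.incomplete); // true, "false" is a non-empty string
```

Only the `in` check catches an explicitly `undefined` `partialJson`, and only `=== true` avoids flagging `partial: 0` or `incomplete: "false"`.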
## Test plan
- [x] Added comprehensive tests for malformed block detection
- [x] Existing tests pass (`pnpm test src/agents/session-transcript-repair.test.ts`)
- [x] Full test suite passes (`pnpm test`)
- [x] Lint passes (`pnpm lint`)
Fixes #5497, #5481, #5430, #5518
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR hardens session transcript repair by stripping malformed assistant tool blocks (e.g., missing/empty `id` or streaming artifacts like `partialJson`, `partial: true`, `incomplete: true`) before the existing tool-call/tool-result pairing logic runs. It also unifies tool-block type detection across validation and extraction via a shared `TOOL_BLOCK_TYPES` set (supporting both camelCase and snake_case variants), adds `droppedMalformedToolUseCount` to the repair report for observability, and updates the Google embedded runner to log non-zero repair counters.
The change integrates cleanly with existing transcript sanitation: `sanitizeToolUseResultPairing()` now delegates to `repairToolUseResultPairing()`, which first cleans assistant content and then enforces strict provider requirements by moving matching `toolResult` messages directly after the corresponding assistant tool-call turn, dropping orphan/duplicate results, and synthesizing missing results only for valid tool calls.
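The pairing behavior described above (results moved directly after their assistant turn, orphans and duplicates dropped, missing results synthesized) can be sketched as below. This is a simplified illustration, not the real `repairToolUseResultPairing()`: the `Message` shapes, the `repairPairing` name, the report field names, and the synthetic result text are all assumptions, and the input is assumed to have already had malformed blocks stripped.

```typescript
// Hypothetical, simplified message shapes.
type ToolCall = { type: "toolCall"; id: string; name?: string };
type TextBlock = { type: "text"; text: string };
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: Array<TextBlock | ToolCall> }
  | { role: "toolResult"; toolCallId: string; content: string; isError?: boolean };
type ToolResult = Extract<Message, { role: "toolResult" }>;

// Hypothetical report shape; field names are illustrative.
type PairingReport = {
  messages: Message[];
  droppedOrphanCount: number;
  droppedDuplicateCount: number;
  addedSyntheticCount: number;
};

function repairPairing(messages: Message[]): PairingReport {
  // Keep the first result per tool call id; later ones count as duplicates.
  const firstResult = new Map<string, ToolResult>();
  let duplicates = 0;
  for (const m of messages) {
    if (m.role !== "toolResult") continue;
    if (firstResult.has(m.toolCallId)) duplicates += 1;
    else firstResult.set(m.toolCallId, m);
  }

  const out: Message[] = [];
  const used = new Set<string>();
  let synthetic = 0;
  for (const m of messages) {
    if (m.role === "toolResult") continue; // re-emitted next to its call below
    out.push(m);
    if (m.role !== "assistant") continue;
    for (const block of m.content) {
      if (block.type !== "toolCall") continue;
      const found = firstResult.get(block.id);
      if (found === undefined) {
        // No result was recorded for this call: synthesize an error result.
        out.push({
          role: "toolResult",
          toolCallId: block.id,
          content: "[synthetic] tool call was interrupted",
          isError: true,
        });
        synthetic += 1;
      } else if (!used.has(block.id)) {
        out.push(found);
        used.add(block.id);
      }
    }
  }

  return {
    messages: out,
    droppedOrphanCount: firstResult.size - used.size, // results with no matching call
    droppedDuplicateCount: duplicates,
    addedSyntheticCount: synthetic,
  };
}
```

Because this stage synthesizes a result for every tool call it sees, any malformed block that survived to this point would gain a synthetic error result, which is why the stripping has to happen first.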
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk.
- Changes are narrowly scoped to transcript sanitation, include defensive runtime checks, preserve message ordering/metadata, and are covered by targeted unit tests that exercise the new malformed-block stripping and reporting behavior.
- No files require special attention
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->