#11825: fix: keep tool_use/tool_result pairs together during session compaction

by C31gordon open 2026-02-08 11:29

agents stale

## Problem

When sessions get summarized to save tokens, the chunking functions (`splitMessagesByTokenShare`, `chunkMessagesByMaxTokens`) split messages purely by token count without considering Anthropic's API requirement that `tool_use` blocks must be immediately followed by matching `tool_result` messages.

This causes the error:

```
unexpected tool_use_id found in tool_result blocks
```

### Root Cause

1. `splitMessagesByTokenShare()` splits at token boundaries
2. `chunkMessagesByMaxTokens()` splits at token boundaries

When a split happens between an assistant message (containing `tool_use`) and its corresponding `tool_result`:

- The `tool_use` gets summarized away
- The `tool_result` remains with an ID referencing a non-existent tool call
- Anthropic API rejects the malformed transcript

## Solution

Added helper functions and modified the chunking logic:

### New Helper Functions

```typescript
hasPendingToolCalls(message)  // Detects assistant messages with tool calls
isToolResultMessage(message)  // Detects tool_result messages
```

### Modified Functions

**`splitMessagesByTokenShare`** - Now checks before splitting:

- Won't split if previous message has pending tool calls
- Won't split if current message is a tool_result

**`chunkMessagesByMaxTokens`** - Same guards as above

## Testing

Added comprehensive tests:

- ✅ Verifies tool_use and tool_result stay in the same chunk
- ✅ Verifies chunks never start with tool_result
- ✅ Tests both `splitMessagesByTokenShare` and `chunkMessagesByMaxTokens`

## Backwards Compatibility

- No API changes
- No configuration changes
- Existing sessions work correctly
- Chunks may be slightly larger in edge cases (tool pairs kept together)

The existing `repairToolUseResultPairing` in `pruneHistoryForContextShare` provides defense-in-depth, but preventing bad splits is more robust.
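The split guards described in this PR can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: the `Message` and `ContentBlock` shapes, the `estimateTokens` stand-in, and the `chunkKeepingToolPairs` name are all assumptions made for the example, and only the helper names `hasPendingToolCalls` and `isToolResultMessage` come from the description above.

```typescript
// Hypothetical message shapes for illustration; the real types in the codebase may differ.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string };

type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: ContentBlock[] }
  | { role: "toolResult"; toolUseId: string; content: string };

// Assumed behavior: true when an assistant message carries at least one tool_use block.
function hasPendingToolCalls(message: Message | undefined): boolean {
  return (
    message !== undefined &&
    message.role === "assistant" &&
    message.content.some((block) => block.type === "tool_use")
  );
}

// Assumed behavior: true when the message is a tool result.
function isToolResultMessage(message: Message | undefined): boolean {
  return message !== undefined && message.role === "toolResult";
}

// Stand-in estimator (about 4 characters per token); not the project's real estimator.
function estimateTokens(message: Message): number {
  const text =
    typeof message.content === "string"
      ? message.content
      : JSON.stringify(message.content);
  return Math.ceil(text.length / 4);
}

// Hypothetical chunker: splits on the token budget, but only at safe boundaries.
function chunkKeepingToolPairs(messages: Message[], maxTokens: number): Message[][] {
  const chunks: Message[][] = [];
  let current: Message[] = [];
  let currentTokens = 0;

  for (const message of messages) {
    const tokens = estimateTokens(message);
    const previous = current[current.length - 1];
    const overBudget = current.length > 0 && currentTokens + tokens > maxTokens;
    // A boundary is unsafe if it would separate a tool_use from its tool_result.
    const splitIsSafe =
      !hasPendingToolCalls(previous) && !isToolResultMessage(message);

    if (overBudget && splitIsSafe) {
      chunks.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(message);
    currentTokens += tokens;
  }

  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Because an unsafe boundary is skipped rather than moved, a chunk can exceed `maxTokens` when a tool pair straddles the limit, which matches the "slightly larger in edge cases" note under Backwards Compatibility.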
<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

Updates the session compaction chunking logic to avoid splitting between `tool_use` and the corresponding `tool_result` message, which can create malformed transcripts rejected by the Anthropic API.

This is done by adding helpers to detect assistant tool-call blocks and tool-result messages, then guarding split points in both `splitMessagesByTokenShare` and `chunkMessagesByMaxTokens`.

Adds unit tests intended to assert that chunks don’t begin with `toolResult` and that tool-use/result pairs remain co-located.

<h3>Confidence Score: 4/5</h3>

- Mostly safe to merge, but one test issue can mask regressions.
- The production change is localized and the new split guards are straightforward, but the added tests currently identify the tool-use chunk too broadly (any assistant message), so they can pass even if pairing is broken. Fixing the test predicates would better validate the intended invariant.
- src/agents/compaction.test.ts

**Context used:**

- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
- Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13))
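One way to tighten the test predicate flagged in the review is to match tool results to tool calls by ID within each chunk, instead of treating any assistant message as the tool-use side. The sketch below is an assumption about how such a check could look: the `Msg` and `Block` shapes and the `findOrphanedToolResults` helper are hypothetical and do not come from `src/agents/compaction.test.ts`.

```typescript
// Hypothetical message shapes for illustration; the real test fixtures may differ.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string };

type Msg =
  | { role: "user"; content: string }
  | { role: "assistant"; content: Block[] }
  | { role: "toolResult"; toolUseId: string; content: string };

// Returns the IDs of tool results whose matching tool_use does not appear
// earlier in the same chunk. An empty array means every pair is co-located.
function findOrphanedToolResults(chunks: Msg[][]): string[] {
  const orphans: string[] = [];
  for (const chunk of chunks) {
    const seenToolUseIds = new Set<string>();
    for (const message of chunk) {
      if (message.role === "assistant") {
        // Only tool_use blocks count; a text-only assistant message does not.
        for (const block of message.content) {
          if (block.type === "tool_use") seenToolUseIds.add(block.id);
        }
      } else if (
        message.role === "toolResult" &&
        !seenToolUseIds.has(message.toolUseId)
      ) {
        orphans.push(message.toolUseId);
      }
    }
  }
  return orphans;
}
```

A test could then assert that `findOrphanedToolResults(chunks)` is empty for the output of each chunking function. Unlike a predicate that accepts any assistant message, this check still fails when a chunk holds a text-only assistant reply followed by a tool result whose call landed in an earlier chunk.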