
#18230: fix(sessions): repair lone surrogates in session history before API call

by BinHPdev open 2026-02-16 16:44
channel: mattermost agents size: M
## Summary

- Add `repairLoneSurrogates()` to the `sanitizeSessionHistory` pipeline. It replaces unpaired UTF-16 surrogates with U+FFFD (replacement character) before the history is sent to the LLM API.
- Lone surrogates are produced when streaming delta assembly splits supplementary plane emoji (U+10000+) across chunk boundaries, corrupting the JSONL transcript.
- Without this fix, the Anthropic API rejects the replayed history with "no low surrogate in string", permanently breaking the session.
- Uses a deep recursive walk to repair surrogates in all string values, including tool_use `input` objects, text content, etc.

Closes #18105

## Test plan

- [x] New `session-transcript-repair.surrogates.test.ts` with 6 tests:
  - Lone high surrogate replaced with U+FFFD
  - Lone low surrogate replaced with U+FFFD
  - Valid surrogate pairs (emoji) preserved
  - Tool_use input object strings repaired
  - No-op when no surrogates present (same reference returned)
  - Deeply nested objects handled
- [x] Existing session transcript repair tests unaffected

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<h3>Greptile Summary</h3>

Added `repairLoneSurrogates()` to the session history sanitization pipeline, preventing API rejections caused by lone UTF-16 surrogates, which occur when streaming delta assembly splits supplementary plane emoji across chunk boundaries. The implementation uses a regex pattern to detect unpaired high/low surrogates and replaces them with U+FFFD (replacement character) while preserving valid surrogate pairs. This fix prevents the permanent session corruption that occurs when the Anthropic API rejects replayed history with "no low surrogate in string" errors.

**Key changes:**

- Core repair logic in `session-transcript-repair.ts:149-196` using a regex pattern and deep recursive object traversal
- Integration into the `sanitizeSessionHistory` pipeline in `google.ts:443`, between thinking block sanitization and tool call input repair
- Comprehensive test coverage with 6 test cases covering lone high/low surrogates, valid pairs, tool_use objects, no-op cases, and deep nesting
- CHANGELOG has a duplicate entry (lines 13-14)

<h3>Confidence Score: 4/5</h3>

- Safe to merge after fixing the duplicate CHANGELOG entry
- The implementation correctly addresses the lone surrogate issue with a well-tested solution. The regex pattern correctly identifies unpaired surrogates using negative lookahead/lookbehind, the deep traversal preserves object identity when no changes are needed (important for performance), and test coverage is thorough. The only issue is a minor duplicate CHANGELOG entry that should be removed before merging.
- CHANGELOG.md requires fixing the duplicate entry on lines 13-14

<sub>Last reviewed commit: 4d95e82</sub>
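For readers without the diff at hand, the approach described above (a lookahead/lookbehind regex plus a deep walk that returns the same reference when nothing changed) might look roughly like the sketch below. This is not the code from `session-transcript-repair.ts`: the names `LONE_SURROGATE_RE`, `repairDeep`, and `repairLoneSurrogatesSketch` are hypothetical, and the message shape is assumed to be plain JSON-like data.

```typescript
// Illustrative sketch only; all names here are hypothetical, not the PR's code.

// A high surrogate not followed by a low surrogate, or a low surrogate not
// preceded by a high surrogate. No `u` flag, so matching is per UTF-16 code unit.
// Built from a string so the lookbehind is not rejected by older compile targets.
const LONE_SURROGATE_RE = new RegExp(
  "[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?<![\\uD800-\\uDBFF])[\\uDC00-\\uDFFF]",
  "g",
);

// Recursively repair every string in a JSON-like value.
// Returns the original reference when no string needed repair.
function repairDeep(value: unknown): unknown {
  if (typeof value === "string") {
    return value.replace(LONE_SURROGATE_RE, "\uFFFD");
  }
  if (Array.isArray(value)) {
    let changed = false;
    const next = value.map((item) => {
      const repaired = repairDeep(item);
      if (repaired !== item) changed = true;
      return repaired;
    });
    return changed ? next : value;
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const next: Record<string, unknown> = {};
    let changed = false;
    for (const key of Object.keys(source)) {
      const repaired = repairDeep(source[key]);
      if (repaired !== source[key]) changed = true;
      next[key] = repaired;
    }
    return changed ? next : value;
  }
  return value; // numbers, booleans, null, undefined pass through
}

// Assumed entry point: takes the session history and returns a repaired copy,
// or the same array when there was nothing to repair.
function repairLoneSurrogatesSketch<T>(messages: T[]): T[] {
  return repairDeep(messages) as T[];
}

// A chunk boundary that falls inside an emoji leaves a lone high surrogate.
const emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
const truncatedChunk = "hi " + emoji.slice(0, 1);
const history = [
  { role: "assistant", content: [{ type: "tool_use", input: { q: truncatedChunk } }] },
];
const repairedHistory = repairLoneSurrogatesSketch(history);
console.log(repairedHistory[0].content[0].input.q === "hi \uFFFD"); // true
```

Returning the original reference on the no-op path lets callers skip further work with a cheap identity check, which matches the "same reference returned" case in the test plan.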
