#20023: Fix surrogate pair splitting in channel metadata truncation
size: S
trusted-contributor
Cluster: Surrogate Pair Handling Fixes
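`truncateText` in `channel-metadata.ts` uses `.slice()`, which counts UTF-16 code units and can therefore cut between a high and a low surrogate. The result ends in a lone high surrogate, which makes it an ill-formed string. This affects channel metadata for groups and topics whose names contain emoji or other characters outside the Basic Multilingual Plane.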
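To make the failure concrete, here is a small reproduction. `naiveTruncate` is a hypothetical stand-in for the original `truncateText`, whose real signature isn't shown on this page. `hasLoneSurrogate` is an illustrative checker, not an existing helper in the codebase.

```typescript
// Hypothetical stand-in for the pre-fix truncateText: cut at a UTF-16 code-unit index.
function naiveTruncate(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// Illustrative checker: true if the string has a high surrogate not followed by
// a low surrogate, or a low surrogate not preceded by a high surrogate.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      const next = text.charCodeAt(i + 1); // NaN when i + 1 is past the end
      if (!(next >= 0xdc00 && next <= 0xdfff)) return true;
      i++; // skip the paired low surrogate
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      return true;
    }
  }
  return false;
}

const name = "Team 🚀"; // "🚀" (U+1F680) occupies two UTF-16 code units, so length is 7
const cut = naiveTruncate(name, 6);
console.log(cut.length, hasLoneSurrogate(cut)); // 6 true
```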
Checks whether the last code unit before the cut is a high surrogate and, if so, steps back one position so the whole pair is dropped. Also adds unit tests for `buildUntrustedChannelMetadata`.
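A minimal sketch of that boundary check follows. `truncateTextSafe` is a hypothetical name, and the sketch assumes plain truncation with no ellipsis or whitespace trimming, which the real `truncateText` may or may not add.

```typescript
// High surrogates occupy 0xD800..0xDBFF.
function isHighSurrogate(code: number): boolean {
  return code >= 0xd800 && code <= 0xdbff;
}

// Hypothetical surrogate-safe truncation (sketch of the fix, not the PR's exact code).
function truncateTextSafe(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  let end = maxChars;
  // text[end - 1] is the last code unit we would keep. If it is a high surrogate,
  // its low half sits at text[end] and would be cut off, so drop the pair whole.
  if (end > 0 && isHighSurrogate(text.charCodeAt(end - 1))) {
    end -= 1;
  }
  return text.slice(0, end);
}

console.log(JSON.stringify(truncateTextSafe("Team 🚀", 6))); // "Team "
console.log(truncateTextSafe("Team 🚀", 7)); // Team 🚀
```

Stepping back (rather than forward) keeps the result within `maxChars`, at the cost of returning one code unit fewer than the limit in this case.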
<h3>Greptile Summary</h3>
Fixes a bug where `truncateText` in `channel-metadata.ts` could split UTF-16 surrogate pairs when truncating channel metadata strings (e.g., emoji in group/topic names). The fix checks for a high surrogate at the cut boundary and steps back one position to keep the pair intact. Also adds a comprehensive test suite for `buildUntrustedChannelMetadata`.
- **Bug fix**: Prevents invalid strings from being produced when channel metadata with emoji is truncated at a surrogate pair boundary.
- **Tests**: New `channel-metadata.test.ts` covers empty entries, null/undefined handling, deduplication, whitespace normalization, surrogate pair safety, and custom `maxChars`.
- **Style note**: The codebase already has `sliceUtf16Safe`/`truncateUtf16Safe` in `src/utils.ts` that handle surrogate-pair-safe slicing. The inline fix duplicates that logic — consider reusing the existing utility for consistency.
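The style note above suggests reusing `sliceUtf16Safe`/`truncateUtf16Safe`. Their actual signatures in `src/utils.ts` aren't shown here, so the following is a hypothetical helper, `sliceUtf16SafeSketch(input, start, end)`, written in the same spirit. It guards both ends of the slice, which a truncation-only fix doesn't need but a general slicing utility does.

```typescript
function isHigh(code: number): boolean {
  return code >= 0xd800 && code <= 0xdbff;
}
function isLow(code: number): boolean {
  return code >= 0xdc00 && code <= 0xdfff;
}

// Hypothetical helper; NOT the real sliceUtf16Safe signature from src/utils.ts.
function sliceUtf16SafeSketch(input: string, start: number, end: number = input.length): string {
  let from = Math.max(0, Math.min(start, input.length));
  let to = Math.max(from, Math.min(end, input.length));
  // Don't start on the second half of a pair: skip the orphaned low surrogate.
  if (from > 0 && from < input.length && isLow(input.charCodeAt(from)) && isHigh(input.charCodeAt(from - 1))) {
    from += 1;
  }
  // Don't end between the halves of a pair: drop the orphaned high surrogate.
  if (to > from && to < input.length && isHigh(input.charCodeAt(to - 1)) && isLow(input.charCodeAt(to))) {
    to -= 1;
  }
  // String.prototype.slice returns "" when from >= to.
  return input.slice(from, to);
}

// Truncation then becomes a one-liner that delegates to the shared helper.
function truncateViaHelper(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : sliceUtf16SafeSketch(text, 0, maxChars);
}

console.log(sliceUtf16SafeSketch("a🚀b", 2, 4)); // b
console.log(JSON.stringify(truncateViaHelper("Team 🚀", 6))); // "Team "
```

Centralizing the boundary checks in one helper means a future fix (for example, also handling grapheme clusters) lands in one place instead of in every caller.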
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge — the fix is correct and well-tested, with only a minor style suggestion about reusing existing utilities.
- The surrogate pair fix is logically sound and addresses a real bug. New tests provide good coverage. The only note is a style concern about duplicating existing utility logic from `src/utils.ts`.
- No files require special attention.
<sub>Last reviewed commit: 13733db</sub>
Most Similar PRs
- #16894: Fix text truncation splitting surrogate pairs in web-fetch, subagen... · by Clawborn · 2026-02-15 · 85.8%
- #19726: Fix HTML entity decoding for astral code points and surrogate-safe ... · by Clawborn · 2026-02-18 · 78.4%
- #18230: fix(sessions): repair lone surrogates in session history before API... · by BinHPdev · 2026-02-16 · 73.5%
- #20301: Security: scrub untrusted metadata from user-facing replies · by ashishc2503 · 2026-02-18 · 72.6%
- #12325: fix: trim leading/trailing whitespace from outbound messages · by jordanstern · 2026-02-09 · 72.2%
- #19675: fix(security): prevent zero-width Unicode chars from bypassing boun... · by williamzujkowski · 2026-02-18 · 71.6%
- #7454: fix: skip UTF-16 heuristic for audio/video/image MIME types (#7444) · by gavinbmoore · 2026-02-02 · 71.6%
- #12064: fix: prevent chunker from truncating messages that fit within limit · by joetomasone · 2026-02-08 · 70.7%
- #13881: fix: Address Greptile feedback - test isolation and channel resolution · by trevorgordon981 · 2026-02-11 · 70.6%
- #23271: fix(chat): strip untrusted metadata blocks from Control UI messages · by lbo728 · 2026-02-22 · 70.2%