#16894: Fix text truncation splitting surrogate pairs in web-fetch, subagents, and channel metadata
agents
stale
size: S
trusted-contributor
Cluster: Surrogate Pair Handling Fixes
Several truncation helpers use raw `String.slice()`, which can split a surrogate pair (as found in emoji such as 🎉 and in characters from CJK Extension B onward) and leave a lone surrogate in the output. The cron tool and cron normalizer already use `truncateUtf16Safe` to avoid this; this PR aligns the remaining call sites.
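A minimal illustration of the failure mode, using only standard string methods. The sample string and variable names are made up for this sketch and do not come from the codebase.

```typescript
// "🎉" (U+1F389) occupies two UTF-16 code units: 0xD83C (high) and 0xDF89 (low).
const sample = "ab🎉";

// Cutting at index 3 keeps the high surrogate and drops the low one.
const naive = sample.slice(0, 3);
const lastUnit = naive.charCodeAt(naive.length - 1);

console.log(sample.length);         // 4
console.log(lastUnit.toString(16)); // d83c
```

The truncated string now ends in a high surrogate with no partner, which is not well-formed UTF-16.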
**Affected files:**
- `src/agents/tools/web-fetch-utils.ts` — `truncateText` used by `web_fetch` tool output
- `src/agents/tools/subagents-tool.ts` — `truncate` used for subagent result summaries
- `src/security/channel-metadata.ts` — `truncateText` used for untrusted channel metadata
**Fix:** Replace `value.slice(0, n)` with `truncateUtf16Safe(value, n)` from `src/utils.ts`, which already handles surrogate boundary detection.
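For readers unfamiliar with the technique, here is a rough sketch of how a surrogate-aware truncation can work. `truncateSurrogateSafe` is a hypothetical stand-in written for this example; it is not the actual `truncateUtf16Safe` from `src/utils.ts`, whose implementation may differ.

```typescript
// Hypothetical helper for illustration only; not the repository's truncateUtf16Safe.
function truncateSurrogateSafe(value: string, maxUnits: number): string {
  const limit = Math.max(0, Math.floor(maxUnits));
  if (value.length <= limit) return value;

  let end = limit;
  if (end > 0) {
    const last = value.charCodeAt(end - 1); // last code unit that would be kept
    const next = value.charCodeAt(end);     // first code unit that would be dropped
    const lastIsHigh = last >= 0xd800 && last <= 0xdbff;
    const nextIsLow = next >= 0xdc00 && next <= 0xdfff;
    // The cut falls inside a pair: drop the high surrogate as well.
    if (lastIsHigh && nextIsLow) end -= 1;
  }
  return value.slice(0, end);
}

console.log(truncateSurrogateSafe("ab🎉cd", 3)); // ab
console.log(truncateSurrogateSafe("ab🎉cd", 4)); // ab🎉
```

With this approach the result may be one code unit shorter than the requested limit, but it never ends in half of a pair.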
**Tests:** 5 new test cases across two files verifying emoji and CJK text are not corrupted at truncation boundaries.
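The invariant those tests check can be expressed as "no truncated output contains a lone surrogate". Below is a sketch of such a check; `hasLoneSurrogate` and the sample string are assumptions made for this example and are not taken from the PR's test files.

```typescript
// Hypothetical test helper: true if any surrogate code unit lacks its partner.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // A high surrogate must be followed immediately by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a well-formed pair
        continue;
      }
      return true;
    }
    // A low surrogate reached here has no preceding high surrogate.
    if (unit >= 0xdc00 && unit <= 0xdfff) return true;
  }
  return false;
}

// Three BMP characters, then an emoji and a CJK Extension B character (two units each).
const mixed = "日本語🎉𠀀";

console.log(hasLoneSurrogate(mixed));             // false
console.log(hasLoneSurrogate(mixed.slice(0, 4))); // true
console.log(hasLoneSurrogate(mixed.slice(0, 5))); // false
```

A test suite could run this check against the truncation helper at every cut length for a mixed string like the one above.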
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Replaces raw `String.slice()` truncation with the existing `truncateUtf16Safe` utility in three call sites (`web-fetch-utils.ts`, `subagents-tool.ts`, `channel-metadata.ts`) to prevent splitting UTF-16 surrogate pairs (emoji, CJK extension B+) during text truncation. This aligns these helpers with the cron tool and cron normalizer, which already use the safe variant.
- **`web-fetch-utils.ts`**: `truncateText` now uses `truncateUtf16Safe` instead of `value.slice(0, maxChars)`
- **`subagents-tool.ts`**: `truncate` helper now uses `truncateUtf16Safe` instead of `text.slice(0, maxLength)`
- **`channel-metadata.ts`**: `truncateText` for untrusted metadata now uses `truncateUtf16Safe` instead of `value.slice(0, ...)`
- Two new test files with 5 test cases verify that emoji and mixed CJK/emoji text are not corrupted at truncation boundaries
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge — it makes minimal, well-scoped changes that replace unsafe string slicing with an already-proven utility function.
- All three changes are mechanical substitutions of `String.slice()` with the existing `truncateUtf16Safe` utility, which is already used elsewhere in the codebase. The utility's behavior is well-defined and tested. The new test files provide adequate coverage of the surrogate pair safety invariant. No behavioral regressions are introduced: the only difference is that truncation now backs off by one UTF-16 code unit when the cut would otherwise split a surrogate pair.
- No files require special attention
<sub>Last reviewed commit: b014183</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->