
#21560: runner: sanitize invalid UTF-16 surrogates in session/prompt payloads

by VontaJamal open 2026-02-20 03:09
agents size: M
## Summary

- Problem: invalid UTF-16 surrogates in session/prompt text can produce provider JSON errors like `no low surrogate`.
- Impact: a poisoned turn can repeat failures on later sends.
- Fix: sanitize invalid surrogate code units before request serialization, and classify surrogate-related malformed-JSON errors as `format`.
- Scope boundary: no usage/quota preflight behavior in this PR.
- AI-assisted disclosure: AI-assisted implementation, then manual review and manual test verification.

## Quick Review (2-3 min)

1. Check `src/agents/pi-embedded-runner/unicode-safety.ts` for surrogate repair behavior.
2. Check `src/agents/pi-embedded-runner/run/attempt.ts` and `src/agents/pi-embedded-runner/google.ts` call sites for prompt/history sanitization.
3. Check tests listed below for lone high/low surrogate handling and valid pair preservation.

## Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [x] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #
- Related #

## User-visible / Behavior Changes

- Invalid lone surrogates are repaired before request serialization.
- Surrogate-related malformed JSON provider errors are classified as `format`.
- Valid surrogate pairs (normal emoji text) remain unchanged.

## Security Impact (required)

- New permissions/capabilities? (`Yes/No`) No
- Secrets/tokens handling changed? (`Yes/No`) No
- New/changed network calls? (`Yes/No`) No
- Command/tool execution surface changed? (`Yes/No`) No
- Data access scope changed? (`Yes/No`) No
- If any `Yes`, explain risk + mitigation:

## Repro + Verification

### Environment

- OS: macOS (Apple Silicon)
- Runtime/container: Node 22 / pnpm 10
- Integration/channel: N/A (runner-level tests)

### Steps

1. Feed text with lone high and lone low surrogate units through session/prompt sanitization paths.
2. Verify repaired output excludes unpaired surrogates.
3. Verify valid surrogate pairs stay unchanged.
4. Verify surrogate-malformed JSON errors classify as `format`.

### Expected

- Invalid surrogates are repaired, valid pairs remain intact, and classification is `format`.

### Actual

- Matches expected.

## Evidence

- [x] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

## Human Verification (required)

- `corepack pnpm vitest run src/agents/pi-embedded-runner/unicode-safety.test.ts src/agents/pi-embedded-runner.sanitize-session-history.test.ts`
- `corepack pnpm vitest run --config vitest.e2e.config.ts src/agents/pi-embedded-helpers.isbillingerrormessage.e2e.test.ts`
- `corepack pnpm oxlint --type-aware` on touched files
- Manual spot-check: valid emoji surrogate pairs unchanged.
- Not verified here: full repo `corepack pnpm tsgo`, due to pre-existing unrelated TS2742 baseline failures.

## Compatibility / Migration

- Backward compatible? (`Yes/No`) Yes
- Config/env changes? (`Yes/No`) No
- Migration needed? (`Yes/No`) No
- If yes, exact upgrade steps:

## Failure Recovery (if this breaks)

- How to disable/revert quickly: revert commit `9cba5443793b07b0a18b2123ddc3948b982d0bf5`.
- Files/config to restore:
  - `src/agents/pi-embedded-runner/unicode-safety.ts`
  - `src/agents/pi-embedded-runner/google.ts`
  - `src/agents/pi-embedded-runner/run/attempt.ts`
  - `src/agents/pi-embedded-helpers/errors.ts`

## Risks and Mitigations

- Risk: over-sanitization could alter intentionally malformed text.
- Mitigation: replacement is limited to invalid unpaired surrogate units; valid pairs are preserved and covered by tests.
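For reviewers who want a concrete picture of the behavior described in this PR, here is a minimal sketch of lone-surrogate repair and surrogate-error classification. It is not the contents of `unicode-safety.ts` or `errors.ts`: the names `repairLoneSurrogates` and `classifyProviderError` are hypothetical, and replacing invalid units with U+FFFD is an assumption (the description only says they are "repaired"). The classifier's substring match on "surrogate" is likewise illustrative, based only on the `no low surrogate` example above.

```typescript
// Hypothetical sketch; names and the U+FFFD replacement choice are assumptions,
// not taken from the PR's actual source files.
const REPLACEMENT_CHAR = "\uFFFD";

const isHighSurrogate = (unit: number): boolean => unit >= 0xd800 && unit <= 0xdbff;
const isLowSurrogate = (unit: number): boolean => unit >= 0xdc00 && unit <= 0xdfff;

// Replace every unpaired surrogate code unit; leave valid pairs untouched.
function repairLoneSurrogates(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    if (isHighSurrogate(unit)) {
      const hasLowPartner =
        i + 1 < input.length && isLowSurrogate(input.charCodeAt(i + 1));
      if (hasLowPartner) {
        // Valid pair (e.g. an emoji): copy both code units and skip the partner.
        out += input.slice(i, i + 2);
        i++;
      } else {
        out += REPLACEMENT_CHAR; // lone high surrogate
      }
    } else if (isLowSurrogate(unit)) {
      out += REPLACEMENT_CHAR; // low surrogate with no preceding high surrogate
    } else {
      out += input.charAt(i);
    }
  }
  return out;
}

type ErrorKind = "format" | "unknown";

// Illustrative classifier: treats any provider message that mentions a
// surrogate problem as a `format` error.
function classifyProviderError(message: string): ErrorKind {
  return /surrogate/i.test(message) ? "format" : "unknown";
}
```

On Node 20+ the built-in `String.prototype.toWellFormed()` performs the same U+FFFD substitution; the explicit loop is shown here only to make the pairing logic visible.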
