#21462: fix(agents): hold back partial NO_REPLY token in pi-embedded streaming
agents
size: XS
## Summary
Holds back emission of streaming text deltas in the pi-embedded WebSocket path when the accumulated text could be a partial `SILENT_REPLY_TOKEN` (`NO_REPLY`).
Fixes #21461
## Problem
When the LLM streams `NO_REPLY` split across chunk boundaries (e.g. `NO_` then `REPLY`), `handleMessageUpdate()` emits the partial prefix as a text delta before the full token is recognized. Voice clients with TTS pipelines synthesize and play the partial token aloud.
## Change
In `handleMessageUpdate()`, after computing `shouldEmit`, suppress emission when:
- Nothing has been emitted yet (`!previousCleaned`)
- The accumulated cleaned text is shorter than `SILENT_REPLY_TOKEN`
- `SILENT_REPLY_TOKEN` starts with the accumulated text (i.e. the text is a proper prefix of the token)
Once enough text arrives to confirm or rule out the directive, emission resumes normally.
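The three conditions above can be sketched as a small predicate. This is an illustrative sketch only, not the actual diff: the helper name `shouldHoldBackPartialToken` is hypothetical, the token value is taken from this description, and the trimming step is assumed from the review summary's mention of `trimmedCleaned`.

```typescript
// Assumption: the value is taken from the PR description; the real constant
// is defined elsewhere in the codebase.
const SILENT_REPLY_TOKEN = "NO_REPLY";

/**
 * Hypothetical helper: returns true when the first text delta should be held
 * back because the accumulated text could still grow into SILENT_REPLY_TOKEN.
 */
function shouldHoldBackPartialToken(
  cleanedText: string,
  previousCleaned: string,
): boolean {
  // Only guard the first emission; once text has gone out it cannot be unsent.
  if (previousCleaned) return false;

  // Compare on trimmed text so surrounding whitespace does not defeat the match.
  const trimmedCleaned = cleanedText.trim();

  // Hold back only while the text is a proper prefix of the token.
  // Note: an empty string also passes this check, which is harmless here
  // because there is nothing to emit in that case.
  return (
    trimmedCleaned.length < SILENT_REPLY_TOKEN.length &&
    SILENT_REPLY_TOKEN.startsWith(trimmedCleaned)
  );
}
```

In `handleMessageUpdate()`, a predicate like this would presumably be combined with the existing flag, along the lines of `shouldEmit && !shouldHoldBackPartialToken(cleanedText, previousCleaned)`.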
## Impact
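- No added latency for normal responses: the check only runs before anything has been emitted and while the accumulated text is shorter than the token. A response whose opening text happens to be a prefix of the token is held only until the next chunk rules the token out.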
- One file changed, 12 lines added
- No changes to the streaming protocol or client-facing API
## Test plan
- Verify existing pi-embedded tests pass
- Simulate a split token by calling `handleMessageUpdate()` with accumulated text `"NO_"` and then `"NO_REPLY"`: the first call should emit nothing, and the second should be suppressed entirely as a silent reply
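A self-contained way to picture the split-token case is a small simulation. This is a sketch under stated assumptions: `simulateStream` is a hypothetical stand-in for the emission decision in `handleMessageUpdate()`, not the real function, and the full-token suppression it performs is assumed to match the existing silent-reply behavior.

```typescript
/**
 * Hypothetical stand-in for the emission decision in handleMessageUpdate().
 * Takes the accumulated text seen at each update and returns the deltas that
 * would be emitted to the client.
 */
function simulateStream(accumulatedTexts: string[]): string[] {
  const token = "NO_REPLY"; // assumed value of SILENT_REPLY_TOKEN
  const emitted: string[] = [];
  let previousCleaned = ""; // text already emitted; empty until the first emission

  for (const cleanedText of accumulatedTexts) {
    const trimmed = cleanedText.trim();

    // Full silent-reply token: suppress entirely (assumed existing behavior).
    if (trimmed === token) continue;

    // Guard from this PR: hold back a possible partial token while nothing
    // has been emitted yet.
    const couldBePartialToken =
      !previousCleaned &&
      trimmed.length < token.length &&
      token.startsWith(trimmed);
    if (couldBePartialToken) continue;

    // Emit only the suffix that has not been sent yet.
    const delta = cleanedText.slice(previousCleaned.length);
    if (delta) emitted.push(delta);
    previousCleaned = cleanedText;
  }
  return emitted;
}

// Split silent reply: nothing should reach the client.
const silent = simulateStream(["NO_", "NO_REPLY"]);
// Normal reply that starts like the token: held for one update, then emitted in full.
const normal = simulateStream(["NO", "NOT yet", "NOT yet ready"]);
console.log(silent, normal);
```

The second case shows that held-back text is not lost: once the prefix is ruled out, the first delta carries everything accumulated so far.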
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR adds a guard to prevent streaming partial `NO_REPLY` tokens (e.g., `"NO"` or `"NO_"`) to voice clients in pi-embedded WebSocket streams. The fix suppresses emission only on the first chunk when the accumulated text is a proper prefix of `SILENT_REPLY_TOKEN` and shorter than the full token length.
- Prevents TTS synthesis of partial tokens like `"NO"` before `"_REPLY"` arrives
- Zero latency impact for normal responses (check only fires when `!previousCleaned` and text length < 8)
- The guard correctly uses `trimmedCleaned` for prefix matching while preserving the original `cleanedText` for state tracking
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk
- The change is well-scoped and addresses a specific streaming bug with a minimal, correct fix. The logic only affects the first emission, when the text could be a partial `NO_REPLY` prefix, so normal responses are unaffected. The implementation checks all three conditions (`shouldEmit`, `!previousCleaned`, and the prefix match) and trims the text before comparing.
- No files require special attention
<sub>Last reviewed commit: b63eb0d</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #23761: fix: suppress partial NO_REPLY tokens at lifecycle boundary (by kami-saia · 2026-02-22 · 85.0%)
- #19648: fix: suppress silent-reply partial tokens during streaming (by bradleypriest · 2026-02-18 · 84.5%)
- #19673: fix(telegram): avoid starting streaming replies with only 1-2 words (by emanuelst · 2026-02-18 · 77.9%)
- #8493: fix(tui): filter NO_REPLY token from chat display (by gavinbmoore · 2026-02-04 · 77.5%)
- #4495: Fix: emit final assistant event when reply tags hide stream (by ukeate · 2026-01-30 · 77.3%)
- #15118: Fix webchat ghost bubble when model replies with NO_REPLY (by jwchmodx · 2026-02-13 · 77.3%)
- #16361: Gateway: suppress NO_REPLY in webchat (by shadril238 · 2026-02-14 · 76.6%)
- #19576: fix: tighten isSilentReplyText to match whole-text only (by aldoeliacim · 2026-02-18 · 76.1%)
- #16321: Fix #12767: suppress HEARTBEAT_OK leakage in Telegram DM replies (by tdjackey · 2026-02-14 · 76.0%)
- #19916: fix: strict silent-reply detection to prevent false positives with ... (by hayoial · 2026-02-18 · 75.8%)