#21462: fix(agents): hold back partial NO_REPLY token in pi-embedded streaming

by algal open 2026-02-20 00:02

agents size: XS

Cluster: Memory and Language Support Enhancements

## Summary

Holds back emission of streaming text deltas in the pi-embedded WebSocket path when the accumulated text could be a partial `SILENT_REPLY_TOKEN` (`NO_REPLY`).

Fixes #21461

## Problem

When the LLM streams `NO_REPLY` split across chunk boundaries (e.g. `NO_` then `REPLY`), `handleMessageUpdate()` emits the partial prefix as a text delta before the full token is recognized. Voice clients with TTS pipelines then synthesize and play the partial token aloud.

## Change

In `handleMessageUpdate()`, after computing `shouldEmit`, suppress emission when all of the following hold:

- Nothing has been emitted yet (`!previousCleaned`)
- The accumulated cleaned text is shorter than `SILENT_REPLY_TOKEN`
- `SILENT_REPLY_TOKEN` starts with the accumulated text (i.e. it's a proper prefix)

Once enough text arrives to confirm or rule out the directive, emission resumes normally.

## Impact

- Zero latency for normal responses (the check only fires on the first chunk, when the text is shorter than the token)
- One file changed, 12 lines added
- No changes to the streaming protocol or client-facing API

## Test plan

- Verify existing pi-embedded tests pass
- Simulate a split by calling `handleMessageUpdate()` with accumulated text `"NO_"` and then `"NO_REPLY"`: the first call should not emit, and the second should be suppressed entirely

<h3>Greptile Summary</h3>

This PR adds a guard to prevent streaming partial `NO_REPLY` tokens (e.g., `"NO"` or `"NO_"`) to voice clients in pi-embedded WebSocket streams. The fix suppresses emission only on the first chunk, when the accumulated text is a proper prefix of `SILENT_REPLY_TOKEN` and shorter than the full token length.

- Prevents TTS synthesis of partial tokens like `"NO"` before `"_REPLY"` arrives
- Zero latency impact for normal responses (the check only fires when `!previousCleaned` and text length < 8)
- The guard correctly uses `trimmedCleaned` for prefix matching while preserving the original `cleanedText` for state tracking

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk
- The change is well-scoped and addresses a specific streaming bug with a minimal and correct fix. The logic only affects the first chunk when the text could be a partial `NO_REPLY` prefix, ensuring zero impact on normal responses. The implementation correctly checks all three conditions (`shouldEmit`, `!previousCleaned`, and prefix match) and uses proper trimming for comparison.
- No files require special attention

<sub>Last reviewed commit: b63eb0d</sub>
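For readers without the diff at hand, the three conditions listed under Change can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `shouldHoldBack` and `simulate` are hypothetical names, the loop is a simplified stand-in for `handleMessageUpdate()`, and the token value is assumed from the description above.

```typescript
// Value taken from the PR description; the real constant is defined elsewhere.
const SILENT_REPLY_TOKEN = "NO_REPLY";

// Hypothetical helper (not the PR's code): true when the first delta should
// be held back because the text so far could still grow into the full token.
function shouldHoldBack(previousCleaned: string, cleanedText: string): boolean {
  const trimmedCleaned = cleanedText.trim();
  return (
    !previousCleaned && // nothing has been emitted yet
    trimmedCleaned.length < SILENT_REPLY_TOKEN.length && // shorter than the token
    SILENT_REPLY_TOKEN.startsWith(trimmedCleaned) // proper prefix of the token
  );
}

// Simplified stand-in for the streaming loop: takes accumulated-text
// snapshots and returns the deltas that would reach the client.
function simulate(snapshots: string[]): string[] {
  const emitted: string[] = [];
  let previousCleaned = "";
  for (const cleanedText of snapshots) {
    if (cleanedText.trim() === SILENT_REPLY_TOKEN) continue; // full token: stay silent
    if (shouldHoldBack(previousCleaned, cleanedText)) continue; // ambiguous prefix: wait
    emitted.push(cleanedText.slice(previousCleaned.length)); // emit only the new part
    previousCleaned = cleanedText;
  }
  return emitted;
}
```

Under these assumptions, `simulate(["NO_", "NO_REPLY"])` emits nothing, while `simulate(["NO", "NOT now"])` holds the first snapshot and then emits the whole text in one delta, so held-back text is delayed rather than lost. The prefix check is case-sensitive, so a reply starting with `"No"` is never held.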