#19648: fix: suppress silent-reply partial tokens during streaming

by bradleypriest open 2026-02-18 01:56

size: S trusted-contributor

Cluster: Memory and Language Support Enhancements

## Summary

- **Problem:** During streaming, partial tokens like `NO_R`, `HEARTBEAT` leak to Telegram before the full `NO_REPLY`/`HEARTBEAT_OK` silent-reply check can match. Users briefly see ghost text before it disappears.
- **Why it matters:** Breaks the silent-reply contract on streaming channels (Telegram `streamMode: "partial"`, webchat). Same root cause as #15060.
- **What changed:** Added `isSilentReplyPrefix()` that checks if accumulated stream text is a prefix of a silent token. Applied in `normalizeStreamingText`, `parseChunk`, and `startTypingOnText` to hold back output until the token completes (suppress) or diverges (flush normally).
- **What did NOT change:** Final delivery logic (`isSilentReplyText`) is untouched. Non-streaming paths are unaffected. No config changes.

## Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Related #15060

## User-visible / Behavior Changes

- Silent replies (`NO_REPLY`, `HEARTBEAT_OK`) no longer flash partial tokens on streaming channels before being suppressed.

<img width="914" height="226" alt="image" src="https://github.com/user-attachments/assets/30d90251-c829-4e52-bc72-75641da4c37c" />

## Security Impact (required)

- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`

## Repro + Verification

### Environment

- OS: Linux (Raspberry Pi arm64)
- Runtime/container: Node v24
- Model/provider: Any (streaming-capable)
- Integration/channel: Telegram with `streamMode: "partial"`
- Relevant config: Default streaming config

### Steps

1. Send a message to a Telegram group where the agent is configured to stay silent (e.g. casual banter that triggers `NO_REPLY`)
2. Watch the chat during streaming

### Expected

- No visible output: the message is fully suppressed

### Actual

- Brief flash of `NO_R` or `NO_RE` etc. before the message disappears

## Evidence

- [x] Failing test/log before + passing after
- [ ] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

New test file `src/auto-reply/tokens.test.ts` covers `isSilentReplyPrefix` (prefix matching, whitespace trimming, non-matches, undefined input) and existing `isSilentReplyText` behavior.

## Human Verification (required)

- **Verified scenarios:** Unit tests pass for prefix matching of both `NO_REPLY` and `HEARTBEAT_OK` tokens, including edge cases (empty string, whitespace, partial divergence like `NOPE`)
- **Edge cases checked:** Single-char prefix `"N"` (matches, causes one-token delay; acceptable since normal responses diverge immediately), empty/whitespace input (returns false), full token as prefix of itself (returns true, consistent with `isSilentReplyText`)
- **What you did not verify:** Live Telegram streaming (no live bot in test environment), `tsc` full build (timed out on Pi; lint + vitest passed)

## Compatibility / Migration

- Backward compatible? `Yes`
- Config/env changes? `No`
- Migration needed? `No`

## Failure Recovery (if this breaks)

- **How to disable/revert:** Revert this commit. Alternatively, disable streaming (`streamMode: "off"`) as a workaround.
- **Files/config to restore:** `src/auto-reply/tokens.ts`, `src/auto-reply/reply/agent-runner-execution.ts`, `src/auto-reply/reply/streaming-directives.ts`, `src/auto-reply/reply/typing.ts`
- **Known bad symptoms:** If prefix matching is too aggressive, normal responses starting with `N` or `H` could experience a one-token delay before the first stream flush. This is by design and should be imperceptible.
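
For readers without the diff open, the prefix check and the edge cases listed under Human Verification can be approximated as follows. This is a minimal sketch, not the code in `src/auto-reply/tokens.ts`: the token values come from this description, while the signature, the default argument, and the case-sensitive comparison are assumptions.

```typescript
// Hypothetical sketch; not the actual implementation from this PR.
// Token values are taken from the PR description; the signature is assumed.
const SILENT_REPLY_TOKEN = "NO_REPLY";
const HEARTBEAT_TOKEN = "HEARTBEAT_OK";

function isSilentReplyPrefix(
  text: string | undefined,
  token: string = SILENT_REPLY_TOKEN,
): boolean {
  // Undefined or empty input is never treated as a prefix.
  if (!text) return false;
  // Surrounding whitespace is ignored; whitespace-only input is rejected.
  const trimmed = text.trim();
  if (trimmed.length === 0) return false;
  // True while the text could still grow into the token,
  // including the case where it already equals the full token.
  return token.startsWith(trimmed);
}

console.log(isSilentReplyPrefix("N")); // true: single-char prefix is held back
console.log(isSilentReplyPrefix(" NO_RE ")); // true: whitespace is trimmed first
console.log(isSilentReplyPrefix("NOPE")); // false: diverges at the third character
console.log(isSilentReplyPrefix("HEARTBEAT", HEARTBEAT_TOKEN)); // true
console.log(isSilentReplyPrefix("   ")); // false
```

Under these assumptions a normal reply stops matching as soon as its accumulated text differs from the token, which is why the delay described above should last only until the first diverging token arrives.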
## Risks and Mitigations

- **Risk:** Responses starting with `NO_` (e.g. `"NO_RULES apply here"`) are delayed by a few tokens before the prefix check diverges and flushes.
- **Mitigation:** Divergence happens on the very next token after the prefix stops matching. Delay is sub-100ms in practice. Only affects the streaming layer; final delivery is unaffected.

## Appendix

[Log of debug/fix conversation with LLM](https://github.com/user-attachments/files/25377942/silent-reply-chat-log.md)

<h3>Greptile Summary</h3>

This PR adds streaming-time suppression of partial silent-reply tokens (`NO_REPLY`, `HEARTBEAT_OK`) to prevent ghost text from leaking to external messaging channels (e.g., Telegram). A new `isSilentReplyPrefix()` utility function in `tokens.ts` checks whether accumulated stream text is a prefix of a known silent token, and this check is wired into the three streaming code paths: `normalizeStreamingText` (agent runner), `parseChunk` (streaming directives), and `startTypingOnText` (typing controller).

- The core `isSilentReplyPrefix` function is clean, well-tested, and handles edge cases (undefined, empty, whitespace).
- The fix is applied consistently across the three streaming layers where partial tokens could leak.
- `agent-runner-execution.ts` checks both `SILENT_REPLY_TOKEN` and `HEARTBEAT_TOKEN` prefixes, while `streaming-directives.ts` and `typing.ts` only check the configured `silentToken` (defaults to `SILENT_REPLY_TOKEN`). This is acceptable because: (1) heartbeat runs disable typing entirely via `resolveTypingMode`, and (2) `normalizeStreamingText` in the agent runner handles both tokens and wraps the lower-level paths.
- New test file `tokens.test.ts` provides thorough coverage including incremental prefix matching, whitespace handling, divergence cases, and existing `isSilentReplyText` regression tests.
- Non-streaming paths and final delivery logic remain unchanged.
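
The hold-back behavior summarized above (suppress while the accumulated text could still become a silent token, flush once it diverges) can be illustrated with a small simulation. This is a sketch under stated assumptions: `couldBecomeSilentToken` and `simulateStream` are hypothetical names, and the real `normalizeStreamingText` may be structured differently. Only the two token values and the idea of checking both tokens come from the PR.

```typescript
// Hypothetical sketch of a streaming gate; names and shape are assumptions,
// not the actual normalizeStreamingText implementation.
const SILENT_TOKENS: readonly string[] = ["NO_REPLY", "HEARTBEAT_OK"];

// True while the trimmed text is a non-empty prefix of any silent token.
function couldBecomeSilentToken(text: string): boolean {
  const trimmed = text.trim();
  return trimmed.length > 0 && SILENT_TOKENS.some((token) => token.startsWith(trimmed));
}

// Feeds chunks in order and records what would be visible after each one.
// `undefined` means nothing is flushed to the channel yet.
function simulateStream(chunks: string[]): Array<string | undefined> {
  let accumulated = "";
  return chunks.map((chunk) => {
    accumulated += chunk;
    return couldBecomeSilentToken(accumulated) ? undefined : accumulated;
  });
}

// A silent reply never becomes visible, even when it arrives in pieces.
console.log(simulateStream(["NO", "_RE", "PLY"]));
// → [undefined, undefined, undefined]

// A normal reply is held for one chunk, then flushed in full once it diverges.
console.log(simulateStream(["NO", "_RULES", " apply here"]));
// → [undefined, "NO_RULES", "NO_RULES apply here"]
```

Because the gate works on the accumulated text rather than on individual chunks, no text is lost when a reply diverges: the first flush carries everything held back so far.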
<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge: it adds defensive suppression for partial tokens during streaming with no changes to final delivery logic or configuration.
- The changes are well-scoped to the streaming layer, the new utility function is simple and correct, tests cover the important cases, and the three integration points are applied consistently with the existing architecture. Score is 4 rather than 5 because the `streaming-directives.ts` `parseChunk` only checks `SILENT_REPLY_TOKEN` prefix (not `HEARTBEAT_TOKEN`), creating a minor inconsistency with `agent-runner-execution.ts`, though this is mitigated by the higher-level heartbeat handling. The PR was not verified with live Telegram streaming or a full `tsc` build.
- `src/auto-reply/reply/streaming-directives.ts`: only checks `SILENT_REPLY_TOKEN` prefix, not `HEARTBEAT_TOKEN`, unlike the agent runner.

<sub>Last reviewed commit: ec795c9</sub>