#19489: fix(voice-call): add echo suppression for TTS playback

by kalichkin open 2026-02-17 21:36

channel: voice-call size: XS

Cluster: Voice Call and TTS Improvements

## Problem

During voice calls, the bot's own TTS audio leaks back through the caller's microphone and gets picked up by STT. This causes the agent to "hear" its own response, transcribe it, and respond again -- creating a feedback loop where the bot talks to itself. This is especially noticeable on speakerphone or in echoey environments, but happens to some degree on all calls due to Twilio's media stream including the mixed audio.

## Fix

Mute STT input during TTS playback, plus a 250ms cooldown after TTS finishes to account for network-delayed audio still reaching the microphone.

Three changes to `media-stream.ts`:

1. New `ttsCooldownUntil` map + `TTS_COOLDOWN_MS` constant (250ms)
2. In the `media` message handler: skip forwarding audio to STT when TTS is active or within cooldown window
3. Set cooldown timestamp when TTS queue drains; clean up maps on stream stop

## Trade-off

The caller's speech during TTS playback + 250ms after is discarded. In practice this is fine -- the caller is listening to the bot speak, not talking over it. The 250ms cooldown is short enough to not clip the start of the caller's next utterance.

## Testing

Tested on OpenClaw 2026.2.9 through 2026.2.15 with Twilio voice calls. Before the fix, the bot would frequently respond to echoes of its own TTS. After the fix, echo loops are eliminated.

<h3>Greptile Summary</h3>

Adds echo suppression to prevent TTS audio feedback loops by muting STT input while TTS is playing and for 250ms after TTS finishes. The implementation correctly handles the common case of sequential TTS playback.
- Introduced `ttsCooldownUntil` map to track per-stream cooldown timestamps
- Modified media message handler to skip STT forwarding during TTS playback or cooldown
- Sets cooldown when TTS queue drains naturally in `processQueue`
- Cleans up the cooldown map in `clearTtsState`

**Issue found:** `clearTtsQueue` (used for barge-in interruptions) doesn't set the cooldown timestamp, so echo from interrupted TTS can leak through: network-delayed audio can still arrive after the interruption.

<h3>Confidence Score: 3/5</h3>

- Safe to merge with one logical issue in the barge-in edge case
- The core echo suppression logic is sound and handles the main use case correctly. However, the barge-in path (`clearTtsQueue`) doesn't set the cooldown, leaving a gap where echo from interrupted TTS can leak through. This is an edge case that won't affect normal sequential TTS but could cause issues when users interrupt the bot.
- Pay attention to the barge-in behavior in `clearTtsQueue`: test interrupting TTS mid-speech

<sub>Last reviewed commit: f19907f</sub>
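
For reference, the gating described in the Fix section, plus the cooldown on the barge-in path that the review asks for, could look roughly like the sketch below. This is not the code from `media-stream.ts`. Only `ttsCooldownUntil`, `TTS_COOLDOWN_MS`, `clearTtsQueue`, and `clearTtsState` are names from this PR; the `EchoGate` class, its other methods, the `ttsActive` map, and the injectable clock are assumptions made so the example is self-contained.

```typescript
// Illustrative sketch only. EchoGate, ttsActive, onTtsStart, onTtsQueueDrained,
// shouldForwardToStt and the injected clock are hypothetical names.
const TTS_COOLDOWN_MS = 250;

class EchoGate {
  // streamSid -> true while TTS audio is being played to the caller
  private ttsActive = new Map<string, boolean>();
  // streamSid -> timestamp (ms) until which inbound audio stays muted
  private ttsCooldownUntil = new Map<string, number>();
  // Injectable clock so the timing can be tested deterministically
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  onTtsStart(streamSid: string): void {
    this.ttsActive.set(streamSid, true);
  }

  // TTS queue drained naturally: stop muting once the cooldown has passed.
  onTtsQueueDrained(streamSid: string): void {
    this.endTts(streamSid);
  }

  // Barge-in path: also arm the cooldown, so delayed echo of the
  // interrupted TTS is not forwarded to STT.
  clearTtsQueue(streamSid: string): void {
    this.endTts(streamSid);
  }

  // Stream stopped: drop all per-stream state.
  clearTtsState(streamSid: string): void {
    this.ttsActive.delete(streamSid);
    this.ttsCooldownUntil.delete(streamSid);
  }

  // Called from the media message handler before forwarding audio to STT.
  shouldForwardToStt(streamSid: string): boolean {
    if (this.ttsActive.get(streamSid)) return false;
    const until = this.ttsCooldownUntil.get(streamSid) ?? 0;
    return this.now() >= until;
  }

  private endTts(streamSid: string): void {
    this.ttsActive.set(streamSid, false);
    this.ttsCooldownUntil.set(streamSid, this.now() + TTS_COOLDOWN_MS);
  }
}
```

In this sketch the media handler would call `shouldForwardToStt(streamSid)` for each inbound frame and drop the frame when it returns `false`. Arming the cooldown in `clearTtsQueue` applies the same trade-off to barge-in: up to 250ms of the caller's interrupting speech is discarded along with any late echo.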