
#8428: voice-call: add low-latency streaming infrastructure

by damianhodgkiss open 2026-02-04 01:32
## Summary

Major improvements to the voice call extension for **faster response times**. Previously, users had to wait for the entire LLM response before hearing anything. Now, audio starts playing as soon as the AI generates its first sentence.

## What's New

### Faster Speech-to-Text (Deepgram Flux)

- **Model-based end-of-turn detection**: Instead of waiting for silence (VAD), the model predicts when you've finished speaking
- **Speculative processing**: Starts generating a response *before* you finish speaking, then uses it immediately if the prediction was correct
- **Native telephony audio**: Accepts mu-law 8kHz directly from Twilio (no conversion overhead)

### Faster Text-to-Speech (Cartesia)

- **Persistent WebSocket connection**: Eliminates per-request connection overhead
- **Native mu-law output**: No PCM→mu-law conversion needed
- **Streaming chunks**: Audio starts playing while it is still being generated

### Streaming LLM → TTS Pipeline

- **Sentence-by-sentence delivery**: The first sentence plays while the LLM generates the rest
- **Barge-in support**: Interrupt the AI mid-response by speaking
- **Graceful cancellation**: If you continue speaking after an early prediction, the speculative work is cancelled

## Before vs After

```
BEFORE: You stop speaking → Silence detection → Full LLM response → TTS → Audio
        (You wait for the entire response before hearing anything)

AFTER:  You stop speaking → Model detects end → First sentence → Audio
        (Audio starts immediately, rest streams in parallel)
```

## Configuration

```yaml
plugins:
  entries:
    voice-call:
      config:
        streaming:
          enabled: true
          sttProvider: "deepgram-flux" # or "openai-realtime"
          deepgramApiKey: "..."
        tts:
          provider: "cartesia"
          cartesia:
            apiKey: "..."
            voiceId: "..."
```

## Test plan

- [ ] Test call with Deepgram Flux STT
- [ ] Test call with Cartesia TTS
- [ ] Verify sentence streaming (ask for something long like "explain quantum computing")
- [ ] Test barge-in during response (interrupt the AI)
- [ ] Test EagerEndOfTurn speculation (pause briefly mid-sentence)

🤖 Generated with [Claude Code](https://claude.ai/code)
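The sentence-by-sentence delivery could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `sentenceChunks` and the token-stream shape are assumed names, and real text has harder cases (abbreviations, decimals) that a production splitter would handle.

```typescript
// Sketch: cut an incoming LLM token stream into complete sentences so each
// one can be handed to TTS as soon as it finishes, instead of waiting for
// the full response. Illustrative only; not the extension's real API.
async function* sentenceChunks(
  tokens: AsyncIterable<string>,
): AsyncGenerator<string> {
  let buf = "";
  for await (const tok of tokens) {
    buf += tok;
    let m: RegExpMatchArray | null;
    // Flush every sentence currently completed in the buffer: terminal
    // punctuation followed by whitespace marks a sentence boundary.
    while ((m = buf.match(/[.!?]\s/)) && m.index !== undefined) {
      const end = m.index + 1;
      const sentence = buf.slice(0, end).trim();
      if (sentence) yield sentence;
      buf = buf.slice(end);
    }
  }
  // Trailing fragment without terminal punctuation still gets spoken.
  const rest = buf.trim();
  if (rest) yield rest;
}
```

Each yielded sentence would be forwarded to the TTS WebSocket immediately, which is what lets the first sentence play while the LLM is still generating the rest.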
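Barge-in and graceful cancellation can be modeled with a single in-flight `AbortController` per response: when the caller starts speaking again, the current LLM/TTS pipeline is aborted and playback stops. A rough sketch, again with assumed names rather than the PR's actual implementation:

```typescript
// Sketch: one abortable response at a time. interrupt() is what the STT
// layer would call when the caller speaks over the AI (barge-in), or when
// a speculative (EagerEndOfTurn) response turns out to be premature.
class BargeInController {
  private current: AbortController | null = null;

  // Start a new response; any prior in-flight response is cancelled first.
  startResponse(run: (signal: AbortSignal) => Promise<void>): Promise<void> {
    this.interrupt();
    const ctrl = new AbortController();
    this.current = ctrl;
    return run(ctrl.signal).catch((err) => {
      if (ctrl.signal.aborted) return; // expected when barged in
      throw err;
    });
  }

  // Cancel the in-flight response, if any.
  interrupt(): void {
    this.current?.abort();
    this.current = null;
  }
}
```

The same signal would be checked between sentence chunks, so speculative work started before the user actually finished speaking is dropped rather than played.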
