#10447: feat(voice-call): add Deepgram STT provider

by chrharri open 2026-02-06 14:28

channel: voice-call stale

Cluster: Text-to-Speech Provider Enhancements

## Summary

Adds Deepgram as an alternative STT provider for real-time speech-to-text transcription in voice calls.

## Features

- New `DeepgramSTTProvider` with WebSocket streaming support
- Nova-2 model by default with built-in VAD and utterance detection
- Config options: `sttProvider`, `deepgramApiKey`, `deepgramModel`, `deepgramLanguage`, `utteranceEndMs`
- Environment variable support (`DEEPGRAM_API_KEY`)
- Automatic reconnection with exponential backoff
- Partial transcript callbacks for streaming UI

## Usage

```json5
{
  streaming: {
    enabled: true,
    sttProvider: "deepgram",
    deepgramApiKey: "your_key", // or use DEEPGRAM_API_KEY env
    deepgramModel: "nova-2",
    deepgramLanguage: "en-US",
    utteranceEndMs: 1500,
  }
}
```

## Why Deepgram?

- **Real-time streaming**: Low-latency WebSocket-based transcription
- **Accuracy**: Nova-2 is highly accurate across accents and domains
- **Built-in VAD**: Server-side voice activity detection with configurable utterance end detection
- **Cost-effective**: Often more affordable than OpenAI Realtime API for high-volume use cases

## Changes

- `extensions/voice-call/src/providers/stt-deepgram.ts`: New Deepgram provider implementation
- `extensions/voice-call/src/config.ts`: Added Deepgram config options to streaming schema
- `extensions/voice-call/src/webhook.ts`: Support for selecting STT provider
- `extensions/voice-call/src/media-stream.ts`: Unified STT session types
- `extensions/voice-call/src/providers/index.ts`: Export new provider
- `extensions/voice-call/README.md`: Documentation updates
- `extensions/voice-call/CHANGELOG.md`: Changelog entry

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

- Adds a new `DeepgramSTTProvider` (WebSocket streaming) and exposes it via `extensions/voice-call/src/providers/index.ts`.
- Extends `streaming` config schema to support selecting `sttProvider` and Deepgram-specific options / env var (`DEEPGRAM_API_KEY`).
- Updates voice-call webhook server to instantiate either OpenAI Realtime STT or Deepgram based on config.
- Updates media stream handler typing to accept either STT provider/session and documents the new provider options in README/CHANGELOG.

<h3>Confidence Score: 3/5</h3>

- This PR is close to mergeable, but the Deepgram STT session API has a callback-handling bug that can silently disable transcript delivery in some call patterns.
- The changes are mostly additive and integrate cleanly into the existing provider selection flow, but `DeepgramSTTSessionImpl.waitForTranscript()` mutates the shared `onTranscriptCallback`, which can clobber previously registered transcript handlers and cause transcripts to stop reaching the rest of the system if `waitForTranscript()` is ever used alongside `onTranscript()`.
- extensions/voice-call/src/providers/stt-deepgram.ts
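
The source of `DeepgramSTTSessionImpl` is not shown in this thread, so the following is only a sketch of one possible way to avoid the handler clobbering described in the review. It assumes the session can keep a set of listeners instead of a single `onTranscriptCallback` field. The class name `TranscriptFanout`, the `emit` method, and the `timeoutMs` parameter are hypothetical and do not come from the PR.

```typescript
// Hypothetical sketch: names here are illustrative, not taken from the PR's source.
type TranscriptListener = (text: string) => void;

class TranscriptFanout {
  // Every subscriber lives in this set, so adding one never replaces another.
  private readonly listeners = new Set<TranscriptListener>();

  /** Register a persistent handler; returns a function that unsubscribes it. */
  onTranscript(listener: TranscriptListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  /** Resolve with the next transcript, or reject once timeoutMs has elapsed. */
  waitForTranscript(timeoutMs: number = 30000): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      // One-shot listener: it removes only itself and leaves other handlers alone.
      const listener: TranscriptListener = (text) => {
        clearTimeout(timer);
        this.listeners.delete(listener);
        resolve(text);
      };
      const timer = setTimeout(() => {
        this.listeners.delete(listener);
        reject(new Error("Timed out waiting for transcript"));
      }, timeoutMs);
      this.listeners.add(listener);
    });
  }

  /** Would be called by the provider whenever a final transcript arrives. */
  emit(text: string): void {
    // Iterate over a copy so listeners may unsubscribe during delivery.
    for (const listener of [...this.listeners]) {
      listener(text);
    }
  }
}
```

With this shape, a handler registered through `onTranscript()` keeps receiving transcripts before, during, and after any `waitForTranscript()` call, because the one-shot waiter is added to the set and later removed from it, never assigned over an existing handler.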