#10447: feat(voice-call): add Deepgram STT provider
channel: voice-call
stale
Cluster: Text-to-Speech Provider Enhancements
## Summary
Adds Deepgram as an alternative STT provider for real-time speech-to-text transcription in voice calls.
## Features
- New `DeepgramSTTProvider` with WebSocket streaming support
- Nova-2 model by default with built-in VAD and utterance detection
- Config options: `sttProvider`, `deepgramApiKey`, `deepgramModel`, `deepgramLanguage`, `utteranceEndMs`
- Environment variable support (`DEEPGRAM_API_KEY`)
- Automatic reconnection with exponential backoff
- Partial transcript callbacks for streaming UI
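The reconnection behavior listed above can be illustrated with a small delay calculator. This is a minimal sketch, not the PR's implementation: `BackoffOptions`, `backoffDelayMs`, `backoffSchedule`, and the default values (500 ms base, 30 s cap, factor 2) are all assumptions chosen for illustration.

```typescript
// Hypothetical helper: the names and default values here are illustrative
// assumptions, not taken from stt-deepgram.ts.
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  maxMs: number; // upper bound on any single delay
  factor: number; // multiplier applied on each further attempt
}

const DEFAULT_BACKOFF: BackoffOptions = { baseMs: 500, maxMs: 30_000, factor: 2 };

// Delay before retry number `attempt` (0-based), capped at maxMs.
function backoffDelayMs(attempt: number, opts: BackoffOptions = DEFAULT_BACKOFF): number {
  const raw = opts.baseMs * Math.pow(opts.factor, attempt);
  return Math.min(raw, opts.maxMs);
}

// Full list of delays for a bounded number of reconnect attempts.
function backoffSchedule(maxAttempts: number, opts: BackoffOptions = DEFAULT_BACKOFF): number[] {
  return Array.from({ length: maxAttempts }, (_, i) => backoffDelayMs(i, opts));
}
```

A reconnect loop would wait `backoffDelayMs(attempt)` before reopening the WebSocket and reset `attempt` to zero after a successful connection. Production code often adds random jitter so that many clients do not retry in lockstep.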
## Usage
```json5
{
  streaming: {
    enabled: true,
    sttProvider: "deepgram",
    deepgramApiKey: "your_key", // or use DEEPGRAM_API_KEY env
    deepgramModel: "nova-2",
    deepgramLanguage: "en-US",
    utteranceEndMs: 1500,
  }
}
```
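To make the interaction between the config key and the environment variable concrete, here is a minimal sketch of how the options might be resolved. The field names mirror the config options above, but `resolveDeepgramOptions` and `ResolvedDeepgramOptions` are hypothetical, and the precedence (explicit config value first, then `DEEPGRAM_API_KEY`) is an assumption rather than something this PR description states.

```typescript
// Field names follow the PR's config options; the resolver is hypothetical.
interface StreamingConfig {
  enabled?: boolean;
  sttProvider?: string;
  deepgramApiKey?: string;
  deepgramModel?: string;
  deepgramLanguage?: string;
  utteranceEndMs?: number;
}

interface ResolvedDeepgramOptions {
  apiKey: string;
  model: string;
  language?: string;
  utteranceEndMs?: number;
}

// Assumption: an explicit config value wins over the environment variable.
function resolveDeepgramOptions(
  cfg: StreamingConfig,
  env: Record<string, string | undefined>,
): ResolvedDeepgramOptions {
  const apiKey = cfg.deepgramApiKey || env["DEEPGRAM_API_KEY"];
  if (!apiKey) {
    throw new Error("Deepgram API key missing: set streaming.deepgramApiKey or DEEPGRAM_API_KEY");
  }
  return {
    apiKey,
    model: cfg.deepgramModel ?? "nova-2", // Nova-2 is the stated default model
    language: cfg.deepgramLanguage,
    utteranceEndMs: cfg.utteranceEndMs,
  };
}
```

In a Node.js process the second argument would typically be `process.env`. Passing it in explicitly keeps the function easy to test.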
## Why Deepgram?
- **Real-time streaming**: Low-latency WebSocket-based transcription
- **Accuracy**: Nova-2 is highly accurate across accents and domains
- **Built-in VAD**: Server-side voice activity detection with configurable utterance end detection
- **Cost-effective**: Often more affordable than OpenAI Realtime API for high-volume use cases
## Changes
- `extensions/voice-call/src/providers/stt-deepgram.ts`: New Deepgram provider implementation
- `extensions/voice-call/src/config.ts`: Added Deepgram config options to streaming schema
- `extensions/voice-call/src/webhook.ts`: Support for selecting STT provider
- `extensions/voice-call/src/media-stream.ts`: Unified STT session types
- `extensions/voice-call/src/providers/index.ts`: Export new provider
- `extensions/voice-call/README.md`: Documentation updates
- `extensions/voice-call/CHANGELOG.md`: Changelog entry
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
- Adds a new `DeepgramSTTProvider` (WebSocket streaming) and exposes it via `extensions/voice-call/src/providers/index.ts`.
- Extends `streaming` config schema to support selecting `sttProvider` and Deepgram-specific options / env var (`DEEPGRAM_API_KEY`).
- Updates voice-call webhook server to instantiate either OpenAI Realtime STT or Deepgram based on config.
- Updates media stream handler typing to accept either STT provider/session and documents the new provider options in README/CHANGELOG.
<h3>Confidence Score: 3/5</h3>
- This PR is close to mergeable, but the Deepgram STT session API has a callback-handling bug that can silently disable transcript delivery in some call patterns.
- The changes are mostly additive and integrate cleanly into the existing provider selection flow. However, `DeepgramSTTSessionImpl.waitForTranscript()` mutates the shared `onTranscriptCallback`, which can clobber previously registered transcript handlers. If `waitForTranscript()` is ever used alongside `onTranscript()`, transcripts can stop reaching the rest of the system.
- File needing attention: `extensions/voice-call/src/providers/stt-deepgram.ts`
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
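One possible way to address the callback issue raised in the review is to keep a set of listeners instead of a single mutable callback, so a one-shot wait never replaces a long-lived handler. The sketch below is an assumption-laden illustration: `TranscriptFanout`, `emit`, and the method signatures are hypothetical and may not match `DeepgramSTTSessionImpl`.

```typescript
type TranscriptListener = (text: string) => void;

// Hypothetical stand-in for the session's transcript plumbing.
class TranscriptFanout {
  private listeners = new Set<TranscriptListener>();

  // Registers a persistent handler and returns an unsubscribe function.
  onTranscript(listener: TranscriptListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Resolves with the next transcript without touching other handlers.
  waitForTranscript(timeoutMs: number): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(() => {
        off();
        reject(new Error("Timed out waiting for transcript"));
      }, timeoutMs);
      const off = this.onTranscript((text) => {
        clearTimeout(timer);
        off(); // one-shot: remove only this temporary listener
        resolve(text);
      });
    });
  }

  // Called by the provider whenever a final transcript arrives.
  emit(text: string): void {
    // Copy first so listeners can unsubscribe while we iterate.
    Array.from(this.listeners).forEach((listener) => listener(text));
  }
}
```

With this shape, a handler registered through `onTranscript()` keeps receiving transcripts before, during, and after any `waitForTranscript()` call, because the temporary listener removes only itself.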