#11965: feat(ui): add speech-to-text dictation to web chat via Deepgram Flux
docs
app: web-ui
gateway
stale
Cluster: Voice Transcription Enhancements
## Summary
- Add real-time speech-to-text dictation to the web chat compose area using Deepgram's Flux model
- Gateway proxies browser audio to Deepgram, keeping API keys server-side
- Feature auto-enables when the `DEEPGRAM_API_KEY` env var is set; no other configuration is required
### Architecture
```
Browser mic → AudioWorklet (PCM 16kHz) → Gateway WS (/dictation) → Deepgram v2/listen → Transcripts → Textarea
```
### What is Deepgram Flux?
[Deepgram](https://deepgram.com) is a speech-to-text API provider (similar to Google Speech, AWS Transcribe). **Flux** is their conversational model with ~260ms end-of-turn detection — it knows when the speaker has finished a thought and signals `EndOfTurn`, which we use to auto-stop recording.
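As a rough illustration of the auto-stop behaviour, the sketch below folds incoming transcript messages into compose state and stops recording on `EndOfTurn`. The message fields (`type: "TurnInfo"`, `event`, `transcript`) are assumptions about the Flux payload and should be checked against Deepgram's documentation; `DictationState` and `applyTranscriptMessage` are hypothetical names, not code from this PR.

```typescript
// ASSUMPTION: Flux turn messages look roughly like
// { "type": "TurnInfo", "event": "EndOfTurn", "transcript": "..." }.
interface TurnMessage {
  type?: string;
  event?: string;
  transcript?: string;
}

// Hypothetical compose-area state for illustration only.
interface DictationState {
  text: string; // committed text in the textarea
  interim: string; // in-progress transcript for the current turn
  recording: boolean;
}

function applyTranscriptMessage(state: DictationState, raw: string): DictationState {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return state; // ignore frames that are not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return state;

  const msg = parsed as TurnMessage;
  if (msg.type !== "TurnInfo" || typeof msg.transcript !== "string") return state;

  if (msg.event === "EndOfTurn") {
    // Commit the finished turn and stop recording.
    const parts = [state.text, msg.transcript.trim()].filter((part) => part.length > 0);
    return { text: parts.join(" "), interim: "", recording: false };
  }
  // Any other turn event only refreshes the interim preview.
  return { ...state, interim: msg.transcript };
}
```

Keeping this as a pure function makes the stop-on-`EndOfTurn` rule easy to test without a microphone or a live socket.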
### How it works
1. **Gateway** (`server-dictation.ts`): WebSocket upgrade handler at `/dictation` that proxies raw PCM audio to Deepgram's streaming API and returns transcript JSON. Requires `DEEPGRAM_API_KEY` in the environment.
2. **Browser client** (`dictation.ts`): `DictationClient` class that captures mic audio via an `AudioWorklet` (16kHz mono PCM), streams it over the gateway WebSocket, and dispatches transcript callbacks.
3. **UI integration** (`app.ts`, `views/chat.ts`): Mic button in compose area, `Cmd/Ctrl+Shift+D` keyboard shortcut, recording visual indicators, mic permission modal, and textarea population from transcripts.
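To make the capture step more concrete, here is a minimal sketch of the sample conversion an `AudioWorklet` would need before streaming: Web Audio delivers `Float32Array` samples in the range [-1, 1], while a raw PCM stream is typically signed 16-bit integers. The helper name `floatTo16BitPCM` is hypothetical, and the 16-bit sample format is an assumption; the PR text only states "16kHz mono PCM".

```typescript
// Hypothetical helper: convert Web Audio float samples to signed 16-bit PCM.
// ASSUMPTION: the upstream expects 16-bit samples; the PR does not state the bit depth.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Two's complement has one more negative step than positive.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```

The resulting `Int16Array`'s underlying `buffer` can be sent as a binary WebSocket frame. Resampling to 16 kHz is left out here and would need to happen either in the `AudioContext` configuration or in a separate step.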
### Feature detection
- Gateway advertises `dictation: true` in the hello response when `DEEPGRAM_API_KEY` is configured
- Browser checks `navigator.mediaDevices` and `AudioWorklet` support
- Mic button only appears when both sides are ready
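The two-sided check above could be sketched roughly as follows. The `features.dictation` path follows the review summary further down; the interface names, `browserSupportsDictation`, and `shouldShowMicButton` are hypothetical and only illustrate the idea of requiring an explicit `true` from the gateway.

```typescript
// ASSUMPTION: the hello payload nests the flag under `features`.
interface GatewayHello {
  features?: { dictation?: boolean };
}

// Minimal view of the browser globals we probe, so the check can be tested without a DOM.
interface BrowserEnv {
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
  AudioWorkletNode?: unknown;
}

function browserSupportsDictation(env: BrowserEnv): boolean {
  return (
    typeof env.navigator?.mediaDevices?.getUserMedia === "function" &&
    typeof env.AudioWorkletNode === "function"
  );
}

function shouldShowMicButton(hello: GatewayHello | undefined, env: BrowserEnv): boolean {
  // Strict comparison: a missing or undefined flag must never enable the button.
  return hello?.features?.dictation === true && browserSupportsDictation(env);
}
```

In a real page the second argument would be the global object (for example `globalThis as unknown as BrowserEnv`); passing it in explicitly keeps the function testable.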
### Difference from PR #10012
PR #10012 ("Webui voice") uses the browser-native `SpeechRecognition` API. This PR takes a different approach:
| | #10012 (Browser native) | This PR (Deepgram Flux) |
|---|---|---|
| Engine | Browser `SpeechRecognition` | Deepgram Flux via gateway proxy |
| Browser support | Chrome/Edge only | Any browser with `AudioWorklet` |
| End-of-turn | Browser-dependent | ~260ms Flux detection |
| API key | None needed | `DEEPGRAM_API_KEY` on gateway |
| Privacy | Audio sent to browser vendor | Audio sent to Deepgram via gateway |
### No new dependencies
All implementation uses built-in Web APIs (`AudioWorklet`, `MediaDevices`, `WebSocket`) and the existing gateway WebSocket infrastructure. No new npm packages.
## Files changed
**Gateway (new + modified):**
- `src/gateway/server-dictation.ts` — WebSocket proxy to Deepgram (NEW)
- `src/gateway/server-dictation.test.ts` — tests (NEW)
- `src/gateway/server-http.ts` — register upgrade handler
- `src/gateway/server-runtime-state.ts` — create handler
- `src/gateway/server.impl.ts` — add dictation logger
- `src/gateway/server/ws-connection/message-handler.ts` — feature flag in hello
**Browser client (new + modified):**
- `ui/src/ui/dictation.ts` — browser dictation client (NEW)
- `ui/src/ui/dictation.test.ts` — tests (NEW)
- `ui/src/ui/audio-worklet-processor.ts` — AudioWorklet PCM capture (NEW)
- `ui/src/ui/components/mic-permission-modal.ts` — permission modal (NEW)
- `ui/src/ui/icons.ts` — mic SVG icon
- `ui/src/styles/chat/dictation.css` — recording animations (NEW)
- `ui/src/styles/chat.css` — import dictation styles
- `ui/src/ui/gateway.ts` — dictation field in hello type
**UI integration (modified):**
- `ui/src/ui/app.ts` — state, handlers, Cmd+Shift+D shortcut
- `ui/src/ui/app-gateway.ts` — feature detection on connect
- `ui/src/ui/app-render.ts` — pass dictation props
- `ui/src/ui/views/chat.ts` — mic button, recording UI
**Docs:**
- `docs/plans/2026-02-07-dictation-design.md` — design document
- `docs/plans/2026-02-07-dictation-impl.md` — implementation plan
- `CHANGELOG.md` — added entry
## Test plan
- [x] `pnpm build` passes
- [x] `pnpm check` (lint + format) passes
- [x] `pnpm test` passes (249 tests, including new dictation tests)
- [x] Manual test: mic button appears when `DEEPGRAM_API_KEY` is set
- [x] Manual test: recording starts/stops via button and keyboard shortcut
- [x] Manual test: transcribed text populates the compose textarea
- [ ] Test without `DEEPGRAM_API_KEY` — mic button should not appear
- [ ] Test in Firefox (AudioWorklet support)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
Adds a new browser dictation client that captures 16kHz PCM via an AudioWorklet and streams it to a new gateway WebSocket upgrade endpoint (`/dictation/stream`), which proxies audio to Deepgram’s streaming API and forwards transcript JSON back to the UI. The chat compose view gains a mic button, keyboard shortcut, interim “Listening…” placeholder, and a permission-help modal; the gateway hello response now advertises `features.dictation` when `DEEPGRAM_API_KEY` is configured so the UI can feature-detect availability.
<h3>Confidence Score: 3/5</h3>
- Reasonably safe to merge once the two functional issues below are addressed.
- Core wiring is straightforward and tests pass, but the PR introduces two real behavioral problems: (1) the gateway buffers audio without bound while waiting for Deepgram, which can cause memory growth on bad upstream connections, and (2) the UI renders the mic button even when dictation isn’t actually available, because `dictationEnabled` defaults to undefined and the rendering logic treats that as enabled.
- src/gateway/server-dictation.ts, ui/src/ui/views/chat.ts
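One possible mitigation for the buffering issue is to cap the pending audio and drop the oldest chunks once the cap is exceeded. The sketch below is not the PR's code: `BoundedAudioBuffer` is a hypothetical name and the drop-oldest policy is one choice among several (closing the client socket is another). Assuming 16-bit mono samples at 16 kHz, audio arrives at 16,000 × 2 = 32,000 bytes per second, so a 5-second cap would be 160,000 bytes.

```typescript
// Hypothetical bounded queue for audio chunks awaiting the upstream connection.
class BoundedAudioBuffer {
  private chunks: Uint8Array[] = [];
  private bytes = 0;
  private readonly maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  // Queue a chunk, then evict from the front until the total fits the cap.
  // A single chunk larger than the cap is evicted as well.
  push(chunk: Uint8Array): void {
    this.chunks.push(chunk);
    this.bytes += chunk.byteLength;
    while (this.bytes > this.maxBytes) {
      const dropped = this.chunks.shift();
      if (dropped === undefined) break;
      this.bytes -= dropped.byteLength;
    }
  }

  // Hand over everything queued so far (oldest first) and reset.
  drain(): Uint8Array[] {
    const out = this.chunks;
    this.chunks = [];
    this.bytes = 0;
    return out;
  }

  get size(): number {
    return this.bytes;
  }
}
```

When the upstream socket opens, the proxy would call `drain()` and forward the chunks in order; memory use stays bounded by `maxBytes` regardless of how long the upstream takes.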
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #10012: Webui voice (by nanxiacc · 2026-02-06 · 76.6%)
- #23778: feat: chat UI facelift — speech, themes, config categories, and polish (by BunsDev · 2026-02-22 · 75.8%)
- #11745: ui: add server-side TTS for web chat via gateway endpoint (by wjlgatech · 2026-02-08 · 75.3%)
- #10447: feat(voice-call): add Deepgram STT provider (by chrharri · 2026-02-06 · 73.0%)
- #16733: fix(ui): avoid injected newlines when tool output is hidden (by jp117 · 2026-02-15 · 72.1%)
- #20155: feat(telegram): add tg-network-guard transcript status + reply flow (by artemgetmann · 2026-02-18 · 71.9%)
- #7965: feat(tts): add Speechify as TTS provider (by chaerla · 2026-02-03 · 70.9%)
- #9218: Fix Control UI chat resync on gaps and terminal events (by figitaki · 2026-02-05 · 70.7%)
- #12157: feat(macos): add Granola-style meeting notes with live transcription (by npow · 2026-02-08 · 70.7%)
- #23572: feat(voice): enable voice note conversation loop for Telegram and W... (by davidrudduck · 2026-02-22 · 70.3%)