#8848: feat(stt): Add Whisper as first-class audio transcription provider
stale
Cluster:
Voice Transcription Enhancements
## Summary
Adds Whisper as a first-class STT provider for automatic voice message transcription.
## Features
- Automatic transcription of voice messages from Matrix/Telegram/Discord
- OpenAI-compatible Whisper API (local or remote)
- Integrates with the media understanding system
## Configuration
### 1. Enable in config:
```json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [{
          "provider": "whisper",
          "model": "whisper-1",
          "baseUrl": "http://localhost:8200/v1"
        }]
      }
    }
  }
}
```
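As a rough illustration of how the `baseUrl` above is likely used: OpenAI-compatible servers expose transcription at `<baseUrl>/audio/transcriptions`. The sketch below assumes that convention; the `AudioModelEntry` type and `resolveTranscriptionUrl` helper are hypothetical names for illustration and are not taken from this PR.

```typescript
// Hypothetical type mirroring one entry of "tools.media.audio.models";
// the PR's real config types may differ.
interface AudioModelEntry {
  provider: string;
  model: string;
  baseUrl: string;
}

// Assumption: the server follows the OpenAI convention of serving
// transcription at `<baseUrl>/audio/transcriptions`.
function resolveTranscriptionUrl(entry: AudioModelEntry): string {
  // Strip trailing slashes so ".../v1" and ".../v1/" give the same URL.
  const base = entry.baseUrl.replace(/\/+$/, "");
  return `${base}/audio/transcriptions`;
}

const entry: AudioModelEntry = {
  provider: "whisper",
  model: "whisper-1",
  baseUrl: "http://localhost:8200/v1",
};

console.log(resolveTranscriptionUrl(entry));
// prints "http://localhost:8200/v1/audio/transcriptions"
```

If the local server mounts its API under a different prefix, only `baseUrl` should need to change.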
### 2. CRITICAL: Add auth profile
```json
// ~/.openclaw/agents/main/agent/auth-profiles.json
{
  "whisper:local": {
    "type": "token",
    "provider": "whisper",
    "token": "not-needed"
  }
}
```
**An auth profile is required even for local servers that do not use authentication.**
## Tested
- Local Whisper server (faster-whisper)
- Matrix voice messages
- Error handling and fallback
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adds a new `whisper` media-understanding provider and registers it alongside existing providers. The provider delegates audio transcription to the existing OpenAI-compatible transcription implementation, overriding defaults for `baseUrl` (local whisper server) and `model` ("whisper-1").
Integration-wise, it plugs into the same provider registry (`src/media-understanding/providers/index.ts`) used for model selection and media transcription, so selecting `provider: "whisper"` in config routes audio transcription through the shared OpenAI-compatible request path.
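The delegation described above can be pictured roughly as follows. This is a minimal sketch, not the PR's code: `TranscriptionRequest`, `WHISPER_DEFAULTS`, and `withWhisperDefaults` are hypothetical names, and the default `baseUrl` is borrowed from the PR's config example rather than confirmed from the source.

```typescript
// Hypothetical request shape; the real types in the PR may differ.
interface TranscriptionRequest {
  baseUrl?: string;
  model?: string;
  apiKey?: string;
}

// Assumed defaults: the URL comes from the PR's config example,
// the model name from the summary above.
const WHISPER_DEFAULTS = {
  baseUrl: "http://localhost:8200/v1",
  model: "whisper-1",
};

// Fill in Whisper defaults while letting explicit request values win.
// The result would then be handed to the shared OpenAI-compatible
// transcription function.
function withWhisperDefaults(
  req: TranscriptionRequest,
): TranscriptionRequest & { baseUrl: string; model: string } {
  return {
    ...req,
    baseUrl: req.baseUrl ?? WHISPER_DEFAULTS.baseUrl,
    model: req.model ?? WHISPER_DEFAULTS.model,
  };
}
```

Keeping the wrapper to a defaults merge means the request logic stays in one place, which matches the summary's point that the provider reuses the existing transcription path.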
<h3>Confidence Score: 4/5</h3>
- Mostly safe to merge, but default auth header behavior may break some self-hosted Whisper servers.
- Changes are small and reuse an existing, tested OpenAI-compatible transcription path. The main concern is that the new Whisper wrapper forces an Authorization header via a default `apiKey`, which can cause real interoperability failures for servers that don’t expect or allow auth headers. This is consistent with the PR note that an auth profile is required even when the server needs no authentication.
- src/media-understanding/providers/whisper/audio.ts
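One possible way to address the header concern is to send `Authorization` only when a key is actually configured. The sketch below is an illustration under that assumption; `buildAuthHeaders` is a hypothetical helper and does not reflect what the PR implements.

```typescript
// Hypothetical helper: build request headers for an OpenAI-compatible server.
// Assumption: servers without authentication are best served by omitting
// the Authorization header entirely rather than sending a placeholder.
function buildAuthHeaders(apiKey?: string): Record<string, string> {
  const headers: Record<string, string> = {};
  const key = apiKey?.trim();
  // Only attach Authorization when a non-empty key is configured.
  if (key) {
    headers["Authorization"] = `Bearer ${key}`;
  }
  return headers;
}
```

With this shape, a local server would receive no auth header when no key is set, while remote servers keep the usual `Bearer` scheme.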
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #11334: feat: add Mistral/Voxtral audio transcription provider (by JamesEBall · 2026-02-07 · 76.5%)
- #9703: feat(macos): Voice settings restructure + Whisper transcription sup... (by nsd97 · 2026-02-05 · 76.4%)
- #14208: feat(media): add AssemblyAI audio transcription provider (by jmoraispk · 2026-02-11 · 73.6%)
- #12717: fix: add "audio" to openai provider capabilities (by openjay · 2026-02-09 · 73.2%)
- #19427: feat: add Soniox speech-to-text provider (by matjaz · 2026-02-17 · 70.4%)
- #10447: feat(voice-call): add Deepgram STT provider (by chrharri · 2026-02-06 · 69.9%)
- #8388: fix(media): auto-skip tiny/empty audio files before transcription (... (by Glucksberg · 2026-02-04 · 69.8%)
- #7258: feat(tts): add Inworld AI TTS provider (by willsinghwilson · 2026-02-02 · 69.7%)
- #23572: feat(voice): enable voice note conversation loop for Telegram and W... (by davidrudduck · 2026-02-22 · 69.3%)
- #7965: feat(tts): add Speechify as TTS provider (by chaerla · 2026-02-03 · 68.7%)