#8848: feat(stt): Add Whisper as first-class audio transcription provider

by emadomedher open 2026-02-04 13:55

stale

Cluster: Voice Transcription Enhancements

## Summary

Adds Whisper as a first-class STT provider for automatic voice message transcription.

## Features

- Automatic transcription of voice messages from Matrix/Telegram/Discord
- OpenAI-compatible Whisper API (local or remote)
- Integrates with media understanding system

## Configuration

### 1. Enable in config:

```json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "provider": "whisper",
            "model": "whisper-1",
            "baseUrl": "http://localhost:8200/v1"
          }
        ]
      }
    }
  }
}
```

### 2. CRITICAL: Add auth profile

```json
// ~/.openclaw/agents/main/agent/auth-profiles.json
{
  "whisper:local": {
    "type": "token",
    "provider": "whisper",
    "token": "not-needed"
  }
}
```

**Auth profile required even for local servers without authentication!**

## Tested

- Local Whisper server (faster-whisper)
- Matrix voice messages
- Error handling and fallback

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds a new `whisper` media-understanding provider and registers it alongside existing providers. The provider delegates audio transcription to the existing OpenAI-compatible transcription implementation, overriding defaults for `baseUrl` (local whisper server) and `model` ("whisper-1").

Integration-wise, it plugs into the same provider registry (`src/media-understanding/providers/index.ts`) used for model selection and media transcription, so selecting `provider: "whisper"` in config routes audio transcription through the shared OpenAI-compatible request path.

<h3>Confidence Score: 4/5</h3>

- Mostly safe to merge, but default auth header behavior may break some self-hosted Whisper servers.
- Changes are small and reuse an existing, tested OpenAI-compatible transcription path. The main concern is the new Whisper wrapper forcing an Authorization header via a default `apiKey`, which can cause real interoperability failures for servers that don’t expect/allow auth headers (and matches the PR note that auth profiles are required even when auth isn’t).
- src/media-understanding/providers/whisper/audio.ts
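
One possible way to address the auth header concern is to send `Authorization` only when a key is actually configured. The sketch below is illustrative and is not the code in this PR: `WhisperOptions`, `buildWhisperRequest`, and `transcribe` are hypothetical names, and the default base URL is assumed from the config example above. The only external contract it relies on is the OpenAI-compatible `POST {baseUrl}/audio/transcriptions` endpoint, which takes multipart `file` and `model` fields and returns JSON with a `text` field.

```typescript
// Hypothetical sketch, not the PR's implementation. All names are illustrative.
interface WhisperOptions {
  baseUrl?: string;
  model?: string;
  apiKey?: string;
}

interface TranscriptionRequest {
  url: string;
  model: string;
  headers: Record<string, string>;
}

// Assumed defaults: the base URL is taken from the config example in the PR description.
const DEFAULT_BASE_URL = "http://localhost:8200/v1";
const DEFAULT_MODEL = "whisper-1";

// Resolve defaults and build headers. Authorization is added only when a
// non-empty apiKey is supplied, so unauthenticated local servers never see it.
function buildWhisperRequest(options: WhisperOptions = {}): TranscriptionRequest {
  const baseUrl = (options.baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
  const model = options.model ?? DEFAULT_MODEL;
  const headers: Record<string, string> = {};
  const apiKey = options.apiKey?.trim();
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return { url: `${baseUrl}/audio/transcriptions`, model, headers };
}

// Send the audio as multipart form data to the OpenAI-compatible endpoint
// and return the transcript text (empty string if the field is missing).
async function transcribe(
  audio: Blob,
  fileName: string,
  options: WhisperOptions = {},
): Promise<string> {
  const req = buildWhisperRequest(options);
  const form = new FormData();
  form.append("file", audio, fileName);
  form.append("model", req.model);
  const res = await fetch(req.url, { method: "POST", headers: req.headers, body: form });
  if (!res.ok) {
    throw new Error(`Transcription failed: HTTP ${res.status}`);
  }
  const body = (await res.json()) as { text?: string };
  return body.text ?? "";
}
```

With this shape, `buildWhisperRequest()` yields no `Authorization` header at all, while `buildWhisperRequest({ apiKey: "..." })` yields a `Bearer` header. Whether the auth profile requirement could then be dropped for local servers depends on how the shared provider path resolves credentials, which this sketch does not cover.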