#14086: feat(tts,media): add base Sarvam TTS and STT providers

by kiranjd open 2026-02-11 13:06

gateway agents stale size: XL

Cluster: Voice Call and TTS Improvements

Human: This PR adds support for [Sarvam](https://www.sarvam.ai/) providers. It adds support for India's regional languages with SOTA accuracy, making OpenClaw accessible to non-English speakers (the majority of India).

Agent:

## Summary

Add **base Sarvam provider support** for:

- TTS (`messages.tts.provider = sarvam`)
- Audio transcription (`tools.media.audio` via Sarvam provider)

This PR is intentionally scoped to provider plumbing only.

## Included

- Add `sarvam` to TTS provider enum/schema and wiring.
- Add Sarvam key discovery via `SARVAM_API_KEY`.
- Add Sarvam in TTS command and gateway provider surfaces.
- Add Sarvam media-understanding audio provider (`speech-to-text-translate`).
- Add auto-audio defaults for Sarvam (`saaras:v2.5`).
- Add configurable Sarvam TTS target language (`messages.tts.sarvam.languageCode`, default `en-IN`).
- Add focused tests for TTS provider handling and Sarvam STT request building/mime normalization.

## Explicitly out of scope

- No auto-translation or source-language plumbing in reply pipeline.
- No behavioral changes outside base TTS/STT provider support.

## Validation

- `pnpm test src/tts/tts.test.ts src/media-understanding/providers/sarvam/audio.test.ts src/media-understanding/runner.auto-audio.test.ts`
- `pnpm check`

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds a new `sarvam` provider across both TTS and media-understanding audio transcription: config types/schemas are extended, env key discovery adds `SARVAM_API_KEY`, gateway/UI surfaces list Sarvam as an available TTS provider, and a new Sarvam audio transcription provider is wired into the media-understanding provider registry with tests.

Main issues to fix before merge:

- Sarvam audio transcription can crash when `fileName` is omitted, because `path.basename(params.fileName)` is called with `undefined`.
- Sarvam TTS requests bypass the repo’s SSRF/pinned-DNS fetch guards while using a configurable `baseUrl`, which is inconsistent with the guarded fetch patterns used by other providers.
- Telegram output metadata/behavior is internally inconsistent for Sarvam (declares `.opus`/voice-compatible but produces mp3 and marks voice-incompatible), likely breaking Telegram voice-note handling.

<h3>Confidence Score: 2/5</h3>

- This PR has a couple of concrete correctness/security issues that should be fixed before merging.
- While the provider plumbing is largely consistent, there is a confirmed runtime crash path in Sarvam STT when `fileName` is missing, and Sarvam TTS currently bypasses the project’s SSRF/pinned-DNS protections while accepting a configurable base URL. There is also a confirmed Telegram output mismatch for Sarvam that will lead to incorrect output metadata/behavior.
- src/media-understanding/providers/sarvam/audio.ts, src/tts/tts.ts
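
For the `fileName` crash flagged in the review, one possible guard is sketched below. Node's `path.basename` throws a `TypeError` when given a non-string, so the value needs a fallback before it reaches that call. The helper name `resolveUploadFileName` and the `"audio"` default are assumptions made for illustration; they are not taken from the PR, and the actual fix in `audio.ts` may look different.

```typescript
import * as path from "node:path";

// Hypothetical helper (not from the PR): returns a safe file name for the
// multipart upload even when the caller omits `fileName`.
// The "audio" fallback is an assumed default, chosen only for this sketch.
function resolveUploadFileName(
  fileName: string | undefined,
  fallback: string = "audio",
): string {
  const trimmed = fileName?.trim();
  if (!trimmed) {
    // Covers undefined, empty, and whitespace-only names, which would
    // otherwise reach path.basename() as a non-string or yield "".
    return fallback;
  }
  // Strip any directory components so only the bare name is sent upstream.
  const base = path.basename(trimmed);
  return base.length > 0 ? base : fallback;
}
```

A call site could then pass `resolveUploadFileName(params.fileName)` wherever `path.basename(params.fileName)` is used today, which keeps the request-building path free of the `undefined` case.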