#14086: feat(tts,media): add base Sarvam TTS and STT providers
gateway
agents
stale
size: XL
Cluster:
Voice Call and TTS Improvements
Human:
This PR adds support for [Sarvam](https://www.sarvam.ai/) providers.
This adds support for regional languages of India with state-of-the-art accuracy, making OpenClaw accessible to non-English speakers, who make up the majority of India's population.
Agent:
## Summary
Add **base Sarvam provider support** for:
- TTS (`messages.tts.provider = sarvam`)
- Audio transcription (`tools.media.audio` via Sarvam provider)
This PR is intentionally scoped to provider plumbing only.
## Included
- Add `sarvam` to TTS provider enum/schema and wiring.
- Add Sarvam key discovery via `SARVAM_API_KEY`.
- Add Sarvam in TTS command and gateway provider surfaces.
- Add Sarvam media-understanding audio provider (`speech-to-text-translate`).
- Add auto-audio defaults for Sarvam (`saaras:v2.5`).
- Add configurable Sarvam TTS target language (`messages.tts.sarvam.languageCode`, default `en-IN`).
- Add focused tests for TTS provider handling and Sarvam STT request building/mime normalization.
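The configuration pieces listed above can be sketched as a small resolver. This is a minimal sketch, not the PR's code. The `TtsConfig`, `SarvamTtsSettings`, `ResolvedSarvamTts` and `resolveSarvamTts` names are hypothetical, and the real OpenClaw schema may nest these fields differently. The sketch assumes only what the PR states: the provider id `sarvam`, key discovery via `SARVAM_API_KEY`, and the `en-IN` default for `messages.tts.sarvam.languageCode`.

```typescript
// Hypothetical config shapes for illustration; the real OpenClaw schema may differ.
interface SarvamTtsSettings {
  languageCode?: string;
}

interface TtsConfig {
  provider?: string;
  sarvam?: SarvamTtsSettings;
}

interface ResolvedSarvamTts {
  apiKey: string;
  languageCode: string;
}

// Default target language stated in the PR description.
const DEFAULT_SARVAM_LANGUAGE_CODE = "en-IN";

// Hypothetical resolver: returns the effective Sarvam TTS settings, or null when
// Sarvam is not the selected provider or no API key is available.
// `env` is injected (instead of reading process.env) so the function stays pure.
function resolveSarvamTts(
  tts: TtsConfig | undefined,
  env: Record<string, string | undefined>,
): ResolvedSarvamTts | null {
  if (!tts || tts.provider !== "sarvam") return null;
  const apiKey = env["SARVAM_API_KEY"]?.trim();
  if (!apiKey) return null;
  // An empty or whitespace-only languageCode falls back to the default.
  const languageCode = tts.sarvam?.languageCode?.trim() || DEFAULT_SARVAM_LANGUAGE_CODE;
  return { apiKey, languageCode };
}
```

Injecting `env` keeps the key-discovery logic testable without mutating process state. What the real code does when the key is missing (skip the provider, fall back, or error) is not stated in this PR.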
## Explicitly out of scope
- No auto-translation or source-language plumbing in reply pipeline.
- No behavioral changes outside base TTS/STT provider support.
## Validation
- `pnpm test src/tts/tts.test.ts src/media-understanding/providers/sarvam/audio.test.ts src/media-understanding/runner.auto-audio.test.ts`
- `pnpm check`
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adds a new `sarvam` provider for both TTS and media-understanding audio transcription. Config types and schemas are extended, env key discovery adds `SARVAM_API_KEY`, gateway/UI surfaces list Sarvam as an available TTS provider, and a new Sarvam audio transcription provider is wired into the media-understanding provider registry with tests.
Main issues to fix before merge:
- Sarvam audio transcription can crash when `fileName` is omitted due to `path.basename(params.fileName)` being called with `undefined`.
- Sarvam TTS requests bypass the repo’s SSRF/pinned-DNS fetch guards while using a configurable `baseUrl`, which is inconsistent with the guarded fetch patterns used by other providers.
- Telegram output metadata/behavior is internally inconsistent for Sarvam: it declares an `.opus` extension and voice-note compatibility, but produces MP3 and marks the output voice-incompatible, which likely breaks Telegram voice-note handling.
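A defensive version of the STT upload naming could look like the following. This is a hypothetical sketch, not the PR's actual fix: `normalizeMime`, `resolveUploadFileName`, the MIME-to-extension table, the aliases, and the `audio.<ext>` fallback name are all assumptions. It avoids the crash by never handing `undefined` to a basename step, and falls back to a name derived from the normalized MIME type.

```typescript
// Hypothetical MIME-to-extension table; not taken from the PR.
const EXTENSION_BY_MIME: Record<string, string> = {
  "audio/mpeg": "mp3",
  "audio/wav": "wav",
  "audio/ogg": "ogg",
  "audio/webm": "webm",
  "audio/mp4": "m4a",
};

// Drops parameters such as "; codecs=opus", lowercases, and maps a few assumed
// aliases to a canonical type. Unknown or missing input becomes a generic binary type.
function normalizeMime(mime: string | undefined): string {
  const base = ((mime ?? "").split(";")[0] ?? "").trim().toLowerCase();
  if (base === "audio/mp3") return "audio/mpeg";
  if (base === "audio/x-wav" || base === "audio/wave") return "audio/wav";
  return base || "application/octet-stream";
}

// Derives an upload filename that never throws when `fileName` is omitted or blank.
function resolveUploadFileName(fileName: string | undefined, mime: string | undefined): string {
  const trimmed = fileName?.trim();
  if (trimmed) {
    // Keep only the last path segment, handling both POSIX and Windows separators.
    const base = trimmed.split(/[\\/]/).pop();
    if (base) return base;
  }
  const ext = EXTENSION_BY_MIME[normalizeMime(mime)] ?? "bin";
  return `audio.${ext}`;
}
```

Splitting on both separators avoids a dependency on `node:path` in this sketch. The real provider may prefer the repo's existing path helpers once the `undefined` case is guarded.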
<h3>Confidence Score: 2/5</h3>
- This PR has a couple of concrete correctness/security issues that should be fixed before merging.
- While the provider plumbing is largely consistent, there is a confirmed runtime crash path in Sarvam STT when `fileName` is missing, and Sarvam TTS currently bypasses the project’s SSRF/pinned-DNS protections while accepting a configurable base URL. There is also a confirmed Telegram output mismatch for Sarvam that will lead to incorrect output metadata/behavior.
- `src/media-understanding/providers/sarvam/audio.ts`, `src/tts/tts.ts`
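One way to remove the Telegram mismatch is to derive all output metadata from the format the provider actually returns. The sketch below is hypothetical: `AudioFormat`, `TtsOutputMeta`, `describeTtsOutput`, and the choice to flag only Opus as voice-compatible are assumptions for illustration, not OpenClaw's or Telegram's definitions.

```typescript
// Hypothetical types; OpenClaw's actual TTS output metadata may be shaped differently.
type AudioFormat = "mp3" | "opus" | "wav";

interface TtsOutputMeta {
  format: AudioFormat;
  extension: string;
  mimeType: string;
  voiceCompatible: boolean;
}

// Single source of truth keyed by the produced format. In this sketch only Opus
// output is flagged as voice-note compatible (an assumption for illustration).
const OUTPUT_META: Record<AudioFormat, Omit<TtsOutputMeta, "format">> = {
  opus: { extension: ".opus", mimeType: "audio/ogg", voiceCompatible: true },
  mp3: { extension: ".mp3", mimeType: "audio/mpeg", voiceCompatible: false },
  wav: { extension: ".wav", mimeType: "audio/wav", voiceCompatible: false },
};

// Every field is derived from the format actually generated, so the extension,
// MIME type, and voice flag cannot disagree with the audio bytes.
function describeTtsOutput(format: AudioFormat): TtsOutputMeta {
  return { format, ...OUTPUT_META[format] };
}
```

With this shape, MP3 output can no longer be labeled `.opus`. Producing a real voice note would instead require requesting Opus from the provider or transcoding, which this sketch does not attempt.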
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #7965: feat(tts): add Speechify as TTS provider · by chaerla · 2026-02-03 · 76.4%
- #12597: voice-call: add Asterisk ARI provider + core STT · by w0s1nsk1 · 2026-02-09 · 71.2%
- #14208: feat(media): add AssemblyAI audio transcription provider · by jmoraispk · 2026-02-11 · 70.9%
- #20794: feat(tts): add Fish Audio provider with full docs, tests & gateway ... · by twangodev · 2026-02-19 · 70.7%
- #7258: feat(tts): add Inworld AI TTS provider · by willsinghwilson · 2026-02-02 · 70.0%
- #19427: feat: add Soniox speech-to-text provider · by matjaz · 2026-02-17 · 69.7%
- #7485: TTS: add Resemble AI provider support · by devshahofficial · 2026-02-02 · 69.5%
- #21110: fix(tts): deliver audio via structured mediaUrl instead of MEDIA: t... · by hydro13 · 2026-02-19 · 69.4%
- #19073: feat(voice-call): streaming TTS, barge-in, silence filler, hangup, ... · by odrobnik · 2026-02-17 · 69.1%
- #23572: feat(voice): enable voice note conversation loop for Telegram and W... · by davidrudduck · 2026-02-22 · 69.0%