#20794: feat(tts): add Fish Audio provider with full docs, tests & gateway support

by twangodev · open · 2026-02-19 09:32

Labels: docs, channel: voice-call, gateway, size: M

Cluster: Text-to-Speech Provider Enhancements

## Summary

- Problem: OpenClaw currently supports only the ElevenLabs, OpenAI, and Edge TTS providers.
- Why it matters: [Fish Audio S1](https://fish.audio/) offers strong multilingual voice-cloning quality at a competitive price.
- What changed: Added `fishaudio` as a core TTS provider, with matching documentation.
- What did NOT change (scope boundary): Existing TTS providers behave exactly as before.

## Change Type (select all)

- [ ] Bug fix
- [x] Feature
- [x] Refactor
- [x] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
- [x] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #
- Related #

## User-visible / Behavior Changes

- New TTS provider `fishaudio`, selectable via `messages.tts.provider: "fishaudio"` in `openclaw.json`
- New env var `FISH_API_KEY` (or config `fishaudio.apiKey`) for authentication
- The `/tts provider fishaudio` slash command is now accepted
- The `tts.setProvider` gateway RPC accepts `"fishaudio"`
- The `tts.status` response now includes `hasFishAudioKey`
- The `tts.providers` response now lists Fish Audio
- Auto-detection fallback order: openai > elevenlabs > fishaudio > edge
- Model-driven TTS directives support `[[tts:provider=fishaudio voiceId=...]]`

## Security Impact (required)

- New permissions/capabilities? No
- Secrets/tokens handling changed? Yes: new `FISH_API_KEY` env var and `fishaudio.apiKey` config field (marked sensitive in the Zod schema)
- New/changed network calls? Yes: outbound POST to https://api.fish.audio/v1/tts (configurable via `fishaudio.baseUrl`)
- Command/tool execution surface changed? No
- Data access scope changed? No
- If any Yes, explain risk + mitigation:
  - The new API key follows the same pattern as the existing ElevenLabs/OpenAI keys: it is stored in config (marked sensitive) or an env var, and sent only in the `Authorization: Bearer` header to the configured Fish Audio endpoint.
  - `baseUrl` is user-configurable but defaults to the official API. SSRF exposure is no greater than for the existing TTS providers, since requests go through the same fetch path.

## Repro + Verification

### Environment

- OS: macOS (Darwin 25.4.0)
- Runtime/container: Node.js
- Model/provider: N/A (TTS provider, not LLM)
- Integration/channel (if any): All channels (Telegram, Discord, etc.)
- Relevant config (redacted): `{ messages: { tts: { provider: "fishaudio", fishaudio: { apiKey: "***", voiceId: "***" } } } }`

### Steps

1. Set the `FISH_API_KEY` env var or configure `fishaudio.apiKey` in `openclaw.json`.
2. Set `messages.tts.provider: "fishaudio"` or run `/tts provider fishaudio`.
3. Enable TTS with `/tts always`.
4. Send a message that triggers a reply.

### Expected

Audio

### Actual

Audio

## Evidence

Attach at least one:

- [ ] Failing test/log before + passing after
- [ ] Trace/log snippets
- [x] Screenshot/recording
- [ ] Perf numbers (if relevant)

https://github.com/user-attachments/assets/519363fe-ce54-4954-b8fb-b1e34bad89d2

## Human Verification (required)

What you personally verified (not just CI), and how:

- Verified scenarios: Audio is generated when a `/tts` command is run, and also when the chatbot itself requests media generation.
- Edge cases checked: Long text is summarized, and short text does not generate audio (the same behavior as the other TTS providers).
- What you did **not** verify: API failures. I assume they fall through to the provider fallback chain.
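The selection rules listed under User-visible / Behavior Changes can be sketched roughly as follows. This is a minimal TypeScript illustration, not the code in this PR. The function names (`resolveApiKey`, `detectProvider`, `parseTtsDirective`, `selectProvider`), the config shape, the `OPENAI_API_KEY`/`ELEVENLABS_API_KEY` env var names, and the rule that a directive overrides the configured provider are all assumptions. Only `FISH_API_KEY`, `fishaudio.apiKey`, the openai > elevenlabs > fishaudio > edge order, and the `[[tts:provider=fishaudio voiceId=...]]` syntax come from this PR.

```typescript
// Hypothetical sketch of TTS provider/voice selection. Names, config shape, and
// the OPENAI_API_KEY / ELEVENLABS_API_KEY env names are assumptions, not OpenClaw code.
type TtsProvider = "openai" | "elevenlabs" | "fishaudio" | "edge";

interface TtsConfig {
  provider?: TtsProvider;
  openai?: { apiKey?: string };
  elevenlabs?: { apiKey?: string };
  fishaudio?: { apiKey?: string; voiceId?: string; baseUrl?: string };
}

type Env = Record<string, string | undefined>;

const PROVIDERS: readonly TtsProvider[] = ["openai", "elevenlabs", "fishaudio", "edge"];

// A config value wins over the env var; blank strings count as unset.
function resolveApiKey(configKey?: string, envKey?: string): string | undefined {
  return configKey?.trim() || envKey?.trim() || undefined;
}

// Auto-detection order stated in this PR: openai > elevenlabs > fishaudio > edge.
function detectProvider(cfg: TtsConfig, env: Env): TtsProvider {
  if (resolveApiKey(cfg.openai?.apiKey, env.OPENAI_API_KEY)) return "openai";
  if (resolveApiKey(cfg.elevenlabs?.apiKey, env.ELEVENLABS_API_KEY)) return "elevenlabs";
  if (resolveApiKey(cfg.fishaudio?.apiKey, env.FISH_API_KEY)) return "fishaudio";
  return "edge"; // Edge TTS needs no API key, so it is the last resort
}

interface TtsDirective {
  provider?: TtsProvider;
  voiceId?: string;
}

// Parses the first [[tts:key=value ...]] directive found in a model reply (assumed behavior).
function parseTtsDirective(reply: string): TtsDirective {
  const match = /\[\[tts:([^\]]*)\]\]/.exec(reply);
  if (!match) return {};
  const directive: TtsDirective = {};
  for (const pair of match[1].trim().split(/\s+/)) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip tokens without a key=value shape
    const key = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    if (key === "provider" && (PROVIDERS as readonly string[]).includes(value)) {
      directive.provider = value as TtsProvider;
    } else if (key === "voiceId" && value) {
      directive.voiceId = value;
    }
  }
  return directive;
}

// Assumed precedence: directive > explicit messages.tts.provider > auto-detection.
function selectProvider(reply: string, cfg: TtsConfig, env: Env): TtsProvider {
  return parseTtsDirective(reply).provider ?? cfg.provider ?? detectProvider(cfg, env);
}
```

For example, `selectProvider("Sure! [[tts:provider=fishaudio voiceId=abc123]]", {}, {})` returns `"fishaudio"`, while `detectProvider({}, {})` falls through to `"edge"`. Keeping Edge last reflects that it is the only provider here that works without a key.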
## Compatibility / Migration

- Backward compatible? Yes
- Config/env changes? Yes
- Migration needed? No
- If yes, exact upgrade steps:

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly: Leave the Fish Audio config options unset.
- Files/config to restore: The whole PR should be revertible.
- Known bad symptoms reviewers should watch for:

## Risks and Mitigations

- Risk: Fish Audio API changes or endpoint unavailability
  - Mitigation: The provider fallback chain automatically tries the next configured provider (ElevenLabs/OpenAI/Edge); `baseUrl` is configurable for custom endpoints.
- Risk: The `voiceId` rename could break existing configs that used `referenceId`
  - Mitigation: This is a new provider shipping for the first time, so no existing configs use `referenceId` yet.

_Disclosure: I'm a founding engineer at [Fish Audio](https://fish.audio/). My team and I are happy to own ongoing maintenance for this integration._

<h3>Greptile Summary</h3>

Adds Fish Audio as a fourth core TTS provider alongside OpenAI, ElevenLabs, and Edge TTS.
The implementation is comprehensive and follows existing patterns consistently:

- Core API integration with proper timeout handling, error handling, and abort controller usage
- Full configuration schema with sensitive field marking for `apiKey`
- Gateway RPC methods updated (`tts.status`, `tts.setProvider`, `tts.providers`)
- Command handlers and auto-detection fallback chain (openai > elevenlabs > fishaudio > edge)
- TTS directive parsing supports `provider=fishaudio` and voice ID overrides
- Comprehensive test coverage (6 new test cases)
- Complete documentation updates in both English and Chinese
- CHANGELOG entry added

The implementation correctly handles:

- API key resolution from config and the `FISH_API_KEY` env var
- Output format selection for different contexts (Telegram opus, default mp3, telephony PCM)
- Voice ID overrides via directives
- Provider fallback when Fish Audio fails

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with no blocking issues
- The implementation is thorough, follows established patterns perfectly, includes comprehensive test coverage (6 new tests), and properly handles security concerns (API key marked sensitive, proper timeout/abort handling). All integration points are updated consistently across commands, gateway methods, and documentation.
- No files require special attention

<sub>Last reviewed commit: 7905ecd</sub>
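The request path the review describes (Bearer auth, a configurable `baseUrl`, per-context output formats, directive voice overrides, and a timeout enforced through an abort controller) might look roughly like this. It is a hedged sketch, not the PR's implementation. Several details are assumptions: the JSON body fields (`text`, `reference_id`, `format`), the idea that `baseUrl` excludes the `/v1/tts` path, the `OutputContext` names, and the 30-second default timeout. Check the Fish Audio API reference before relying on any of them.

```typescript
// Hypothetical sketch of a Fish Audio TTS request. Body field names, OutputContext
// values, and the default timeout are assumptions, not the PR's actual code.
type OutputContext = "telegram" | "telephony" | "default";
type FishAudioFormat = "opus" | "pcm" | "mp3";

const DEFAULT_BASE_URL = "https://api.fish.audio"; // assumed to exclude the /v1/tts path

// Mirrors the format choices in the review: Telegram opus, telephony PCM, otherwise mp3.
function pickFormat(context: OutputContext): FishAudioFormat {
  switch (context) {
    case "telegram":
      return "opus";
    case "telephony":
      return "pcm";
    default:
      return "mp3";
  }
}

interface FishAudioRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildFishAudioRequest(opts: {
  apiKey: string;
  text: string;
  voiceId?: string; // configured fishaudio.voiceId
  voiceIdOverride?: string; // e.g. from a [[tts:voiceId=...]] directive
  baseUrl?: string; // fishaudio.baseUrl
  context: OutputContext;
}): FishAudioRequest {
  const base = (opts.baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, ""); // drop trailing slashes
  const referenceId = opts.voiceIdOverride ?? opts.voiceId; // directive beats config
  const body: Record<string, string> = { text: opts.text, format: pickFormat(opts.context) };
  if (referenceId) body.reference_id = referenceId; // assumed API field name
  return {
    url: `${base}/v1/tts`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${opts.apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and aborts it after timeoutMs. Any thrown error is the caller's
// cue to try the next provider in the fallback chain.
async function synthesize(req: FishAudioRequest, timeoutMs = 30_000): Promise<ArrayBuffer> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(req.url, { ...req.init, signal: controller.signal });
    if (!res.ok) throw new Error(`Fish Audio TTS failed: HTTP ${res.status}`);
    return await res.arrayBuffer();
  } finally {
    clearTimeout(timer); // always clear the timer, whether the request succeeded or failed
  }
}
```

Keeping request construction separate from `synthesize` lets the format and voice-override rules be unit-tested without network access. Only `synthesize` touches the endpoint.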