#20794: feat(tts): add Fish Audio provider with full docs, tests & gateway support
docs
channel: voice-call
gateway
size: M
Cluster: Text-to-Speech Provider Enhancements
## Summary
Describe the problem and fix in 2–5 bullets:
- Problem: OpenClaw currently supports only the ElevenLabs, OpenAI, and Edge TTS providers.
- Why it matters: [Fish Audio S1](https://fish.audio/) offers excellent multilingual voice cloning at a competitive price.
- What changed: Add `fishaudio` as a core TTS provider with documentation to match.
- What did NOT change (scope boundary): No changes were made to how existing TTS providers function.
## Change Type (select all)
- [ ] Bug fix
- [x] Feature
- [x] Refactor
- [x] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
- [x] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Closes #
- Related #
## User-visible / Behavior Changes
- New TTS provider `fishaudio` available via `messages.tts.provider: "fishaudio"` in `openclaw.json`
- New env var `FISH_API_KEY` (or config `fishaudio.apiKey`) for authentication
- `/tts provider fishaudio` slash command now accepted
- `tts.setProvider` gateway RPC accepts `"fishaudio"`
- `tts.status` response now includes `hasFishAudioKey`
- `tts.providers` response now lists Fish Audio
- Auto-detection fallback order: openai > elevenlabs > fishaudio > edge
- Model-driven TTS directives support `[[tts:provider=fishaudio voiceId=...]]`
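
The directive form in the last bullet could be parsed roughly as follows. This is a minimal sketch, not OpenClaw's parser: `parseTtsDirective` and `TtsDirective` are hypothetical names, and the assumption that a directive is a list of space-separated `key=value` pairs inside `[[tts:...]]` is inferred only from the example above.

```typescript
// Hypothetical sketch: not OpenClaw's actual directive parser.
type TtsDirective = Record<string, string>;

// Assumes a directive looks like [[tts:key=value key=value]].
function parseTtsDirective(text: string): TtsDirective | null {
  const match = /\[\[tts:([^\]]*)\]\]/.exec(text);
  if (!match) return null; // no directive present in the text

  const directive: TtsDirective = {};
  for (const pair of match[1].trim().split(/\s+/)) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip tokens that lack a key or an "="
    directive[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return directive;
}

const exampleDirective = parseTtsDirective("Hello [[tts:provider=fishaudio voiceId=abc123]]");
console.log(exampleDirective);
```

Keeping the result as a plain key/value map lets the caller decide which keys (for example `provider` or `voiceId`) it is willing to honor.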
## Security Impact (required)
- New permissions/capabilities? No
- Secrets/tokens handling changed? Yes: new `FISH_API_KEY` env var and `fishaudio.apiKey` config field (marked sensitive in Zod schema)
- New/changed network calls? Yes: outbound POST to `https://api.fish.audio/v1/tts` (configurable via `fishaudio.baseUrl`)
- Command/tool execution surface changed? No
- Data access scope changed? No
- If any Yes, explain risk + mitigation:
- The new API key follows the same pattern as the existing ElevenLabs/OpenAI keys: it is stored in config (marked sensitive) or an env var, and is sent only via the `Authorization: Bearer` header to the configured Fish Audio endpoint.
- The `baseUrl` is user-configurable but defaults to the official API. There is no SSRF risk beyond that of existing TTS providers, since it uses the same fetch path.
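
The outbound request described in this section might be assembled as in the sketch below. It is illustrative only: `buildFishAudioRequest`, `FishAudioConfig`, and `TtsRequest` are hypothetical names, the split of the default URL into a host plus a `/v1/tts` path is an assumption, and the JSON body field names (`text`, `reference_id`, `format`) are assumptions that should be checked against the Fish Audio API documentation.

```typescript
// Hypothetical sketch: names and body fields are assumptions, not the PR's code.
interface FishAudioConfig {
  apiKey: string;
  voiceId?: string;
  baseUrl?: string; // assumed to default to the official API host
}

interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const DEFAULT_BASE_URL = "https://api.fish.audio";

function buildFishAudioRequest(config: FishAudioConfig, text: string): TtsRequest {
  // Strip trailing slashes so a custom baseUrl does not produce "//v1/tts".
  const base = (config.baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
  return {
    url: `${base}/v1/tts`,
    method: "POST",
    headers: {
      // The key travels only in the Authorization header, never in the URL.
      Authorization: `Bearer ${config.apiKey}`,
      "Content-Type": "application/json",
    },
    // Body field names are assumptions; verify against the Fish Audio API docs.
    body: JSON.stringify({ text, reference_id: config.voiceId, format: "mp3" }),
  };
}

const exampleRequest = buildFishAudioRequest({ apiKey: "***", voiceId: "***" }, "Hello");
console.log(exampleRequest.url);
```

Building the request as plain data, separate from the call that sends it, makes it easy to check that the key never leaves the header and that a custom `baseUrl` is respected.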
## Repro + Verification
### Environment
- OS: macOS (Darwin 25.4.0)
- Runtime/container: Node.js
- Model/provider: N/A (TTS provider, not LLM)
- Integration/channel (if any): All channels (Telegram, Discord, etc.)
- Relevant config (redacted):
`{ messages: { tts: { provider: "fishaudio", fishaudio: { apiKey: "***", voiceId: "***" } } } }`
### Steps
1. Set the `FISH_API_KEY` env var or configure `fishaudio.apiKey` in `openclaw.json`
2. Set `messages.tts.provider: "fishaudio"` or run `/tts provider fishaudio`
3. Enable TTS with `/tts always`
4. Send a message that triggers a reply
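
Step 1 allows the key to come from either the config or the environment. A minimal sketch of that resolution is shown below; `resolveFishAudioApiKey` and `FishAudioSettings` are hypothetical names, and the precedence (config first, then `FISH_API_KEY`) is an assumption rather than something this PR description states.

```typescript
// Hypothetical sketch: the precedence (config before env) is an assumption.
interface FishAudioSettings {
  apiKey?: string;
}

function resolveFishAudioApiKey(
  settings: FishAudioSettings | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  // Prefer an explicit, non-blank key from the config file.
  const fromConfig = settings?.apiKey?.trim();
  if (fromConfig) return fromConfig;
  // Otherwise fall back to the FISH_API_KEY environment variable.
  const fromEnv = env["FISH_API_KEY"]?.trim();
  return fromEnv || undefined;
}

console.log(resolveFishAudioApiKey({ apiKey: "from-config" }, { FISH_API_KEY: "from-env" }));
console.log(resolveFishAudioApiKey(undefined, { FISH_API_KEY: "from-env" }));
```

In a Node.js process the second argument would typically be `process.env`; passing it in explicitly keeps the function easy to test.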
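### Expected
The reply includes audio generated by Fish Audio.
### Actual
Audio was generated as expected (see the recording under Evidence).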
## Evidence
Attach at least one:
- [ ] Failing test/log before + passing after
- [ ] Trace/log snippets
- [x] Screenshot/recording
- [ ] Perf numbers (if relevant)
https://github.com/user-attachments/assets/519363fe-ce54-4954-b8fb-b1e34bad89d2
## Human Verification (required)
What you personally verified (not just CI), and how:
- Verified scenarios: Audio is generated both when a `/tts` command is run and when the chatbot requests media generation on its own.
- Edge cases checked: Long text is summarized and short text produces no audio, matching the existing behavior of other TTS providers.
- What you did **not** verify: API failures. I assume it will run through the fallback options.
## Compatibility / Migration
- Backward compatible? Yes
- Config/env changes? Yes
- Migration needed? No
- If yes, exact upgrade steps:
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly: Remove or ignore the `fishaudio` config options; existing providers are unaffected.
- Files/config to restore: Whole PR should be revertable.
- Known bad symptoms reviewers should watch for:
## Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write `None`.
- Risk: Fish Audio API changes or endpoint unavailability
- Mitigation: The provider fallback chain automatically tries the next configured provider (ElevenLabs/OpenAI/Edge); `baseUrl` is configurable for custom endpoints
- Risk: The `voiceId` rename could break existing configs that used `referenceId`
- Mitigation: This is a new provider shipping for the first time, so no existing configs use `referenceId` in the wild yet
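
The fallback mitigation above can be pictured as computing an ordered list of providers to attempt. The sketch below is an assumption-laden illustration, not the PR's implementation: `attemptOrder` and `TtsProvider` are hypothetical names, and the rule "preferred provider first, then the remaining configured providers in the documented order" is assumed.

```typescript
// Hypothetical sketch of a fallback order; not the PR's implementation.
type TtsProvider = "openai" | "elevenlabs" | "fishaudio" | "edge";

// Order taken from the PR description: openai > elevenlabs > fishaudio > edge.
const FALLBACK_ORDER: readonly TtsProvider[] = ["openai", "elevenlabs", "fishaudio", "edge"];

// Returns the providers to attempt: the preferred one first, then the rest
// in fallback order, skipping any that are not configured.
function attemptOrder(
  preferred: TtsProvider,
  configured: ReadonlySet<TtsProvider>,
): TtsProvider[] {
  const rest = FALLBACK_ORDER.filter((p) => p !== preferred);
  return [preferred, ...rest].filter((p) => configured.has(p));
}

const exampleOrder = attemptOrder(
  "fishaudio",
  new Set<TtsProvider>(["fishaudio", "openai", "edge"]),
);
console.log(exampleOrder);
```

A caller would then try each provider in turn and stop at the first one that returns audio.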
_Disclosure: I'm a founding engineer at [Fish Audio](https://fish.audio/). My team and I will be happy to own ongoing maintenance for this integration._
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Adds Fish Audio as a fourth core TTS provider alongside OpenAI, ElevenLabs, and Edge TTS. The implementation is comprehensive and follows existing patterns consistently:
- Core API integration with proper timeout handling, error handling, and abort controller usage
- Full configuration schema with sensitive field marking for `apiKey`
- Gateway RPC methods updated (`tts.status`, `tts.setProvider`, `tts.providers`)
- Command handlers and auto-detection fallback chain (openai > elevenlabs > fishaudio > edge)
- TTS directive parsing supports `provider=fishaudio` and voice ID overrides
- Comprehensive test coverage (6 new test cases)
- Complete documentation updates in both English and Chinese
- CHANGELOG entry added
The implementation correctly handles:
- API key resolution from config and `FISH_API_KEY` env var
- Output format selection for different contexts (Telegram opus, default mp3, telephony PCM)
- Voice ID overrides via directives
- Provider fallback when Fish Audio fails
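
The output format selection mentioned in the list above could look like the following sketch. The context names and `selectOutputFormat` are hypothetical; only the three pairings (Telegram with Opus, telephony with PCM, MP3 otherwise) come from the summary.

```typescript
// Hypothetical sketch: context names and this function are illustrative.
type TtsContext = "telegram" | "telephony" | "default";
type AudioFormat = "opus" | "pcm" | "mp3";

function selectOutputFormat(context: TtsContext): AudioFormat {
  switch (context) {
    case "telegram":
      return "opus"; // Telegram replies use Opus per the summary
    case "telephony":
      return "pcm"; // telephony uses PCM per the summary
    default:
      return "mp3"; // everything else falls back to MP3
  }
}

console.log(selectOutputFormat("telegram"));
console.log(selectOutputFormat("default"));
```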
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with no blocking issues
- The implementation is thorough, follows established patterns perfectly, includes comprehensive test coverage (6 new tests), and properly handles security concerns (API key marked sensitive, proper timeout/abort handling). All integration points are updated consistently across commands, gateway methods, and documentation.
- No files require special attention
<sub>Last reviewed commit: 7905ecd</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #7965: feat(tts): add Speechify as TTS provider (by chaerla · 2026-02-03, 81.5%)
- #7258: feat(tts): add Inworld AI TTS provider (by willsinghwilson · 2026-02-02, 77.9%)
- #8922: feat(voice-call): Add ElevenLabs WebSocket streaming TTS (by mikiships · 2026-02-04, 76.9%)
- #22086: fix(tts): honor explicit config provider and model/voice settings (by AIflow-Labs · 2026-02-20, 76.1%)
- #19073: feat(voice-call): streaming TTS, barge-in, silence filler, hangup, ... (by odrobnik · 2026-02-17, 75.1%)
- #13389: feat(telegram): support native voice notes with automatic OGG/Opus ... (by leavingme · 2026-02-10, 74.2%)
- #7485: TTS: add Resemble AI provider support (by devshahofficial · 2026-02-02, 74.2%)
- #11745: ui: add server-side TTS for web chat via gateway endpoint (by wjlgatech · 2026-02-08, 73.8%)
- #21193: fix(tts): send voice messages as Opus bubbles on Telegram (by aris-katkova · 2026-02-19, 73.4%)
- #8317: fix(tts): add dynamic timeout and retry logic for ElevenLabs TTS (by camtang26 · 2026-02-03, 73.4%)