#8955: feat(tts): Add Kokoro-82M as first-class TTS provider
stale
Cluster: Text-to-Speech Provider Enhancements
## Summary
Adds **Kokoro-82M** as a first-class TTS provider - one of the fastest local text-to-speech systems available.
## Provider Overview
### Kokoro-82M TTS
- **Speed**: 35-100x realtime on CUDA GPUs (under 0.3 s for short messages)
- **Size**: Only 82 million parameters (tiny, efficient)
- **Quality**: Comparable to much larger models
- **Voices**: 67 voices across 8 languages (English, Japanese, Chinese, Spanish, Hindi, Italian, French, Portuguese)
- **Voice Mixing**: Supports blending voices (e.g., `af_bella+jf_alpha`)
- **License**: Apache 2.0 (fully open)
- **OpenAI-compatible API**: Drop-in replacement for existing tools
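
The voice-mixing syntax joins voice IDs with `+`. As a rough illustration, a mixed-voice string could be split and sanity-checked before it is sent to the server. `parseVoiceMix` is a hypothetical helper, not part of this PR, and the allowed ID pattern is an assumption based on the two example IDs above.

```typescript
// Hypothetical helper (not from this PR): split a mixed-voice string such as
// "af_bella+jf_alpha" into its component voice IDs.
// Assumption: voice IDs consist of lowercase letters, digits, and underscores.
const VOICE_ID = /^[a-z0-9_]+$/;

function parseVoiceMix(voice: string): string[] {
  const parts = voice.split("+").map((p) => p.trim());
  for (const part of parts) {
    if (!VOICE_ID.test(part)) {
      throw new Error(`Invalid voice id in mix: "${part}"`);
    }
  }
  return parts;
}
```

A single voice such as `af_bella` passes through as a one-element array, so the same check can run on every configured voice.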
## Configuration
**⚠️ Important:** Kokoro requires `enabled: true` to be explicitly set in the configuration, even when specified as the primary provider.
```json
{
"messages": {
"tts": {
"provider": "kokoro",
"kokoro": {
"enabled": true,
"baseUrl": "http://localhost:8102",
"voice": "af_bella"
}
}
}
}
```
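
For illustration, the defaults described in this PR (`localhost:8102`, `af_bella`, and `speed=1.0` per the review summary) could be applied to a partial config roughly as follows. The interface and function names are assumptions for this sketch, not the PR's actual code, and the `http://` scheme on the default URL is taken from the example config.

```typescript
// Sketch of default resolution for the Kokoro block. The interface and
// function names are illustrative assumptions, not the PR's actual code.
interface KokoroConfig {
  enabled?: boolean;
  baseUrl?: string;
  voice?: string;
  speed?: number;
}

interface ResolvedKokoroConfig {
  enabled: boolean;
  baseUrl: string;
  voice: string;
  speed: number;
}

function resolveKokoroConfig(raw: KokoroConfig = {}): ResolvedKokoroConfig {
  return {
    // Must be set explicitly; an absent flag is treated as disabled.
    enabled: raw.enabled === true,
    // Strip trailing slashes so endpoint paths can be appended safely.
    baseUrl: (raw.baseUrl ?? "http://localhost:8102").replace(/\/+$/, ""),
    voice: raw.voice ?? "af_bella",
    speed: raw.speed ?? 1.0,
  };
}
```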
## Auth Profiles
Like other first-class providers, Kokoro requires an auth profile entry in `~/.openclaw/agents/main/agent/auth-profiles.json`, even for local services:
```json
{
"profiles": {
"kokoro:local": {
"type": "token",
"provider": "kokoro",
"token": "not-needed"
}
}
}
```
Without this entry, TTS calls fail with "No API key found" errors and no audio is produced.
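
A startup check along these lines could surface the missing entry early. This is only a sketch: the file shape mirrors the JSON above, while `findProfilesForProvider` and its lookup logic are assumptions and not OpenClaw's actual implementation.

```typescript
// Illustrative only: the types mirror the JSON above; the helper name and
// lookup logic are assumptions, not OpenClaw's actual implementation.
interface AuthProfile {
  type: string;
  provider: string;
  token?: string;
}

interface AuthProfilesFile {
  profiles: Record<string, AuthProfile>;
}

// Returns the ids of all profiles registered for the given provider.
function findProfilesForProvider(file: AuthProfilesFile, provider: string): string[] {
  return Object.entries(file.profiles)
    .filter(([, profile]) => profile.provider === provider)
    .map(([id]) => id);
}

const example: AuthProfilesFile = JSON.parse(`{
  "profiles": {
    "kokoro:local": { "type": "token", "provider": "kokoro", "token": "not-needed" }
  }
}`);

if (findProfilesForProvider(example, "kokoro").length === 0) {
  console.warn('No auth profile found for "kokoro"; TTS calls may fail.');
}
```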
## Changes
- Added Kokoro to the `TtsProvider` type
- Added a Kokoro config block in `TtsConfig` with voice mixing support
- Added Zod schema validation for Kokoro
- Implemented `kokoroTTS()` function with OpenAI-compatible API calls
- Added Kokoro to the provider fallback chain
- Added config resolution with defaults (`localhost:8102`, `af_bella` voice)
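
As a rough sketch of what an OpenAI-compatible call looks like, the request can be built as below. The `/v1/audio/speech` path is the one named in the review summary and the body fields follow OpenAI's speech API. The `"kokoro"` model name, the `mp3` format, and the function names are assumptions for illustration, not the PR's `kokoroTTS()` itself.

```typescript
// Sketch of an OpenAI-style speech request. The body field names follow
// OpenAI's audio speech API; the "model" value and "mp3" format are
// assumptions for illustration.
interface SpeechRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildKokoroRequest(
  baseUrl: string,
  text: string,
  voice = "af_bella",
  speed = 1.0,
): SpeechRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/audio/speech`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "kokoro", // assumed model name
        input: text,
        voice, // may be a mix such as "af_bella+jf_alpha"
        speed,
        response_format: "mp3", // assumed output format
      }),
    },
  };
}

// Sends the request and returns the raw audio bytes (needs a running server).
async function kokoroSpeech(baseUrl: string, text: string): Promise<ArrayBuffer> {
  const { url, init } = buildKokoroRequest(baseUrl, text);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Kokoro TTS failed: HTTP ${res.status}`);
  return res.arrayBuffer();
}
```

Keeping request construction separate from the network call makes the payload easy to unit-test without a server.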
## Installation
Kokoro can be self-hosted using the [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) server:
```bash
git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI
./start-gpu.sh # For CUDA support
# or
./start-cpu.sh # For CPU-only
```
## Testing
- Tested with local Kokoro server on CUDA GPU
- Verified voice message generation on Matrix
- Confirmed auth profile requirement
- Tested voice mixing feature
- Measured 35-100x realtime speed on RTX 5080
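
For reference, "Nx realtime" means N seconds of audio are produced per second of synthesis time. A small sketch of that arithmetic follows; the helper names are hypothetical and the numbers are illustrative, not measurements from this PR.

```typescript
// Realtime factor = seconds of audio produced per second of synthesis time.
function realtimeFactor(audioSeconds: number, synthesisSeconds: number): number {
  if (synthesisSeconds <= 0) throw new Error("synthesisSeconds must be positive");
  return audioSeconds / synthesisSeconds;
}

// Time needed to synthesize a clip at a given realtime factor.
function synthesisTime(audioSeconds: number, factor: number): number {
  if (factor <= 0) throw new Error("factor must be positive");
  return audioSeconds / factor;
}

// Illustrative numbers: a 10 s clip at 35x and at 100x realtime.
console.log(synthesisTime(10, 35).toFixed(2)); // "0.29"
console.log(synthesisTime(10, 100).toFixed(2)); // "0.10"
```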
## Resources
- Model: https://huggingface.co/hexgrad/Kokoro-82M
- Server: https://github.com/remsky/Kokoro-FastAPI
- Benchmark: https://www.inferless.com/learn/comparing-different-text-to-speech---tts--models-part-2
## Greptile Overview
### Greptile Summary
Adds a new `kokoro` TTS provider across config types, Zod validation, and runtime TTS dispatch. The provider is resolved with defaults (localhost:8102, `af_bella`, speed=1.0), added to the provider fallback order, and implemented via an OpenAI-compatible `/v1/audio/speech` request that writes the returned audio buffer to a temp file for downstream message sending.
### Confidence Score: 3/5
- This PR is close to mergeable but has a couple of runtime-behavior issues to address first.
- Kokoro integration is self-contained and follows existing provider patterns, but there are correctness issues around the new resolved config shape assumptions and audio compatibility flagging that can affect message sending behavior.
- `src/tts/tts.ts`

**Context used:**
- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
- Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13))
## Most Similar PRs
- #7965: feat(tts): add Speechify as TTS provider (by chaerla · 2026-02-03 · 75.6%)
- #7258: feat(tts): add Inworld AI TTS provider (by willsinghwilson · 2026-02-02 · 72.9%)
- #20794: feat(tts): add Fish Audio provider with full docs, tests & gateway ... (by twangodev · 2026-02-19 · 70.4%)
- #12907: feat: add Gonka.ai as optional LLM provider with ECDSA signing (by ultragenez · 2026-02-09 · 69.7%)
- #22618: feat(tts): add OpenAI TTS speed parameter support (by useramuser · 2026-02-21 · 69.5%)
- #22086: fix(tts): honor explicit config provider and model/voice settings (by AIflow-Labs · 2026-02-20 · 69.4%)
- #19427: feat: add Soniox speech-to-text provider (by matjaz · 2026-02-17 · 68.7%)
- #10351: feat: Add Mumble voice chat extension (by emadomedher · 2026-02-06 · 68.4%)
- #10870: feat(tts): add pocket-tts provider for local CPU-based TTS (by fayrose · 2026-02-07 · 68.2%)
- #6677: fix(tts): always load fresh config for voice selection (by Jinqiao · 2026-02-01 · 68.2%)