#9041: feat(tts): Add post-processing hook for voice modulation
# Add TTS Post-Processing Hook for Voice Modulation
## Summary
Adds a configurable post-processing hook to `messages.tts` that allows audio manipulation (pitch, speed, effects) after TTS generation but before delivery. Includes an example FFmpeg pitch-modulation plugin demonstrating the pattern.
## Motivation
Users want to customize TTS voice characteristics beyond what providers offer:
- Deeper/higher pitch for personality customization
- Speed adjustments
- Custom audio effects (reverb, EQ, etc.)
The current workaround is to post-process audio manually or to fork the TTS code. This PR makes post-processing a first-class config feature.
## Changes
### Core TTS (`src/tts/tts.ts`)
- Added `applyPostProcessing()` helper function
- Calls post-processing hook after TTS generation (both Edge and API providers)
- Graceful fallback to original audio on failure
- Timeout protection (default 5s, configurable 100ms-30s)
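
As an illustration of the behaviour listed above, the hook could look roughly like the sketch below. This is not the code from `src/tts/tts.ts`: the `PostProcessOptions` shape, the function signature, and the choice to return a file path are assumptions, and `command` is assumed to be an already-expanded path to an executable.

```typescript
import { spawn } from "node:child_process";
import { existsSync } from "node:fs";

// Hypothetical options shape; the PR's real types may differ.
interface PostProcessOptions {
  command: string;
  timeoutMs?: number;
  env?: Record<string, string>;
}

const DEFAULT_TIMEOUT_MS = 5_000; // default stated in the PR description

/**
 * Runs the command with OPENCLAW_TTS_INPUT / OPENCLAW_TTS_OUTPUT set.
 * Resolves to outputPath on success and to inputPath on any failure,
 * so the caller always has a deliverable audio file.
 */
function applyPostProcessing(
  inputPath: string,
  outputPath: string,
  opts: PostProcessOptions,
): Promise<string> {
  return new Promise((resolve) => {
    let settled = false;

    const child = spawn(opts.command, [], {
      env: {
        ...process.env,
        ...opts.env,
        OPENCLAW_TTS_INPUT: inputPath,
        OPENCLAW_TTS_OUTPUT: outputPath,
      },
      stdio: "ignore",
    });

    // Timeout: kill the process and fall back to the original audio.
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      finish(inputPath);
    }, opts.timeoutMs ?? DEFAULT_TIMEOUT_MS);

    function finish(path: string): void {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(path);
    }

    // Command not found or not executable: fall back.
    child.on("error", () => finish(inputPath));

    // Non-zero exit or missing output file: fall back.
    child.on("close", (code) => {
      const ok = code === 0 && existsSync(outputPath);
      finish(ok ? outputPath : inputPath);
    });
  });
}
```

Resolving to a path rather than rejecting keeps the failure handling in one place: the caller delivers whatever path comes back and never needs its own `try`/`catch`.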
### Config Types (`src/config/types.tts.ts`)
- Added `postProcess` field to `TtsConfig`:
  - `enabled?: boolean`: Enable/disable post-processing
  - `command?: string`: Path to processing script (supports `~` expansion)
  - `timeoutMs?: number`: Timeout in milliseconds
  - `env?: Record<string, string>`: Environment variables for the command
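
To make the field semantics concrete, here is a minimal sketch of how these options might be typed and normalized. Only the four field names come from this PR. `TtsPostProcessConfig`, `ResolvedPostProcess`, `expandHome`, and `resolvePostProcess` are hypothetical names, and clamping an out-of-range timeout (rather than rejecting it during validation) is an assumption.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Field names follow the PR description; the type name is illustrative.
interface TtsPostProcessConfig {
  enabled?: boolean;
  command?: string;
  timeoutMs?: number;
  env?: Record<string, string>;
}

// Hypothetical normalized form with defaults applied.
interface ResolvedPostProcess {
  command: string;
  timeoutMs: number;
  env: Record<string, string>;
}

const DEFAULT_TIMEOUT_MS = 5_000; // default: 5s
const MIN_TIMEOUT_MS = 100; // lower bound: 100ms
const MAX_TIMEOUT_MS = 30_000; // upper bound: 30s

// Expands a leading "~" to the current user's home directory.
function expandHome(p: string): string {
  if (p === "~") return homedir();
  return p.startsWith("~/") ? join(homedir(), p.slice(2)) : p;
}

// Returns null when post-processing should be skipped entirely.
function resolvePostProcess(
  cfg: TtsPostProcessConfig | undefined,
): ResolvedPostProcess | null {
  if (!cfg?.enabled || !cfg.command) return null;
  const requested = cfg.timeoutMs ?? DEFAULT_TIMEOUT_MS;
  return {
    command: expandHome(cfg.command),
    // Assumption: out-of-range values are clamped, not rejected.
    timeoutMs: Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, requested)),
    env: cfg.env ?? {},
  };
}
```

The two skip conditions (`enabled` not true, or `command` missing) mirror the first two test cases listed under Tests.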
### Zod Schema (`src/config/zod-schema.core.ts`)
- Added validation for `messages.tts.postProcess` config block
### Example Plugin (`extensions/tts-ffmpeg-pitch/`)
- **Plugin manifest**: `openclaw.plugin.json` with config schema
- **CLI command**: `openclaw tts-pitch` for testing transformations
- **Processing script**: `bin/process-audio.sh` (FFmpeg pitch/speed modulation)
- **Documentation**: Full README with examples and troubleshooting
### Tests (`src/tts/tts-post-process.test.ts`)
- Post-processing disabled (skip when `enabled: false`)
- Post-processing with no command (skip when command missing)
- Passthrough processing (cat/cp commands)
- Fallback on failure (non-zero exit, missing output)
- Environment variable passing
- Timeout handling
### Documentation
- `docs/tts-post-processing.md` — Comprehensive guide with examples
- `extensions/tts-ffmpeg-pitch/README.md` — Plugin usage and config
## Example Usage
### Deeper Voice (TARS-style)
```json
{
  "messages": {
    "tts": {
      "provider": "openai",
      "postProcess": {
        "enabled": true,
        "command": "~/.openclaw/extensions/tts-ffmpeg-pitch/bin/process-audio.sh",
        "timeoutMs": 8000,
        "env": {
          "FFMPEG_PITCH": "0.82"
        }
      }
    }
  }
}
```
### Higher, Faster Voice
```json
{
  "messages": {
    "tts": {
      "provider": "openai",
      "postProcess": {
        "enabled": true,
        "command": "~/.openclaw/extensions/tts-ffmpeg-pitch/bin/process-audio.sh",
        "env": {
          "FFMPEG_PITCH": "1.2",
          "FFMPEG_SPEED": "1.15"
        }
      }
    }
  }
}
```
## Command Interface
Processing commands receive the following environment variables:
- **`OPENCLAW_TTS_INPUT`**: Path to original TTS audio file
- **`OPENCLAW_TTS_OUTPUT`**: Path where processed audio should be written
- **Custom env vars**: Any variables from `postProcess.env`
Commands must:
- Write processed audio to the path given in `OPENCLAW_TTS_OUTPUT`
- Exit with code `0` on success
- Exit with a non-zero code on failure (triggers fallback to original)
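
The contract above is small enough that a processor can be written in any language. As a sketch, a passthrough processor in TypeScript might look like the following. `runProcessor` is a hypothetical name, and the specific non-zero codes are arbitrary, since the contract only distinguishes zero from non-zero.

```typescript
import { copyFileSync } from "node:fs";

/**
 * Passthrough processor following the command contract.
 * Returns the exit code the command should use: 0 on success,
 * non-zero on failure (which triggers fallback to the original audio).
 */
function runProcessor(env: Record<string, string | undefined>): number {
  const input = env.OPENCLAW_TTS_INPUT;
  const output = env.OPENCLAW_TTS_OUTPUT;

  // Both paths are required by the contract.
  if (!input || !output) return 2;

  try {
    // Placeholder transform: a real processor would modify the audio here
    // (for example by invoking FFmpeg) instead of copying it unchanged.
    copyFileSync(input, output);
    return 0;
  } catch {
    // Unreadable input or unwritable output.
    return 1;
  }
}
```

Run as a standalone script, the file would end with something like `process.exit(runProcessor(process.env));`. Keeping the logic in a function that takes the environment as an argument makes it testable without spawning a process.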
## Error Handling
All failures are **fail-safe**:
- Command not found → original audio
- Non-zero exit → original audio
- Timeout → kill process, original audio
- Missing output file → original audio
Failures are logged via `logVerbose()` for debugging.
## Breaking Changes
None. Feature is opt-in and disabled by default.
## Testing
Run the full gate:
```bash
pnpm build && pnpm check && pnpm test
```
Run post-processing tests:
```bash
pnpm test src/tts/tts-post-process.test.ts
```
## AI Attribution
**AI-assisted (Claude Sonnet 4.5)** — Plan and implementation reviewed and tested by human.
Design session: [planning transcript available on request]
## Checklist
- [x] Config types updated (`src/config/types.tts.ts`)
- [x] Zod schema validation added (`src/config/zod-schema.core.ts`)
- [x] Core TTS hook implemented (`src/tts/tts.ts`)
- [x] Example plugin created (`extensions/tts-ffmpeg-pitch/`)
- [x] Tests written (`src/tts/tts-post-process.test.ts`)
- [x] Documentation added (`docs/tts-post-processing.md`, plugin README)
- [x] Full gate passed (`pnpm build && pnpm check && pnpm test`)
- [x] Tested manually with FFmpeg plugin
## Follow-ups (optional)
- [ ] Add per-provider post-processing config
- [ ] Add per-message post-processing directives
- [ ] Add plugin API for registered transforms (beyond command-based)
- [ ] Support telephony TTS post-processing (buffer-based)
## Related Issues
Closes #9044