
#11704: feat(tts): OpenAI TTS baseUrl support for local servers (Chatterbox, Coqui, LocalAI)

by mateusz-michalik open 2026-02-08 06:07
size: M
## Summary

- Cherry-pick TTS `baseUrl` commit from #9736 (`00719bc`, thanks @divol89) and fix bugs identified in review
- Make local OpenAI-compatible TTS servers (Chatterbox, Coqui, LocalAI) work end-to-end without an OpenAI API key
- Fix media parser rejecting TTS tool `/tmp/` audio paths, which blocked voice delivery on all channels

Closes #9709
Ref: #9736

## What PR #9736 already did (commit `00719bc`)

- Added `baseUrl?: string` to `TtsConfig.openai` in `types.tts.ts`
- Added `baseUrl` to Zod schema in `zod-schema.core.ts`
- Resolved `baseUrl` in `resolveTtsConfig()` in `tts.ts`
- Passed `baseUrl` to `openaiTTS()` function
- Used config `baseUrl` with fallback to `getOpenAITtsBaseUrl()` (env var)

## Bugs fixed on top of the cherry-pick

1. **Missing type field**: `baseUrl` was not added to `ResolvedTtsConfig` type, causing type errors
2. **No URL normalization**: config `baseUrl` with trailing slash produced double-slash URLs (`http://host:4123/v1//audio/speech`). Now stripped via `.replace(/\/+$/, "")`
3. **`isCustomOpenAIEndpoint()` was env-only**: only checked `OPENAI_TTS_BASE_URL` env var, not the new config field. Model/voice validation wouldn't relax for config-based custom URLs. Now accepts optional `configBaseUrl` param, threaded through `isValidOpenAIModel()` and `isValidOpenAIVoice()`
4. **API key still required**: `resolveTtsApiKey()` returned `undefined` for openai without key, so the provider was skipped. Now returns `"local"` sentinel when `baseUrl` is set. `openaiTTS()` omits the `Authorization` header for the sentinel
5. **`isTtsProviderConfigured()` required API key**: openai with custom `baseUrl` but no key showed as unconfigured. Now treats openai as configured when `baseUrl` is set
6. **`baseUrl` not passed in `textToSpeech()` main path**: only passed in `textToSpeechTelephony()`. Added to the main provider loop
7. **Media parser rejected TTS audio paths**: `isValidMedia()` only accepted `./` relative paths and `https://` URLs. The TTS tool writes audio to `/tmp/tts-*/voice-*` and returns `MEDIA:/tmp/...`, which was rejected and sent as raw text instead of a voice attachment. Now allows `/tmp/` paths (no traversal). Other absolute paths remain blocked for LFI safety

## Tests added

### TTS tests (13 new in `tts.test.ts`)

- `resolveTtsConfig` resolves and trims `openai.baseUrl`
- `isCustomOpenAIEndpoint` returns true for config baseUrl and env var
- `resolveTtsApiKey` returns `"local"` sentinel when custom baseUrl set with no key; prefers real keys; returns undefined without either
- `isTtsProviderConfigured` returns true for openai with baseUrl, false without key or baseUrl
- `isValidOpenAIModel/Voice` relaxes validation when config baseUrl is set

### Media parser tests (2 new in `parse.test.ts`)

- Accepts `/tmp/` paths from internal tools (e.g. TTS)
- Rejects `/tmp/` paths with directory traversal

## Usage

```yaml
messages:
  tts:
    provider: openai
    openai:
      baseUrl: "http://localhost:8880/v1"
      model: "chatterbox"
      voice: "default"
```

No `apiKey` needed for local servers.

## Validation

- `pnpm build`: passes
- `pnpm check`: lint/format/types pass
- `pnpm test -- --run src/tts/tts.test.ts`: 48 tests pass (13 new)
- `pnpm test -- --run src/media/parse.test.ts`: 11 tests pass (2 new)

## Test plan

- [ ] `pnpm build && pnpm check` passes
- [ ] `pnpm test` passes (all TTS and media parser tests)
- [ ] Manual: set `messages.tts.openai.baseUrl` to a local Chatterbox instance, verify voice generation works without an OpenAI API key
- [ ] Verify Telegram voice notes arrive as voice bubbles (opus format)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
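
For reviewers, here is a rough sketch of the behavior described in fixes 2, 4, and 7: trailing-slash normalization, the `"local"` API key sentinel, and the `/tmp/` media path check. It is not the code in this PR. All function names ending in `Sketch`, the `normalizeBaseUrl` helper, and the traversal check by splitting on `/` are illustrative assumptions; the real logic lives in `tts.ts` and the media parser and may differ in detail.

```typescript
// Illustrative only: helper names here are hypothetical, not the PR's actual API.
const LOCAL_KEY_SENTINEL = "local"; // sentinel value named in the PR description

/** Trim and strip trailing slashes so appending a path never yields "//". */
function normalizeBaseUrl(baseUrl: string): string {
  return baseUrl.trim().replace(/\/+$/, "");
}

/** Join the normalized base URL with the speech endpoint path. */
function speechUrlSketch(baseUrl: string): string {
  return `${normalizeBaseUrl(baseUrl)}/audio/speech`;
}

/** Prefer a real key; fall back to the sentinel when only a custom baseUrl is set. */
function resolveApiKeySketch(apiKey?: string, baseUrl?: string): string | undefined {
  if (apiKey) return apiKey;
  if (baseUrl) return LOCAL_KEY_SENTINEL;
  return undefined;
}

/** Build request headers, omitting Authorization when the key is the sentinel. */
function buildHeadersSketch(apiKey: string): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey !== LOCAL_KEY_SENTINEL) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return headers;
}

/** Accept /tmp/ paths only when no segment is ".." (assumed traversal check). */
function isAllowedTmpPathSketch(candidate: string): boolean {
  if (!candidate.startsWith("/tmp/")) return false;
  return !candidate.split("/").includes("..");
}
```

With these assumptions, `speechUrlSketch("http://host:4123/v1/")` produces a single-slash URL, a keyless config with `baseUrl` resolves to the sentinel, and `/tmp/../etc/passwd` is rejected while `/tmp/tts-abc/voice-1.opus` is accepted.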
