#8317: fix(tts): add dynamic timeout and retry logic for ElevenLabs TTS

by camtang26 open 2026-02-03 21:53

stale

Cluster: Voice Call and TTS Improvements

## Summary

Addresses audio cutoff issues when generating TTS for longer text via the ElevenLabs API.

- **Dynamic timeout scaling**: `MIN_TIMEOUT_MS` (15 s) + text length × 30 ms, capped at `MAX_TIMEOUT_MS` (120 s)
- **Retry logic**: exponential backoff (3 attempts) for transient failures (timeouts, 5xx errors, network issues)
- **Smart retry skipping**: does not retry auth/validation errors (401, 403, 422, invalid headers)
- **Audio buffer validation**: checks a minimum size (1 KB) and validates MP3/OGG magic headers before returning
- **Enhanced diagnostic logging**: logs latency, buffer size, and retry attempts for debugging

## Problem

ElevenLabs TTS can take 45-60+ seconds for longer text (1500+ characters), but the previous fixed 30-second timeout aborted requests mid-generation. This resulted in:

- Truncated or silent audio files
- No retry for transient network failures
- No validation that the audio was complete

## Test plan

- [x] Unit tests for `calculateTtsTimeout()` (4 test cases)
- [x] Unit tests for `validateAudioBuffer()` (6 test cases)
- [x] All 43 TTS tests pass
- [ ] Manual testing with long-text TTS generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Greptile Overview

### Greptile Summary

This PR improves ElevenLabs TTS robustness by adding a text-length-based timeout calculation, wrapping the ElevenLabs fetch in `retryAsync` with exponential backoff, and validating returned audio buffers (minimum size plus basic MP3/OGG header checks). Unit tests were extended to cover the new timeout calculation and buffer validation helpers.

In the existing `textToSpeech` flow, ElevenLabs requests now scale their abort timeout dynamically to better handle long synthesis jobs, and retries aim to recover from transient failures while skipping common non-retriable auth/validation cases.

### Confidence Score: 4/5

- This PR is reasonably safe to merge and should improve ElevenLabs TTS reliability, with a couple of edge cases worth tightening.
- The core change is localized to ElevenLabs request handling and is covered by new unit tests. The main concern is that buffer validation depends on `outputFormat` being a non-empty string, which could throw or skip validation in edge cases and would interact poorly with retries and logging.
- `src/tts/tts.ts`
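The timeout formula from the Summary can be sketched as follows. This is a minimal illustration that assumes only the constants stated in the PR (15 s floor, 30 ms per character, 120 s cap); `PER_CHAR_MS` is a hypothetical name, and the real `calculateTtsTimeout()` in `src/tts/tts.ts` may differ in signature and details.

```typescript
// Constants taken from the PR summary; PER_CHAR_MS is a hypothetical name
// for the 30 ms-per-character factor.
const MIN_TIMEOUT_MS = 15_000;
const MAX_TIMEOUT_MS = 120_000;
const PER_CHAR_MS = 30;

/**
 * Sketch of a text-length-based abort timeout:
 * MIN_TIMEOUT_MS + text.length * PER_CHAR_MS, capped at MAX_TIMEOUT_MS.
 * Note: text.length counts UTF-16 code units, not user-visible characters.
 */
function calculateTtsTimeout(text: string): number {
  const scaled = MIN_TIMEOUT_MS + text.length * PER_CHAR_MS;
  return Math.min(scaled, MAX_TIMEOUT_MS);
}
```

With these constants, 1500 characters gives 15 000 + 1500 × 30 = 60 000 ms, and the cap is reached at (120 000 - 15 000) / 30 = 3500 characters, so any longer text gets the 120 s maximum.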
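The retry policy from the Summary (three attempts with exponential backoff; retry timeouts, 5xx responses, and network failures, but not 401/403/422 or invalid-header errors) might look roughly like the sketch below. The PR itself wraps the request in the codebase's existing `retryAsync` helper, whose signature is not shown here, so `retryWithBackoff`, `TtsHttpError`, and `isRetriableTtsError` are hypothetical stand-ins. Detecting invalid-header errors by message text is also an assumption.

```typescript
// Hypothetical error type carrying the HTTP status of a failed TTS response.
class TtsHttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "TtsHttpError";
    this.status = status;
  }
}

// Auth/validation failures will not succeed on a second try.
const NON_RETRIABLE_STATUSES = new Set([401, 403, 422]);

function isRetriableTtsError(err: unknown): boolean {
  if (err instanceof TtsHttpError) {
    if (NON_RETRIABLE_STATUSES.has(err.status)) return false;
    return err.status >= 500; // only server-side errors are treated as transient
  }
  // Assumption: invalid-header errors are recognized by their message text.
  if (err instanceof Error && /invalid header/i.test(err.message)) return false;
  // Everything else (abort/timeout, network failure) is treated as transient.
  return true;
}

// Hypothetical generic retry loop with exponential backoff.
async function retryWithBackoff<T>(
  fn: (attempt: number) => Promise<T>,
  opts: { attempts: number; baseDelayMs: number; shouldRetry: (err: unknown) => boolean },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Give up on the last attempt or on a non-retriable error.
      if (attempt === opts.attempts || !opts.shouldRetry(err)) throw err;
      const delayMs = opts.baseDelayMs * 2 ** (attempt - 1); // 1x, 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError; // only reached when opts.attempts < 1
}
```

Keeping error classification in a separate predicate leaves the loop generic, and rethrowing immediately on a non-retriable error means a bad API key fails on the first attempt rather than after three.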
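The audio buffer check can be sketched as follows. It assumes a 1 KB minimum and the standard magic bytes: an `ID3` tag or an MPEG frame sync for MP3, and the `OggS` capture pattern for Ogg. The `validateAudioBuffer` signature, its return shape, and inferring the container from substrings of `outputFormat` are all assumptions, not the PR's actual code.

```typescript
const MIN_AUDIO_BYTES = 1024; // 1 KB minimum from the PR summary

type AudioValidation = { ok: true } | { ok: false; reason: string };

function hasMp3Header(b: Uint8Array): boolean {
  // ID3v2 tag ("ID3"), or an MPEG audio frame sync: 0xFF then the top 3 bits set.
  const id3 = b[0] === 0x49 && b[1] === 0x44 && b[2] === 0x33;
  const frameSync = b[0] === 0xff && (b[1] & 0xe0) === 0xe0;
  return id3 || frameSync;
}

function hasOggHeader(b: Uint8Array): boolean {
  // Every Ogg page starts with the capture pattern "OggS".
  return b[0] === 0x4f && b[1] === 0x67 && b[2] === 0x67 && b[3] === 0x53;
}

// Hypothetical signature; Node's Buffer is a Uint8Array, so it can be passed directly.
function validateAudioBuffer(buffer: Uint8Array, outputFormat?: string): AudioValidation {
  if (buffer.length < MIN_AUDIO_BYTES) {
    return { ok: false, reason: `buffer too small: ${buffer.length} bytes` };
  }
  // Guard against a missing or non-string format instead of calling string methods on it.
  const format = typeof outputFormat === "string" ? outputFormat.toLowerCase() : "";
  if (format.startsWith("mp3") && !hasMp3Header(buffer)) {
    return { ok: false, reason: "missing MP3 header" };
  }
  if (format.includes("ogg") && !hasOggHeader(buffer)) {
    return { ok: false, reason: "missing OGG header" };
  }
  // Unknown or headerless formats (e.g. raw PCM) only get the size check.
  return { ok: true };
}
```

Treating a missing or empty `outputFormat` as "size check only" is one way to handle the edge case raised in the Greptile review without throwing; rejecting such input outright would be an equally valid design choice.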