#16274: feat(voice): Fix persistent speech errors, silent playback, and feedb…
app: android
size: M
Cluster: Text-to-Speech Provider Enhancements
## Summary
- **Problem:** Talk Mode on Android suffers from multiple critical issues: Speech Error 11 infinite loops, silent playback on certain devices (e.g., Pixel 10), feedback loops where the mic picks up the AI's voice, and assistant responses that never play because client-server clock skew causes them to be filtered out.
- **Why it matters:** These bugs make Talk Mode completely unusable on affected devices.
- **What changed:** Fixed SpeechRecognizer error handling, switched audio to MP3 with `USAGE_MEDIA`, added a 5-second clock skew buffer, made the recognizer get destroyed during AI playback, and added user-configurable ElevenLabs settings (API key and Voice ID).
- **What did NOT change (scope boundary):** No changes to text chat, gateway protocol, canvas, or any non-voice functionality.
## Change Type (select all)
- [x] Bug fix
- [x] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Related: Talk Mode voice output failures on Android
## User-visible / Behavior Changes
- Talk Mode now uses MP3 format and `USAGE_MEDIA` audio attributes (previously PCM/`USAGE_ASSISTANT`)
- New Settings fields: "ElevenLabs API Key" and "Voice ID" for custom voice configuration
- "Test Voice" button appears when an API key is configured
- SpeechRecognizer is fully destroyed during AI playback (mic is off while AI speaks)
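The last bullet (recognizer destroyed while the AI speaks) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `RecognizerGate` and its two methods are hypothetical, and the caller is assumed to supply its own `RecognitionListener` and recognition `Intent`.

```kotlin
import android.content.Context
import android.content.Intent
import android.speech.RecognitionListener
import android.speech.SpeechRecognizer

// Hypothetical helper; the real TalkModeManager may be structured differently.
// SpeechRecognizer methods must be called from the main thread.
class RecognizerGate(private val context: Context) {
    private var recognizer: SpeechRecognizer? = null

    // Call right before AI audio starts: stop listening and release the
    // recognizer so the mic cannot capture the speaker output.
    fun releaseForPlayback() {
        recognizer?.cancel()
        recognizer?.destroy()
        recognizer = null
    }

    // Call once playback has finished: build a fresh recognizer instead of
    // reusing the destroyed instance.
    fun resumeAfterPlayback(listener: RecognitionListener, intent: Intent) {
        val fresh = SpeechRecognizer.createSpeechRecognizer(context)
        fresh.setRecognitionListener(listener)
        fresh.startListening(intent)
        recognizer = fresh
    }
}
```

Destroying the instance rather than only calling `stopListening()` trades a slightly slower resume for a guarantee that no recognition session is alive during playback.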
## Security Impact (required)
- New permissions/capabilities? No
- Secrets/tokens handling changed? Yes — ElevenLabs API key is stored in EncryptedSharedPreferences (same mechanism as gateway token)
- New/changed network calls? No (ElevenLabs API calls already existed)
- Command/tool execution surface changed? No
- Data access scope changed? No
- If any Yes, explain risk + mitigation: API key is stored using Android's EncryptedSharedPreferences, which uses AES-256 encryption backed by the Android Keystore. Same security model as the existing gateway token storage.
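For reviewers unfamiliar with the storage mechanism, a minimal sketch of writing a secret through EncryptedSharedPreferences is shown below. It assumes the `androidx.security:security-crypto` artifact with the `MasterKey` API; the file name `secure_prefs`, the preference key `elevenlabs_api_key`, and both function names are hypothetical and may not match what the app uses.

```kotlin
import android.content.Context
import android.content.SharedPreferences
import androidx.security.crypto.EncryptedSharedPreferences
import androidx.security.crypto.MasterKey

// Opens (or creates) an encrypted preferences file whose master key
// lives in the Android Keystore.
fun openSecurePrefs(context: Context): SharedPreferences {
    val masterKey = MasterKey.Builder(context)
        .setKeyScheme(MasterKey.KeyScheme.AES256_GCM)
        .build()
    return EncryptedSharedPreferences.create(
        context,
        "secure_prefs", // hypothetical file name
        masterKey,
        EncryptedSharedPreferences.PrefKeyEncryptionScheme.AES256_SIV,
        EncryptedSharedPreferences.PrefValueEncryptionScheme.AES256_GCM,
    )
}

// Stores the user-supplied API key; "elevenlabs_api_key" is a hypothetical key name.
fun saveElevenLabsKey(prefs: SharedPreferences, apiKey: String) {
    prefs.edit().putString("elevenlabs_api_key", apiKey.trim()).apply()
}
```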
## Repro + Verification
### Environment
- OS: Android 16 (Pixel 10)
- Runtime/container: N/A (native Android app)
- Model/provider: ElevenLabs (eleven_multilingual_v2)
## Evidence
- Before: Talk Mode enters "Speech error 11" loop, audio is silent, mic picks up AI voice causing infinite conversation
- After: Talk Mode works reliably — recognizer resets on severe errors, audio plays correctly, mic is inactive during playback
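The "recognizer resets on severe errors" behavior can be illustrated with a small classification function. This is a sketch under assumptions: the set of severe codes (11, 4, 3) is taken from the review summary further down, the names `Recovery` and `recoveryFor` are hypothetical, and the comments give the Android SDK's constant names for each numeric value (in the SDK, 3 is `ERROR_AUDIO` and 5 is `ERROR_CLIENT`).

```kotlin
// What the listener should do after onError(errorCode).
enum class Recovery { RESTART_LISTENING, RECREATE_RECOGNIZER }

// Codes treated as severe, per the review summary of this PR.
// Values mirror android.speech.SpeechRecognizer constants.
val SEVERE_ERRORS: Set<Int> = setOf(
    11, // SpeechRecognizer.ERROR_SERVER_DISCONNECTED
    4,  // SpeechRecognizer.ERROR_SERVER
    3,  // SpeechRecognizer.ERROR_AUDIO
)

// Severe codes get a destroy-and-recreate cycle; everything else
// (for example 7, ERROR_NO_MATCH) only restarts listening.
fun recoveryFor(errorCode: Int, severe: Set<Int> = SEVERE_ERRORS): Recovery =
    if (errorCode in severe) Recovery.RECREATE_RECOGNIZER else Recovery.RESTART_LISTENING
```

Keeping the decision in a pure function makes it unit-testable without an Android device; the listener's `onError` callback would then act on the returned value.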
## Human Verification (required)
- Verified scenarios: Talk Mode conversation (multiple turns), Test Voice button, custom Voice ID, error recovery
- Edge cases checked: Repeated conversations, rapid start/stop, API key not set (graceful fallback)
- What you did not verify: Other Android versions (only Pixel 10 / Android 16 tested)
## Compatibility / Migration
- Backward compatible? Yes
- Config/env changes? No
- Migration needed? No
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly: Revert this commit
- Files/config to restore: None
- Known bad symptoms reviewers should watch for: Silent audio playback, SpeechRecognizer not restarting after errors
## Risks and Mitigations
- Risk: `USAGE_MEDIA` may behave differently on some OEM Android skins
  - Mitigation: MP3 format is universally supported; `USAGE_MEDIA` is the most common audio usage type
- Risk: 5-second clock skew buffer could theoretically surface a stale message
  - Mitigation: Messages are filtered by recency (reversed iteration), so the most recent assistant message is always returned first
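The clock skew buffer and the reversed iteration can be sketched together as below. This is an illustration only: the `ChatMessage` shape, the `"assistant"` role string, and the function name `latestAssistantReply` are assumptions, not the app's actual types.

```kotlin
// Hypothetical message shape; timestamps are epoch milliseconds.
data class ChatMessage(val role: String, val text: String, val timestampMs: Long)

const val CLOCK_SKEW_BUFFER_MS = 5_000L

// Returns the newest assistant message stamped no earlier than
// (sentAtMs - skewMs). Walking the history in reverse means the most
// recent match wins, which limits the chance of surfacing a stale reply.
fun latestAssistantReply(
    history: List<ChatMessage>,
    sentAtMs: Long,
    skewMs: Long = CLOCK_SKEW_BUFFER_MS,
): ChatMessage? {
    val cutoff = sentAtMs - skewMs
    return history.asReversed().firstOrNull {
        it.role == "assistant" && it.timestampMs >= cutoff
    }
}
```

For example, if the client sent its request at `12_000` ms by its own clock and the server, running a few seconds behind, stamped the reply at `9_000` ms, a strict comparison would drop the reply, while the 5-second buffer accepts it.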
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR fixes several critical Talk Mode issues on Android: SpeechRecognizer Error 11 infinite loops, silent audio playback, feedback loops from mic picking up AI speech, and clock skew causing missed responses. It also adds user-configurable ElevenLabs API key and Voice ID settings stored in EncryptedSharedPreferences.
- Switched default audio output to MP3 format with `USAGE_MEDIA` audio attributes to fix silent playback on affected devices (e.g., Pixel 10)
- Added a 5-second clock skew buffer when filtering chat history by timestamp, preventing valid assistant responses from being silently discarded
- Recognizer is now fully destroyed during AI playback to prevent mic-picks-up-speaker feedback loops
- SpeechRecognizer errors 11, 4 (SERVER), and 3 (CLIENT) now trigger a full recognizer recreation instead of a simple restart, breaking the Error 11 infinite loop
- New Settings fields for ElevenLabs API Key and Voice ID, with a "Test Voice" button for verification
- When an ElevenLabs API key is configured, TTS failures no longer fall back to system TTS (which is broken on affected devices), instead surfacing the error directly
- **Issue found**: The PCM audio fallback path still uses `USAGE_ASSISTANT` (line 686) while the MP3 path was updated to `USAGE_MEDIA` — this inconsistency could re-trigger the silent playback bug if PCM output is configured via the gateway
<h3>Confidence Score: 3/5</h3>
- Generally safe to merge after fixing the PCM USAGE_ASSISTANT inconsistency, which could cause the same silent playback bug in the fallback path.
- The core fixes for Error 11, feedback loops, and clock skew are well-implemented. However, the PCM audio path still uses USAGE_ASSISTANT (the exact attribute that caused the original silent playback bug), creating an incomplete fix. The ElevenLabs settings integration is clean and follows existing patterns. No security concerns — API key storage uses EncryptedSharedPreferences correctly.
- `apps/android/app/src/main/java/ai/openclaw/android/voice/TalkModeManager.kt` — PCM playback path at line 686 still uses `USAGE_ASSISTANT` while the MP3 path was updated to `USAGE_MEDIA`
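One possible way to address the inconsistency flagged above is to build the audio attributes in a single place and reuse them for both playback paths. The sketch below is not the code in `TalkModeManager.kt`: the function names are hypothetical, `CONTENT_TYPE_SPEECH` and the mono 16-bit PCM format are assumptions, and `AudioTrack.Builder` requires API level 23 or higher.

```kotlin
import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioTrack
import android.media.MediaPlayer

// Single source of truth for Talk Mode output attributes.
fun talkAudioAttributes(): AudioAttributes =
    AudioAttributes.Builder()
        .setUsage(AudioAttributes.USAGE_MEDIA)
        .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH) // assumption
        .build()

// PCM path: streaming AudioTrack using the shared attributes.
fun newPcmTrack(sampleRate: Int): AudioTrack {
    val minBuf = AudioTrack.getMinBufferSize(
        sampleRate,
        AudioFormat.CHANNEL_OUT_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
    )
    // getMinBufferSize returns a negative error code for unsupported parameters.
    require(minBuf > 0) { "Unsupported PCM parameters for sampleRate=$sampleRate" }
    val format = AudioFormat.Builder()
        .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
        .setSampleRate(sampleRate)
        .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
        .build()
    return AudioTrack.Builder()
        .setAudioAttributes(talkAudioAttributes())
        .setAudioFormat(format)
        .setBufferSizeInBytes(minBuf)
        .setTransferMode(AudioTrack.MODE_STREAM)
        .build()
}

// MP3 path: MediaPlayer using the same attributes.
fun newMp3Player(): MediaPlayer =
    MediaPlayer().apply { setAudioAttributes(talkAudioAttributes()) }
```

Routing both paths through one helper means a future change to the usage type cannot leave one path behind.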
<sub>Last reviewed commit: 712346a</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->