#16274: feat(voice): Fix persistent speech errors, silent playback, and feedb…

by ryotsukuda333 open 2026-02-14 15:23

app: android size: M

Cluster: Text-to-Speech Provider Enhancements

## Summary

- **Problem:** Talk Mode on Android suffers from multiple critical issues: Speech Error 11 infinite loops, silent playback on certain devices (e.g., Pixel 10), feedback loops where the mic picks up the AI's voice, and silent conversation responses due to client-server clock skew.
- **Why it matters:** These bugs make Talk Mode completely unusable on affected devices.
- **What changed:** Fixed SpeechRecognizer error handling, switched audio to USAGE_MEDIA/MP3, added clock skew buffer, destroyed recognizer during playback, and added user-configurable ElevenLabs settings.
- **What did NOT change (scope boundary):** No changes to text chat, gateway protocol, canvas, or any non-voice functionality.

## Change Type (select all)

- [x] Bug fix
- [x] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Related: Talk Mode voice output failures on Android

## User-visible / Behavior Changes

- Talk Mode now uses MP3 format and `USAGE_MEDIA` audio attributes (previously PCM/`USAGE_ASSISTANT`)
- New Settings fields: "ElevenLabs API Key" and "Voice ID" for custom voice configuration
- "Test Voice" button appears when an API key is configured
- SpeechRecognizer is fully destroyed during AI playback (mic is off while AI speaks)

## Security Impact (required)

- New permissions/capabilities? No
- Secrets/tokens handling changed? Yes: ElevenLabs API key is stored in EncryptedSharedPreferences (same mechanism as gateway token)
- New/changed network calls? No (ElevenLabs API calls already existed)
- Command/tool execution surface changed? No
- Data access scope changed? No
- If any Yes, explain risk + mitigation: API key is stored using Android's EncryptedSharedPreferences, which uses AES-256 encryption backed by the Android Keystore. Same security model as the existing gateway token storage.

## Repro + Verification

### Environment

- OS: Android 16 (Pixel 10)
- Runtime/container: N/A (native Android app)
- Model/provider: ElevenLabs (eleven_multilingual_v2)

## Evidence

- Before: Talk Mode enters "Speech error 11" loop, audio is silent, mic picks up AI voice causing infinite conversation
- After: Talk Mode works reliably: recognizer resets on severe errors, audio plays correctly, mic is inactive during playback

## Human Verification (required)

- Verified scenarios: Talk Mode conversation (multiple turns), Test Voice button, custom Voice ID, error recovery
- Edge cases checked: Repeated conversations, rapid start/stop, API key not set (graceful fallback)
- What you did not verify: Other Android versions (only Pixel 10 / Android 16 tested)

## Compatibility / Migration

- Backward compatible? Yes
- Config/env changes? No
- Migration needed? No

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly: Revert this commit
- Files/config to restore: None
- Known bad symptoms reviewers should watch for: Silent audio playback, SpeechRecognizer not restarting after errors

## Risks and Mitigations

- Risk: `USAGE_MEDIA` may behave differently on some OEM Android skins
  - Mitigation: MP3 format is universally supported; `USAGE_MEDIA` is the most common audio usage type
- Risk: 5-second clock skew buffer could theoretically surface a stale message
  - Mitigation: Messages are filtered by recency (reversed iteration), so the most recent assistant message is always returned first

<h3>Greptile Summary</h3>

This PR fixes several critical Talk Mode issues on Android: SpeechRecognizer Error 11 infinite loops, silent audio playback, feedback loops from mic picking up AI speech, and clock skew causing missed responses. It also adds user-configurable ElevenLabs API key and Voice ID settings stored in EncryptedSharedPreferences.

- Switched default audio output to MP3 format with `USAGE_MEDIA` audio attributes to fix silent playback on affected devices (e.g., Pixel 10)
- Added a 5-second clock skew buffer when filtering chat history by timestamp, preventing valid assistant responses from being silently discarded
- Recognizer is now fully destroyed during AI playback to prevent mic-picks-up-speaker feedback loops
- SpeechRecognizer errors 11, 4 (SERVER), and 3 (CLIENT) now trigger a full recognizer recreation instead of a simple restart, breaking the Error 11 infinite loop
- New Settings fields for ElevenLabs API Key and Voice ID, with a "Test Voice" button for verification
- When an ElevenLabs API key is configured, TTS failures no longer fall back to system TTS (which is broken on affected devices), instead surfacing the error directly
- **Issue found**: The PCM audio fallback path still uses `USAGE_ASSISTANT` (line 686) while the MP3 path was updated to `USAGE_MEDIA`; this inconsistency could re-trigger the silent playback bug if PCM output is configured via the gateway

<h3>Confidence Score: 3/5</h3>

- Generally safe to merge after fixing the PCM `USAGE_ASSISTANT` inconsistency, which could cause the same silent playback bug in the fallback path.
- The core fixes for Error 11, feedback loops, and clock skew are well-implemented. However, the PCM audio path still uses `USAGE_ASSISTANT` (the exact attribute that caused the original silent playback bug), creating an incomplete fix. The ElevenLabs settings integration is clean and follows existing patterns. No security concerns: API key storage uses EncryptedSharedPreferences correctly.
- `apps/android/app/src/main/java/ai/openclaw/android/voice/TalkModeManager.kt`: PCM playback path at line 686 still uses `USAGE_ASSISTANT` while the MP3 path was updated to `USAGE_MEDIA`

<sub>Last reviewed commit: 712346a</sub>
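
For reviewers weighing the clock skew risk noted above, the timestamp filter can be pictured roughly as follows. This is only a minimal sketch, not the code in `TalkModeManager.kt`: the `ChatMessage` shape, the `latestAssistantReply` function, and the `CLOCK_SKEW_BUFFER_MS` constant are hypothetical names chosen for illustration, and the history is assumed to be ordered oldest-first.

```kotlin
// Hypothetical message shape; the app's real chat history model is not shown in this PR.
data class ChatMessage(val role: String, val text: String, val timestampMs: Long)

// The 5-second allowance described in the PR, in milliseconds (name is an assumption).
const val CLOCK_SKEW_BUFFER_MS = 5_000L

/**
 * Returns the newest assistant message whose server timestamp is at most
 * CLOCK_SKEW_BUFFER_MS older than the client-side request time, or null if none qualifies.
 * Assumes `history` is ordered oldest-first.
 */
fun latestAssistantReply(history: List<ChatMessage>, requestSentAtMs: Long): ChatMessage? {
    val cutoffMs = requestSentAtMs - CLOCK_SKEW_BUFFER_MS
    // Iterate newest-first so the most recent qualifying reply is returned.
    return history.asReversed().firstOrNull {
        it.role == "assistant" && it.timestampMs >= cutoffMs
    }
}

fun main() {
    val requestSentAtMs = 100_000L // client clock when the user's turn was sent
    val history = listOf(
        ChatMessage("assistant", "stale reply", 90_000L),
        ChatMessage("user", "hello", 97_900L),
        // Server clock runs 2 s behind the client, so this valid reply looks "older" than the request.
        ChatMessage("assistant", "fresh reply", 98_000L)
    )
    println(latestAssistantReply(history, requestSentAtMs)?.text) // prints "fresh reply"
}
```

Without the buffer, the reply stamped 98,000 ms would fall before the 100,000 ms request time and be dropped, which matches the silent-response symptom described in the Summary. The trade-off is the one listed under Risks: an older assistant message stamped within the 5-second window can be returned, but only when no newer assistant message exists.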