#9703: feat(macos): Voice settings restructure + Whisper transcription support

by nsd97 open 2026-02-05 15:45

app: macos stale

Cluster: Voice Transcription Enhancements

## Summary

Restructures the Voice settings UI and adds Whisper transcription support for both Push-to-Talk and Voice Wake.

## Changes

### UI Restructure

- Renamed tab from "Voice Wake" → "Voice"
- Reorganized settings into 5 clear sections:
  - **Voice Wake** - Wake word detection (always uses Apple Speech)
  - **Push-to-Talk** - Hotkey-triggered transcription
  - **Transcription Model** - Combined picker for Apple Speech + Whisper models
  - **Audio** - Input device selection
  - **Sounds** - Audio feedback toggles

### Whisper Integration

- Fixed binary detection: `whisper-cpp` → `whisper-cli` (Homebrew renamed it)
- Added combined model picker showing Apple Speech and all Whisper model sizes
- Implemented rolling audio buffer for Voice Wake → Whisper handoff
- Push-to-Talk now supports Whisper transcription via sox/rec

### Voice Wake Whisper Handoff

- Apple Speech handles wake word detection (efficient for always-on)
- After wake phrase detected, audio buffer is sent to Whisper for command transcription
- Maintains ~10s rolling buffer so pre-wake audio isn't lost

## Testing

- [x] App builds and signs
- [x] Voice settings UI renders correctly
- [x] Whisper model detection works
- [x] Wake word matching logic verified with unit test

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR restructures the macOS Voice settings UI (renaming the tab to “Voice” and splitting settings into Voice Wake / Push-to-Talk / Transcription Model / Audio / Sounds). It also adds local Whisper support by introducing a `WhisperTranscriber` actor, new persisted state for transcription backend + model, and a `RollingAudioBuffer` used to hand off buffered post-wake audio to Whisper for command transcription.

<h3>Confidence Score: 3/5</h3>

- This PR is close, but has a few concrete runtime/UX issues around Whisper execution and user guidance.
- Main changes are straightforward UI + new Whisper plumbing, but multiple code paths hardcode Homebrew binary locations and the availability checks/messages are inconsistent, which can cause Whisper to be reported as available yet fail at runtime or mislead users on setup.
- apps/macos/Sources/OpenClaw/WhisperTranscriber.swift; apps/macos/Sources/OpenClaw/VoicePushToTalk.swift; apps/macos/Sources/OpenClaw/VoiceWakeSettings.swift; apps/macos/Sources/OpenClaw/MenuContentView.swift
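
The "~10s rolling buffer" mentioned in the PR description can be pictured as a fixed-capacity ring buffer that overwrites its oldest samples once full. The following is a minimal sketch of that idea only; it is not the PR's `RollingAudioBuffer`, and the type name `RollingSampleBuffer`, the mono `Float` sample layout, and the sample rates are assumptions made for illustration.

```swift
import Foundation

/// Hypothetical fixed-capacity ring buffer of mono Float samples.
/// Illustrative only; not the PR's `RollingAudioBuffer`.
struct RollingSampleBuffer {
    private var storage: [Float]
    private var writeIndex = 0
    private(set) var count = 0

    /// Capacity in samples, e.g. 10 s at an assumed 16 kHz = 160_000.
    init(seconds: Double, sampleRate: Double) {
        let capacity = max(1, Int(seconds * sampleRate))
        storage = [Float](repeating: 0, count: capacity)
    }

    var capacity: Int { storage.count }

    /// Appends samples, overwriting the oldest ones once the buffer is full.
    mutating func append(_ samples: [Float]) {
        for sample in samples {
            storage[writeIndex] = sample
            writeIndex = (writeIndex + 1) % storage.count
            count = min(count + 1, storage.count)
        }
    }

    /// Returns the retained samples in chronological order (oldest first).
    func snapshot() -> [Float] {
        guard count == storage.count else {
            // Not yet wrapped: valid samples occupy the front of storage.
            return Array(storage[0..<count])
        }
        // Wrapped: the oldest sample sits at writeIndex.
        return Array(storage[writeIndex...]) + Array(storage[..<writeIndex])
    }
}

// Usage with a tiny capacity (1 s at 4 Hz = 4 samples) to show the overwrite.
var buffer = RollingSampleBuffer(seconds: 1, sampleRate: 4)
buffer.append([1, 2, 3])
buffer.append([4, 5, 6])
let retained = buffer.snapshot()  // the four most recent samples
```

In a handoff like the one described, the wake-word path would keep appending microphone samples, and the snapshot would be taken once the wake phrase is detected. A real implementation would also need to synchronize access between the audio thread and the consumer, which this sketch leaves out.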
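
One way to address the review's concern about hardcoded Homebrew locations is to resolve the Whisper binary through a single helper that both the availability check and the transcription path call, so the two cannot disagree. The sketch below is an assumption-laden illustration, not code from this PR: `resolveExecutable` and `defaultSearchDirectories` are hypothetical names. It searches `PATH` first and then the usual Homebrew prefixes, which helps because apps launched from Finder typically inherit a minimal `PATH`.

```swift
import Foundation

/// Hypothetical helper: returns the first executable found, preferring
/// earlier entries in `names`, then earlier entries in `searchDirectories`.
func resolveExecutable(
    names: [String],
    searchDirectories: [String],
    isExecutable: (String) -> Bool = { FileManager.default.isExecutableFile(atPath: $0) }
) -> String? {
    for name in names {
        for directory in searchDirectories {
            let candidate = URL(fileURLWithPath: directory)
                .appendingPathComponent(name)
                .path
            if isExecutable(candidate) { return candidate }
        }
    }
    return nil
}

/// Directories to search: the process PATH first, then common Homebrew
/// prefixes (/opt/homebrew/bin on Apple silicon, /usr/local/bin on Intel).
func defaultSearchDirectories(
    environment: [String: String] = ProcessInfo.processInfo.environment
) -> [String] {
    let pathEntries = (environment["PATH"] ?? "")
        .split(separator: ":")
        .map(String.init)
    let fallbacks = ["/opt/homebrew/bin", "/usr/local/bin"]
    var seen = Set<String>()
    // Keep the first occurrence of each directory, preserving order.
    return (pathEntries + fallbacks).filter { seen.insert($0).inserted }
}

// Prefer the current binary name, then fall back to the pre-rename one.
let whisperPath = resolveExecutable(
    names: ["whisper-cli", "whisper-cpp"],
    searchDirectories: defaultSearchDirectories()
)
```

If `whisperPath` is `nil`, the settings UI could report Whisper as unavailable and the transcriber would never attempt to launch it. Whether the fallback to the old `whisper-cpp` name is desirable depends on how the PR wants to treat older Homebrew installs.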