#10351: feat: Add Mumble voice chat extension

by emadomedher · Open · 2026-02-06 10:56

# Add Mumble Voice Chat Extension

## Summary

This PR adds a new **Mumble voice chat extension** that enables full voice conversations with OpenClaw agents via [Mumble VoIP](https://www.mumble.info/).

## Features

### Core Functionality

- **Full voice conversation loop**: Users can speak naturally to OpenClaw agents on Mumble
- **Automatic speech processing**: 500ms of silence triggers transcription and a response
- **High-quality audio**: 128kbps Opus encoding in "audio" mode (optimized for TTS rather than "voip")
- **Low latency**: 10ms audio frames (480 samples at 48kHz)

### Proactive Speaking

- **HTTP endpoint**: `POST /mumble/speak` for programmatic voice messages
- **Voice parameter support**: Override the default voice per request
- **Cron job integration**: Schedule voice announcements and reminders

### Multi-Bot Support

- **Sender allowlist**: Restrict which Mumble users can trigger the bot
- **Multiple bots in one channel**: Configure a different allowlist for each bot instance

## Architecture

```
User speaks (PTT) → Opus packets
    ↓
Decode (opus-decoder WASM) → PCM
    ↓
Silence detection (500ms timeout)
    ↓
Convert to WAV → Whisper STT
    ↓
Text → OpenClaw Agent (/v1/chat/completions API)
    ↓
Response → Kokoro TTS (24kHz WAV)
    ↓
Resample to 48kHz → Chunk (10ms frames)
    ↓
Encode (@discordjs/opus, 128kbps) → Mumble
```

## Dependencies

### External Services (Required)

- **Mumble server** (tested with v1.5.857)
- **Whisper STT**: OpenAI-compatible API (e.g., `http://localhost:8200/v1`)
- **Kokoro TTS**: OpenAI-compatible API (e.g., `http://localhost:8102/v1`)
  - Supports 67 voices across 8 languages
  - Voice blending with `+` syntax (e.g., `af_nova+jf_alpha`)

### NPM Dependencies

- **[@tf2pickup-org/mumble-client](https://github.com/tf2pickup-org/mumble-client)**: Mumble protocol client
  - **Note**: Requires [PR #982](https://github.com/tf2pickup-org/mumble-client/pull/982) for full audio support
  - Until it is merged, users should install the fork instead, either from a local path (`file:../../../code/mumble-client-fork`) or from GitHub
- **[@discordjs/opus](https://github.com/discordjs/opus)**: Native Opus codec bindings
  - **Native dependency**: Requires Node.js v24+ and system build tools
- **node-fetch**: HTTP client (standard dependency)

### Installation Notes

- The extension includes a `package.json` listing all dependencies
- Users must run `npm install` in the extension directory
- Native module compilation requires system build tools (gcc, make, python3)

## Configuration Example

```json5
{
  "plugins": {
    "entries": {
      "mumble": {
        "enabled": true,
        "config": {
          "mumble": {
            "host": "192.168.1.128",
            "port": 64738,
            "username": "OpenClaw-Bot"
          },
          "audio": {
            "whisperUrl": "http://localhost:8200/v1",
            "kokoroUrl": "http://localhost:8102/v1",
            "kokoroVoice": "af_nova+jf_alpha"
          },
          "processing": {
            "silenceTimeoutMs": 500,
            "allowFrom": ["username1", "username2"] // Optional
          },
          "gateway": {
            "url": "http://localhost:18789",
            "token": "your-gateway-token"
          }
        }
      }
    }
  }
}
```

## Usage Examples

### Interactive Voice Chat

```bash
# User joins Mumble and speaks
# Bot transcribes → sends to agent → speaks response
```

### Proactive Speaking

```bash
# Default voice
curl -X POST http://localhost:18789/mumble/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Reminder: Meeting in 10 minutes"}'

# Custom voice
curl -X POST http://localhost:18789/mumble/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Weather update", "voice": "af_nicole"}'
```

### Scheduled Announcements

```bash
# Use OpenClaw cron to schedule voice messages
openclaw cron add --at "2026-02-06T14:00:00Z" --isolated \
  --task "Get weather and POST to /mumble/speak endpoint"
```

## Technical Details

### Audio Processing

- **Codec**: Opus (type 4 only)
- **Frame size**: 480 samples (10ms at 48kHz)
- **Bitrate**: 128kbps
- **Application mode**: "audio" (TTS/music quality, not "voip")
- **Silence detection**: Timeout-based (500ms after the last packet)
  - Does NOT rely on terminator packets, which are unreliable across clients

### Voice Pipeline

- **STT**: Whisper (OpenAI-compatible `/v1/audio/transcriptions`)
- **TTS**: Kokoro-82M (OpenAI-compatible `/v1/audio/speech`)
  - Returns 24kHz WAV
  - Resampled to 48kHz with linear interpolation
- **Text sanitization**: Strips markdown, emojis, and formatting so the TTS output sounds natural

### Plugin API Usage

- Uses `api.registerHttpHandler()` for the HTTP endpoint
- Uses `api.registerService()` for lifecycle management
- Written in TypeScript, compatible with OpenClaw's `moduleResolution: NodeNext`

## Testing

Tested with:

- Mumble server v1.5.857 (latest stable)
- Node.js v24.13.0
- OpenClaw v2026.2.4
- Multiple simultaneous users
- Different Kokoro voice blends
- Sender allowlist functionality
- Proactive speaking via HTTP

## Documentation

- Comprehensive `README.md` with:
  - Installation guide
  - Configuration reference
  - Usage examples
  - Troubleshooting section
  - Architecture overview
- Plugin manifest with JSON Schema validation
- UI hints for Control UI integration

## Breaking Changes

None. This is a new optional extension.

## Related Work

- Mumble client fork PR: https://github.com/tf2pickup-org/mumble-client/pull/982
  - Adds a `sendAudio()` method and full audio packet parsing
  - Maintains backward compatibility

## Future Enhancements (Not in this PR)

- Voice Activity Detection (VAD) for hands-free mode
- Multi-channel support (switch channels dynamically)
- Recording/playback of conversations
- Integration with OpenClaw's built-in TTS providers (Edge TTS, etc.)
- WebRTC for browser-based voice chat

## Questions for Reviewers

1. **Dependency strategy**: Should we bundle the Mumble client fork or wait for the upstream PR to be merged?
2. **Native modules**: Any concerns about the @discordjs/opus requirement?
3. **Documentation**: Is README.md sufficient, or should we add to the main docs?
4. **Naming**: Is "mumble" the right plugin ID, or should it be more descriptive?

---

**Ready for review!** This extension has been running in production for several hours with multiple users and voice configurations.
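The sender allowlist (`processing.allowFrom` in the configuration example) decides which Mumble users can trigger a given bot, which is also what lets several bots share one channel. Below is a minimal sketch, not the extension's code, assuming that an omitted or empty list admits everyone and that usernames are matched exactly and case-sensitively; the PR specifies neither behavior, and `ProcessingConfig`, `isSenderAllowed`, `weatherBot`, and `helperBot` are hypothetical names.

```typescript
// Hypothetical sketch of the sender allowlist check (not the extension's code).
interface ProcessingConfig {
  silenceTimeoutMs?: number;
  allowFrom?: string[]; // optional, as in the configuration example
}

// Assumption: a missing or empty allowFrom list means every user is allowed.
function isSenderAllowed(username: string, config: ProcessingConfig): boolean {
  const allow = config.allowFrom;
  if (!allow || allow.length === 0) return true;
  // Assumption: exact, case-sensitive username match.
  return allow.includes(username);
}

// Illustrative only: two bots in the same channel, each answering different users.
const weatherBot: ProcessingConfig = { allowFrom: ["username1"] };
const helperBot: ProcessingConfig = { allowFrom: ["username2"] };
```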
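The timeout-based silence detection described under Technical Details (flush 500ms after a speaker's last packet, without relying on terminator packets) can be sketched as a small per-speaker state machine. This is an illustration under stated assumptions rather than the extension's implementation: the `SilenceDetector` class, its `push`/`poll` methods, and passing timestamps in explicitly (instead of using `setTimeout`) are all hypothetical choices made to keep the logic deterministic.

```typescript
// Hypothetical sketch: per-speaker PCM accumulation with a silence timeout.
// Timestamps (ms) are passed in explicitly so the logic is easy to test.
type Utterance = { speaker: string; pcm: Int16Array };

class SilenceDetector {
  private readonly silenceTimeoutMs: number;
  private readonly buffers = new Map<string, Int16Array[]>();
  private readonly lastPacketAt = new Map<string, number>();

  constructor(silenceTimeoutMs = 500) {
    this.silenceTimeoutMs = silenceTimeoutMs;
  }

  // Record one decoded PCM frame from `speaker` received at `nowMs`.
  push(speaker: string, frame: Int16Array, nowMs: number): void {
    const frames = this.buffers.get(speaker) ?? [];
    frames.push(frame);
    this.buffers.set(speaker, frames);
    this.lastPacketAt.set(speaker, nowMs);
  }

  // Return every utterance whose speaker has been silent for at least the
  // timeout, and clear that speaker's accumulator so audio is never reused.
  poll(nowMs: number): Utterance[] {
    const ready: Utterance[] = [];
    for (const [speaker, last] of this.lastPacketAt) {
      if (nowMs - last < this.silenceTimeoutMs) continue;
      const frames = this.buffers.get(speaker) ?? [];
      const total = frames.reduce((n, f) => n + f.length, 0);
      const pcm = new Int16Array(total);
      let offset = 0;
      for (const f of frames) {
        pcm.set(f, offset);
        offset += f.length;
      }
      ready.push({ speaker, pcm });
      // Deleting the current entry during Map iteration is safe in JavaScript.
      this.buffers.delete(speaker);
      this.lastPacketAt.delete(speaker);
    }
    return ready;
  }
}
```

Clearing a speaker's buffer as soon as it is flushed means each utterance is transcribed once and buffers do not grow across turns. In a live client, `poll` could be driven by a periodic timer.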
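Whisper-compatible `/v1/audio/transcriptions` endpoints take an uploaded audio file, so the accumulated PCM is wrapped in a WAV container first (the "Convert to WAV" step in the Architecture diagram). The sketch below writes the canonical 44-byte RIFF/WAVE header for 16-bit little-endian PCM using only `DataView`; the function name `pcmToWav` and the 48kHz mono defaults are assumptions, not taken from the extension.

```typescript
// Hypothetical sketch: wrap 16-bit PCM samples in a canonical 44-byte WAV header.
function pcmToWav(pcm: Int16Array, sampleRate = 48_000, channels = 1): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = pcm.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size: total file size minus 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for plain PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Write samples explicitly as little-endian, independent of host byte order.
  for (let i = 0; i < pcm.length; i++) {
    view.setInt16(44 + i * bytesPerSample, pcm[i], true);
  }
  return new Uint8Array(buffer);
}
```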
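The "Text sanitization" step removes markdown and emojis so the TTS engine does not read symbols aloud. The PR does not list its exact rules, so the regexes below are an assumed, deliberately small set (code fences, inline code markers, images, links, headings, bullets, emphasis, and pictographic emoji), and `sanitizeForTts` is a hypothetical name.

```typescript
// Hypothetical sketch: strip common markdown and emoji before sending text to TTS.
// The rule set is an assumption; the extension's actual rules are not specified.
function sanitizeForTts(text: string): string {
  return text
    .replace(/```[\s\S]*?```/g, " ") // drop fenced code blocks entirely
    .replace(/`([^`]*)`/g, "$1") // keep inline code text, drop the backticks
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // images -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // links -> link text
    .replace(/^#{1,6}\s+/gm, "") // heading markers
    .replace(/^\s*[-*+]\s+/gm, "") // bullet markers
    .replace(/(\*\*|__)(.*?)\1/g, "$2") // bold
    .replace(/(\*|_)(.*?)\1/g, "$2") // italics
    .replace(/[\p{Extended_Pictographic}\u{FE0F}\u{200D}]/gu, "") // emoji, variation selectors, ZWJ
    .replace(/\s+/g, " ") // collapse whitespace and newlines
    .trim();
}
```

The rules are intentionally naive: the italics rule also removes paired underscores inside identifiers (`my_var_name` becomes `myvarname`), and fenced code is dropped rather than read aloud. A markdown parser would be more robust than regexes.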
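Kokoro returns 24kHz audio, while the Opus encoder runs at 48kHz. The linear-interpolation resampling mentioned under Voice Pipeline can be sketched as follows, assuming mono 16-bit PCM; `resampleLinear` is a hypothetical name and the general-rate signature is an illustrative choice.

```typescript
// Hypothetical sketch: linear-interpolation resampling of mono 16-bit PCM.
// For 24 kHz -> 48 kHz, each output sample lands on or halfway between input samples.
function resampleLinear(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  if (input.length === 0) return new Int16Array(0);
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Int16Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const idx = Math.floor(pos);
    const frac = pos - idx;
    const a = input[idx];
    // Clamp at the end so the last sample is held instead of read out of bounds.
    const b = input[Math.min(idx + 1, input.length - 1)];
    output[i] = Math.round(a + (b - a) * frac);
  }
  return output;
}
```

Linear interpolation is cheap and matches what the PR describes; a windowed-sinc or polyphase resampler would produce fewer high-frequency artifacts at a higher CPU cost.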
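Before encoding, the 48kHz stream is cut into 10ms frames of 480 samples each. A sketch, assuming the final partial frame is zero-padded with silence (the PR does not say how a short tail is handled, so the padding is an assumption); `chunkFrames` and the constant names are hypothetical.

```typescript
// Hypothetical sketch: split mono 48 kHz PCM into fixed 10 ms frames.
const SAMPLE_RATE = 48_000;
const FRAME_MS = 10;
const FRAME_SAMPLES = (SAMPLE_RATE * FRAME_MS) / 1000; // 480 samples

function chunkFrames(pcm: Int16Array, frameSize: number = FRAME_SAMPLES): Int16Array[] {
  const frames: Int16Array[] = [];
  for (let start = 0; start < pcm.length; start += frameSize) {
    // A new Int16Array is zero-filled, so a short final frame is padded with silence.
    const frame = new Int16Array(frameSize);
    frame.set(pcm.subarray(start, Math.min(start + frameSize, pcm.length)));
    frames.push(frame);
  }
  return frames;
}
```

At 48kHz a 10ms frame is exactly 480 samples, one of the standard Opus frame durations; 20ms frames would reduce per-packet overhead at the cost of added latency.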
<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

- Adds a new `extensions/mumble` plugin that connects to a Mumble server, listens for Opus audio, transcribes via Whisper-compatible STT, and speaks responses via Kokoro-compatible TTS.
- Implements an audio pipeline (Opus decode/encode, WAV wrapping, basic silence detection, 24k→48k resampling, 10ms framing) and a `VoiceChatClient` event loop.
- Exposes a new HTTP endpoint `POST /mumble/speak` to trigger proactive speech through the bot.
- Introduces extension-local npm/tsconfig/manifest files to build and configure the plugin independently.

<h3>Confidence Score: 2/5</h3>

- This PR has a few merge-blocking correctness/security issues that should be addressed first.
- The core feature is coherent, but it currently depends on a non-reproducible local-path dependency, disables TLS verification for Mumble by default, exposes an unauthenticated speak endpoint, and has a reachable accumulator-reset bug that can cause incorrect STT behavior and memory growth.
- `extensions/mumble/package.json`, `extensions/mumble/src/index.ts`, `extensions/mumble/src/voice-chat-client.ts`
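The review flags `POST /mumble/speak` as unauthenticated. One common mitigation is to require an `Authorization: Bearer <token>` header before synthesizing anything. The sketch below is framework-agnostic because the handler signature of `api.registerHttpHandler()` is not shown here; `isAuthorized`, `equalsNoEarlyExit`, and the idea of a dedicated expected token are all hypothetical.

```typescript
// Hypothetical sketch: check an "Authorization: Bearer <token>" header
// before the speak endpoint synthesizes any audio.
function isAuthorized(authHeader: string | undefined, expectedToken: string): boolean {
  // Refuse everything if no token is configured, rather than failing open.
  if (!authHeader || expectedToken.length === 0) return false;
  const match = /^Bearer\s+(.+)$/i.exec(authHeader.trim());
  if (!match) return false;
  return equalsNoEarlyExit(match[1], expectedToken);
}

// Compare without returning at the first mismatching character.
// JavaScript engines do not guarantee constant time; in Node,
// crypto.timingSafeEqual on equal-length buffers is the usual choice.
function equalsNoEarlyExit(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` maps NaN to 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}
```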