#13389: feat(telegram): support native voice notes with automatic OGG/Opus transcoding

by leavingme open 2026-02-10 13:01

Labels: channel: discord, channel: telegram, stale, size: M

🤖 AI-Assisted: Built with OpenClaw + Gemini 3.0 Flash

- Testing: Locally tested with Telegram voice messages (ffmpeg transcoding verified)
- Human Review: Code reviewed and verified by @leavingme

---

## English

This PR adds support for Telegram native voice notes (waveforms).

### Key Changes

1. **New Configuration**: Added a `nativeVoiceNotes` option for Telegram accounts
2. **Automatic Transcoding**: TTS output is transcoded to OGG/Opus via `ffmpeg` when the option is enabled
3. **Status Check**: Warns if `nativeVoiceNotes` is enabled but `ffmpeg` is missing
4. **Cross-Provider**: Works with the Edge, OpenAI, and ElevenLabs TTS providers

### Why

Telegram treats audio files and voice notes differently. Standard TTS output (MP3/WebM) shows up as a generic audio card. This PR enables the native voice-note experience, with waveforms, speed control, and transcription support.

---

## Chinese Description (translated)

Adds native voice message support (with waveforms) for Telegram.

### Main Changes

1. **New Configuration**: Telegram accounts gain a `nativeVoiceNotes` option
2. **Automatic Transcoding**: When enabled, audio is automatically transcoded to OGG/Opus via `ffmpeg`
3. **Status Check**: If `nativeVoiceNotes` is enabled but `ffmpeg` is missing from the system, the user is prompted to install it
4. **All-Provider Support**: Compatible with all TTS providers, including Edge, OpenAI, and ElevenLabs

### Background

Telegram distinguishes between audio files and voice messages. Ordinary TTS output (MP3/WebM) is displayed as a generic audio card. This PR delivers the native voice message experience, with waveform display, variable-speed playback, and transcription.

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds native voice note support for Telegram by automatically transcoding TTS output to OGG/Opus format via `ffmpeg` when the new `nativeVoiceNotes` config option is enabled.
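The interaction between the `nativeVoiceNotes` option and the `ffmpeg` availability check can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `TelegramAccountConfig` shape, the `DeliveryMode` type, and both function names are hypothetical, and the warning text is invented for the example.

```typescript
// Hypothetical config shape for illustration; the real schema in the PR may differ.
interface TelegramAccountConfig {
  nativeVoiceNotes?: boolean;
}

// "voice" = native voice note (waveform), "audio" = generic audio card.
type DeliveryMode = "voice" | "audio";

// Decide how TTS output is delivered, given the config flag and whether
// ffmpeg was found. Voice notes are used only when both conditions hold.
function resolveDeliveryMode(
  config: TelegramAccountConfig,
  ffmpegAvailable: boolean,
): DeliveryMode {
  return config.nativeVoiceNotes === true && ffmpegAvailable ? "voice" : "audio";
}

// Produce a status warning only when the option is enabled but ffmpeg is missing.
function voiceNoteStatusWarning(
  config: TelegramAccountConfig,
  ffmpegAvailable: boolean,
): string | undefined {
  if (config.nativeVoiceNotes === true && !ffmpegAvailable) {
    return "nativeVoiceNotes is enabled but ffmpeg was not found; sending standard audio instead.";
  }
  return undefined;
}
```

Keeping the decision and the warning as separate pure functions makes each case (disabled, enabled with `ffmpeg`, enabled without `ffmpeg`) easy to cover in unit tests without touching the filesystem.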
**Key implementation details:**

- New config option: `channels.telegram.nativeVoiceNotes` (boolean)
- New module `src/tts/compat.ts` calls `transcodeToOggOpus()` to convert audio files
- Transcoding uses `execFileSync` with proper argv array (safe from shell injection)
- Graceful fallback: if `ffmpeg` missing, sends standard audio without transcoding
- Status check: warns users if `nativeVoiceNotes` enabled but `ffmpeg` not installed
- Works across all TTS providers (Edge, OpenAI, ElevenLabs)

**Additional changes:**

- Extracted `summarizeText` from `tts-core.ts` into separate `src/tts/summarize.ts` module (good separation of concerns)
- Minor test fix in `threading.test.ts`: added unused variable `_callCount` to track call count in test mock
- Exported `hasFFmpeg()` in plugin SDK for extension use

The implementation follows the repository's patterns for graceful degradation and status reporting.

<h3>Confidence Score: 4/5</h3>

- Safe to merge with one minor consideration about variable naming in test
- The PR implements a clean feature addition with proper error handling and graceful degradation. The shell injection issue mentioned in previous threads has been properly addressed by using `execFileSync` with argv array. Code follows repository conventions, includes status warnings for missing dependencies, and works across all TTS providers. Minor point deducted for unused test variable with underscore prefix (could use vitest `expect.assertions` instead).
- src/discord/monitor/threading.test.ts - minor test code style issue with unused variable

<sub>Last reviewed commit: 6d643eb</sub>
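For readers unfamiliar with the argv-array pattern the review refers to, here is a minimal sketch of transcoding with `execFileSync` and falling back to the original file on failure. It is not the PR's `transcodeToOggOpus()` implementation: the helper names `buildOggOpusArgs` and `transcodeOrFallback` are hypothetical, the exact `ffmpeg` flags used in the PR are not shown in this page, and the sketch assumes an `ffmpeg` build that includes the `libopus` encoder.

```typescript
import { execFileSync } from "node:child_process";

// Build the ffmpeg argument vector. Arguments are passed as an array, so
// file names are handed to ffmpeg directly and never parsed by a shell.
function buildOggOpusArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-y",              // overwrite the output file if it already exists
    "-i", inputPath,   // source audio, e.g. MP3 or WebM from a TTS provider
    "-vn",             // drop any video stream
    "-c:a", "libopus", // encode audio with Opus (assumes libopus is available)
    "-f", "ogg",       // write an OGG container
    outputPath,
  ];
}

// Returns the transcoded path on success. If ffmpeg is missing or exits with
// an error, returns the original path so the caller can send standard audio.
function transcodeOrFallback(
  inputPath: string,
  outputPath: string,
  ffmpegBin: string = "ffmpeg", // hypothetical parameter, handy for testing
): string {
  try {
    execFileSync(ffmpegBin, buildOggOpusArgs(inputPath, outputPath), { stdio: "ignore" });
    return outputPath;
  } catch {
    // execFileSync throws on a missing binary and on a non-zero exit code.
    return inputPath;
  }
}
```

A caller might use `transcodeOrFallback("/tmp/tts.mp3", "/tmp/tts.ogg")` and compare the returned path with the input to decide between sending a voice note and sending a regular audio file. Because no shell is involved, a file name containing characters such as `;` or `$(...)` is treated as a literal path.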