
#22402: Add runtime.stt.transcribeAudioFile for plugin STT access

by benthecarman open 2026-02-21 03:53
channel: bluebubbles size: S
## Summary

- Add `runtime.stt.transcribeAudioFile()` to `PluginRuntime` so external plugins can use openclaw's media-understanding provider framework for speech-to-text
- New `src/media-understanding/transcribe-audio.ts` wraps `runCapability({capability: "audio"})`, the same pattern as the Discord VC implementation in #18774
- Reads provider/model/apiKey from `tools.media.audio` in the config, with automatic provider fallback

## Motivation

The marmot plugin needs to transcribe call audio chunks but can't import internal media-understanding modules (`ERR_PACKAGE_PATH_NOT_EXPORTED`). This mirrors how `runtime.tts.textToSpeechTelephony` already exposes TTS to plugins.

## Usage (from a plugin)

```typescript
const result = await runtime.stt.transcribeAudioFile({
  filePath: "/tmp/audio-chunk.wav",
  cfg: runtime.config.loadConfig(),
});
if (result.text) {
  // dispatch transcript to agent
}
```

## Test plan

- [ ] TypeScript compiles
- [ ] Existing media-understanding tests still pass
- [ ] Marmot plugin can call `runtime.stt.transcribeAudioFile()` after openclaw is rebuilt

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

Adds `runtime.stt.transcribeAudioFile()` to expose speech-to-text functionality to external plugins. The implementation follows the same pattern as the Discord voice manager's `transcribeAudio()` function, wrapping `runCapability({capability: "audio"})` from the media-understanding framework.

Key changes:

- New `src/media-understanding/transcribe-audio.ts` provides a standalone wrapper function
- Function exported via `PluginRuntime.stt.transcribeAudioFile`
- Uses the same provider/model/apiKey resolution from the `tools.media.audio` config
- Handles cleanup via `cache.cleanup()` in a `finally` block

The implementation is clean and matches established patterns in the codebase.
<h3>Confidence Score: 5/5</h3>

- Safe to merge: a straightforward implementation following existing patterns
- The implementation directly mirrors the proven Discord voice manager pattern, properly handles resource cleanup, and uses the existing media-understanding provider framework without introducing new dependencies or risks. The only minor suggestion is around MIME type flexibility.
- No files require special attention

<sub>Last reviewed commit: 70009ce</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
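
## Illustrative sketch (not the PR's code)

The description mentions two behaviors that are easier to see in code: falling back across configured providers, and always running cleanup in a `finally` block. The sketch below shows one way such a wrapper could look. It is not the contents of `src/media-understanding/transcribe-audio.ts`. Everything named here is a hypothetical stand-in: `transcribeAudioFileSketch`, `SketchConfig`, the `providers` array under `tools.media.audio`, and the injected `transcribe` and `cache` parameters, which stand in for `runCapability({capability: "audio"})` and the real attachment cache, whose signatures are not shown in this PR.

```typescript
// Hypothetical shapes for illustration only; none are taken from the openclaw source.
type AudioProviderEntry = { provider: string; model?: string; apiKey?: string };

type SketchConfig = {
  tools?: { media?: { audio?: { providers?: AudioProviderEntry[] } } };
};

// Stand-in for the real capability runner (assumed, not the actual signature).
type TranscribeFn = (filePath: string, entry: AudioProviderEntry) => Promise<string>;

// Stand-in for whatever temporary-file cache the real wrapper cleans up.
interface TempCache {
  cleanup(): Promise<void>;
}

async function transcribeAudioFileSketch(params: {
  filePath: string;
  cfg: SketchConfig;
  transcribe: TranscribeFn;
  cache: TempCache;
}): Promise<{ text: string | undefined }> {
  const entries = params.cfg.tools?.media?.audio?.providers ?? [];
  try {
    // Try each configured provider in order; the first non-empty transcript wins.
    for (const entry of entries) {
      try {
        const text = (await params.transcribe(params.filePath, entry)).trim();
        if (text) {
          return { text };
        }
      } catch {
        // This provider failed; fall through to the next one.
      }
    }
    // No provider configured, or none produced a transcript.
    return { text: undefined };
  } finally {
    // Runs on success, on empty results, and on unexpected errors alike.
    await params.cache.cleanup();
  }
}
```

Returning `{ text: undefined }` instead of throwing keeps the caller's `if (result.text)` check from the usage example sufficient on its own. Whether the real wrapper swallows provider errors this way is an assumption.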

Most Similar PRs