#22402: Add runtime.stt.transcribeAudioFile for plugin STT access
channel: bluebubbles
size: S
Cluster: Voice Transcription Enhancements
## Summary
- Add `runtime.stt.transcribeAudioFile()` to `PluginRuntime` so external plugins can use openclaw's media-understanding provider framework for speech-to-text
- New `src/media-understanding/transcribe-audio.ts` wraps `runCapability({capability: "audio"})`, following the same pattern as the Discord VC implementation in #18774
- Reads provider/model/apiKey from `tools.media.audio` in the config, with automatic provider fallback
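
The "automatic provider fallback" mentioned above could look roughly like the sketch below. It is only an illustration: `AudioProviderEntry`, `TranscribeFn`, and `transcribeWithFallback` are hypothetical names, not openclaw APIs, and the real `runCapability` may choose and order providers differently.

```typescript
// Hypothetical shapes for illustration; these are not the real openclaw types.
type AudioProviderEntry = { provider: string; model?: string; apiKey?: string };
type TranscribeFn = (entry: AudioProviderEntry, filePath: string) => Promise<string>;
type FallbackResult = { text?: string; provider?: string; errors: string[] };

// Try each configured provider in order; return the first non-empty transcript.
async function transcribeWithFallback(
  entries: AudioProviderEntry[],
  filePath: string,
  transcribe: TranscribeFn,
): Promise<FallbackResult> {
  const errors: string[] = [];
  for (const entry of entries) {
    try {
      const text = (await transcribe(entry, filePath)).trim();
      if (text) {
        return { text, provider: entry.provider, errors };
      }
      errors.push(`${entry.provider}: empty transcript`);
    } catch (err) {
      // Record the failure and fall through to the next provider.
      errors.push(`${entry.provider}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // No provider produced text; the caller sees only the collected errors.
  return { errors };
}
```

Returning the collected errors alongside the result lets a caller log why earlier providers were skipped without treating a successful fallback as a failure.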
## Motivation
The marmot plugin needs to transcribe call audio chunks, but it cannot import the internal media-understanding modules: the import fails with `ERR_PACKAGE_PATH_NOT_EXPORTED`. Exposing STT on the plugin runtime mirrors how `runtime.tts.textToSpeechTelephony` already exposes TTS to plugins.
## Usage (from a plugin)
```typescript
const result = await runtime.stt.transcribeAudioFile({
  filePath: "/tmp/audio-chunk.wav",
  cfg: runtime.config.loadConfig(),
});
if (result.text) {
  // dispatch transcript to agent
}
```
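
A plugin that transcribes many chunks may want to wrap the call so that an empty transcript or a provider error does not interrupt the call flow. The following is a minimal sketch under assumptions: `SttCapableRuntime` is a structural type inferred only from the usage above (the real `PluginRuntime` is larger and its config type is not shown here), and `transcribeChunk` is a hypothetical helper.

```typescript
// Minimal structural types inferred from the usage example; not the real PluginRuntime.
type SttResult = { text?: string };
interface SttCapableRuntime {
  config: { loadConfig(): unknown };
  stt: {
    transcribeAudioFile(params: { filePath: string; cfg: unknown }): Promise<SttResult>;
  };
}

// Hypothetical helper: returns the trimmed transcript, or null when there is nothing to dispatch.
async function transcribeChunk(
  runtime: SttCapableRuntime,
  filePath: string,
): Promise<string | null> {
  try {
    const result = await runtime.stt.transcribeAudioFile({
      filePath,
      cfg: runtime.config.loadConfig(),
    });
    const text = result.text?.trim();
    return text ? text : null;
  } catch (err) {
    // Assumption: a failed chunk is skipped rather than aborting the whole call.
    console.warn(`STT failed for ${filePath}:`, err);
    return null;
  }
}
```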
## Test plan
- [ ] TypeScript compiles
- [ ] Existing media-understanding tests still pass
- [ ] Marmot plugin can call `runtime.stt.transcribeAudioFile()` after openclaw is rebuilt
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Adds `runtime.stt.transcribeAudioFile()` to expose speech-to-text functionality to external plugins. The implementation follows the same pattern as the Discord voice manager's `transcribeAudio()` function, wrapping `runCapability({capability: "audio"})` from the media-understanding framework.
Key changes:
- New `src/media-understanding/transcribe-audio.ts` provides a standalone wrapper function
- Function exported via `PluginRuntime.stt.transcribeAudioFile`
- Uses the same provider/model/apiKey resolution from the `tools.media.audio` config
- Handles cleanup by calling `cache.cleanup()` in a `finally` block
The implementation is clean and matches established patterns in the codebase.
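
The cleanup behaviour noted in the key changes can be illustrated with a generic sketch. Only the `cache.cleanup()` call and the `finally` placement come from the review text; `CleanupCache` and `withCleanup` are hypothetical names, and whether the real cleanup is synchronous or asynchronous is an assumption here.

```typescript
// Hypothetical stand-in for the media cache; only cleanup() is taken from the review text.
interface CleanupCache {
  cleanup(): Promise<void> | void;
}

// Run some work and always release the cache afterwards, whether the work succeeds or throws.
async function withCleanup<T>(cache: CleanupCache, work: () => Promise<T>): Promise<T> {
  try {
    return await work();
  } finally {
    // Awaiting also works when cleanup() is synchronous.
    await cache.cleanup();
  }
}
```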
<h3>Confidence Score: 5/5</h3>
- Safe to merge - straightforward implementation following existing patterns
- The implementation directly mirrors the proven Discord voice manager pattern, properly handles resource cleanup, and uses the existing media-understanding provider framework without introducing new dependencies or risks. The only minor suggestion is around MIME type flexibility.
- No files require special attention
<sub>Last reviewed commit: 70009ce</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #14208: feat(media): add AssemblyAI audio transcription provider (by jmoraispk · 2026-02-11 · 70.2%)
- #22735: feat(plugin): add feishu-media extension (by cintia09 · 2026-02-21 · 70.1%)
- #19073: feat(voice-call): streaming TTS, barge-in, silence filler, hangup, ... (by odrobnik · 2026-02-17 · 69.6%)
- #10351: feat: Add Mumble voice chat extension (by emadomedher · 2026-02-06 · 69.6%)
- #18911: feat(plugins): Add registerStreamFnWrapper and updatePluginConfig APIs (by John-Rood · 2026-02-17 · 69.1%)
- #19427: feat: add Soniox speech-to-text provider (by matjaz · 2026-02-17 · 69.0%)
- #12597: voice-call: add Asterisk ARI provider + core STT (by w0s1nsk1 · 2026-02-09 · 68.6%)
- #16044: plugin-sdk: expose onAgentEvent + onSessionTranscriptUpdate via Plu... (by scifantastic · 2026-02-14 · 68.4%)
- #23572: feat(voice): enable voice note conversation loop for Telegram and W... (by davidrudduck · 2026-02-22 · 68.0%)
- #20155: feat(telegram): add tg-network-guard transcript status + reply flow (by artemgetmann · 2026-02-18 · 67.8%)