#12717: fix: add "audio" to openai provider capabilities
stale
Cluster: Media Handling Improvements
## Summary
The `openai` media-understanding provider implements `transcribeAudio` via `transcribeOpenAiCompatibleAudio` (Whisper API), but its `capabilities` array declares only `["image"]`.
As a result, the media-understanding runner, which checks a provider's declared capabilities when choosing one, skipped the `openai` provider for inbound audio messages (e.g., voice messages on Discord/WhatsApp). Agents then received the raw audio file instead of transcribed text.
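To illustrate why an undeclared capability causes the skip, here is a minimal TypeScript sketch of capability-gated provider selection. The types and `pickProvider` are hypothetical stand-ins, not the project's actual runner API; the sketch only assumes that the runner filters providers by their `capabilities` array before calling a method such as `transcribeAudio`.

```typescript
// Hypothetical types for illustration; the project's real provider interface may differ.
type MediaCapability = "image" | "audio" | "video";

interface MediaProvider {
  id: string;
  capabilities: MediaCapability[];
  // Optional here so that providers without transcription support can omit it.
  transcribeAudio?: (file: string) => Promise<string>;
}

// Capability-gated selection: a provider is eligible only if it *declares* the
// media kind, regardless of which methods it actually implements.
function pickProvider(
  providers: MediaProvider[],
  kind: MediaCapability,
): MediaProvider | undefined {
  return providers.find((p) => p.capabilities.includes(kind));
}

// Stub transcriber so the example runs without network access.
const transcribe = async (file: string): Promise<string> => `transcript of ${file}`;

const before: MediaProvider = { id: "openai", capabilities: ["image"], transcribeAudio: transcribe };
const after: MediaProvider = { id: "openai", capabilities: ["image", "audio"], transcribeAudio: transcribe };

console.log(pickProvider([before], "audio")?.id); // undefined: skipped despite having transcribeAudio
console.log(pickProvider([after], "audio")?.id); // openai
```

Under this assumption, the method existing is not enough; the declaration is what makes the provider reachable for audio.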
## Fix
Add `"audio"` to the `openai` provider's `capabilities` array so that the runner selects it for audio transcription when `tools.media.audio` is configured.
## Test
Before fix:
```
[tools] image failed: Unsupported media type: audio
```
The agent received the raw `.ogg` file path instead of transcribed text.
After fix:
Audio messages are transcribed via the Whisper API before reaching the agent.
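A regression check along these lines could catch this class of mismatch for every provider, not just `openai`. It is a sketch under assumptions: the `MediaProvider` shape is hypothetical, and `describeImage` is an invented method name standing in for whatever the image path is actually called; only `transcribeAudio` and the `"audio"` capability come from this PR.

```typescript
// Hypothetical provider shape; not the project's real interface.
type MediaCapability = "image" | "audio" | "video";

interface MediaProvider {
  id: string;
  capabilities: MediaCapability[];
  describeImage?: (file: string) => Promise<string>; // hypothetical method name
  transcribeAudio?: (file: string) => Promise<string>;
}

// Each optional method paired with the capability it implies.
const methodToCapability: Array<[keyof MediaProvider, MediaCapability]> = [
  ["describeImage", "image"],
  ["transcribeAudio", "audio"],
];

// Returns one message per provider method that is implemented but whose
// capability is missing from the provider's declared `capabilities`.
function findCapabilityMismatches(providers: MediaProvider[]): string[] {
  const problems: string[] = [];
  for (const p of providers) {
    for (const [method, cap] of methodToCapability) {
      if (typeof p[method] === "function" && !p.capabilities.includes(cap)) {
        problems.push(`${p.id} implements ${String(method)} but does not declare "${cap}"`);
      }
    }
  }
  return problems;
}

const stub = async (file: string): Promise<string> => file;
const providers: MediaProvider[] = [
  { id: "openai", capabilities: ["image"], describeImage: stub, transcribeAudio: stub },
];

console.log(findCapabilityMismatches(providers).join("\n"));
// openai implements transcribeAudio but does not declare "audio"
```

In a test suite, the same function could be asserted to return an empty array for the real provider registry.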
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR updates the OpenAI media-understanding provider (`src/media-understanding/providers/openai/index.ts`) to declare support for audio by adding `"audio"` to its `capabilities` list. This aligns the provider’s declared capabilities with its existing `transcribeAudio` implementation, allowing the media-understanding runner to select the OpenAI provider for inbound audio messages when `tools.media.audio` is enabled, so audio is transcribed before being passed to agents.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk.
- Single-line change that makes declared capabilities match already-exported functionality (`transcribeAudio`). No behavioral change beyond provider selection logic for audio, and it should unblock the intended transcription flow.
- No files require special attention
<!-- /greptile_comment -->
Most Similar PRs
- #8388: fix(media): auto-skip tiny/empty audio files before transcription (... by Glucksberg · 2026-02-04 · 79.5%
- #11334: feat: add Mistral/Voxtral audio transcription provider by JamesEBall · 2026-02-07 · 76.8%
- #8048: Media: add regression test for audio text blocks (#7970) by Abhishek-B-R · 2026-02-03 · 75.4%
- #14208: feat(media): add AssemblyAI audio transcription provider by jmoraispk · 2026-02-11 · 74.7%
- #15197: fix: allow OpenAI auth profiles for OpenAI-compatible providers by bufordtjustice2918 · 2026-02-13 · 74.6%
- #14794: fix: parse inline MEDIA: tokens in agent replies by explainanalyze · 2026-02-12 · 74.3%
- #19427: feat: add Soniox speech-to-text provider by matjaz · 2026-02-17 · 73.6%
- #8848: feat(stt): Add Whisper as first-class audio transcription provider by emadomedher · 2026-02-04 · 73.2%
- #9177: feat(media): add parakeet-mlx CLI output support by mac-110 · 2026-02-04 · 73.1%
- #5499: fix(voice-call): wait for session creation before sending config up... by lailoo · 2026-01-31 · 73.1%