
#12717: fix: add "audio" to openai provider capabilities

by openjay open 2026-02-09 14:51
stale
## Summary

The openai media-understanding provider implements `transcribeAudio` via `transcribeOpenAiCompatibleAudio` (Whisper API), but its `capabilities` array only declared `["image"]`. This caused the media-understanding runner to skip the openai provider when processing inbound audio messages (e.g., voice messages on Discord/WhatsApp), resulting in raw audio files being passed directly to agents instead of transcribed text.

## Fix

Add `"audio"` to the openai provider's capabilities array so the runner correctly selects the openai provider for audio transcription when configured with `tools.media.audio`.

## Test

Before fix:

```
[tools] image failed: Unsupported media type: audio
```

Agent received raw `.ogg` file path instead of transcribed text.

After fix: Audio messages are transcribed via Whisper API before reaching the agent.

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR updates the OpenAI media-understanding provider (`src/media-understanding/providers/openai/index.ts`) to declare support for audio by adding `"audio"` to its `capabilities` list. This aligns the provider’s declared capabilities with its existing `transcribeAudio` implementation, allowing the media-understanding runner to select the OpenAI provider for inbound audio messages when `tools.media.audio` is enabled, so audio is transcribed before being passed to agents.

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk.
- Single-line change that makes declared capabilities match already-exported functionality (`transcribeAudio`). No behavioral change beyond provider selection logic for audio, and it should unblock intended transcription flow.
- No files require special attention
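The capability mismatch described in the summary can be illustrated with a minimal sketch. This is not the project's actual code: the `MediaCapability` and `MediaProvider` types, the `selectProvider` helper, and the stubbed `transcribeAudio` body are hypothetical stand-ins, assumed only to show how a runner that filters providers by declared `capabilities` would skip a provider that implements `transcribeAudio` but declares only `["image"]`.

```typescript
// Hypothetical types for illustration; the real definitions in the repository may differ.
type MediaCapability = "image" | "audio" | "video";

interface MediaProvider {
  id: string;
  capabilities: MediaCapability[];
  transcribeAudio?: (filePath: string) => Promise<string>;
}

// Assumed selection rule: the runner picks the first provider that
// declares the requested capability, regardless of which methods it implements.
function selectProvider(
  providers: MediaProvider[],
  kind: MediaCapability,
): MediaProvider | undefined {
  return providers.find((p) => p.capabilities.includes(kind));
}

// Before the fix: transcribeAudio exists, but "audio" is not declared.
const openaiBefore: MediaProvider = {
  id: "openai",
  capabilities: ["image"],
  // Stub standing in for the Whisper-backed implementation.
  transcribeAudio: async (filePath) => `transcript of ${filePath}`,
};

// After the fix: the declared capabilities match the implementation.
const openaiAfter: MediaProvider = {
  ...openaiBefore,
  capabilities: ["image", "audio"],
};

// The provider is skipped for audio before the fix and selected after it.
const skipped = selectProvider([openaiBefore], "audio");
const selected = selectProvider([openaiAfter], "audio");
console.log(skipped === undefined, selected?.id);
```

Under these assumptions, the one-line change to `capabilities` is enough to change which provider is selected, without touching the transcription code itself.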

Most Similar PRs