#20738: Fix model input schema to accept audio and video modalities
size: S
trusted-contributor
## Problem
`ModelDefinitionSchema.input` was limited to `["text", "image"]` in both the Zod schema and the TypeScript type, so any config that declared `"audio"` or `"video"` as a model input failed validation:
```
Invalid input at models.providers.google.models.0.input.2
— expected "text" or "image"
```
This blocks users from declaring native multimodal capabilities for providers like Gemini that support audio/video input.
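The error path reads as a walk through the config tree: provider `google`, model index `0`, input index `2`. The rejected value is therefore the third entry of that model's `input` array. The following is a hypothetical, dependency-free TypeScript illustration of the pre-fix check, not the project's Zod code. The config shape (`models.providers.<id>.models[i].input`) is inferred from the error path, and `findInvalidInputs`, `OLD_ALLOWED_INPUTS`, and the `id` field are made up for illustration.

```typescript
// Hypothetical sketch of the pre-fix check (the real code uses Zod).
// The config shape is inferred from the error path; `id` is illustrative.
type ModelEntry = { id: string; input?: string[] };
type Config = {
  models: { providers: Record<string, { models: ModelEntry[] }> };
};

// Pre-fix allowlist, mirroring the old `["text", "image"]` union.
const OLD_ALLOWED_INPUTS: readonly string[] = ["text", "image"];

// Returns one message per rejected entry, with a dotted path that uses
// numeric array indices like the validation error shown above.
function findInvalidInputs(config: Config, allowed: readonly string[]): string[] {
  const issues: string[] = [];
  for (const [providerId, provider] of Object.entries(config.models.providers)) {
    provider.models.forEach((model, modelIndex) => {
      (model.input ?? []).forEach((modality, inputIndex) => {
        if (!allowed.includes(modality)) {
          const path = `models.providers.${providerId}.models.${modelIndex}.input.${inputIndex}`;
          issues.push(`Invalid input at ${path}`);
        }
      });
    });
  }
  return issues;
}

// A Gemini-style model whose third input (index 2) is "audio".
const exampleConfig: Config = {
  models: {
    providers: {
      google: { models: [{ id: "example-gemini-model", input: ["text", "image", "audio"] }] },
    },
  },
};

console.log(findInvalidInputs(exampleConfig, OLD_ALLOWED_INPUTS).join("\n"));
// → Invalid input at models.providers.google.models.0.input.2
```

Passing a widened list such as `["text", "image", "audio", "video"]` as `allowed` returns no issues for the same config. That mirrors the effect this PR has on the real schema.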
## Fix
Extend the union in `ModelDefinitionSchema.input` to include `"audio"` and `"video"`, and update the matching TypeScript type in `types.models.ts`.
The rest of the codebase already handles these modalities:
- `MAX_VIDEO_BYTES` / `MAX_AUDIO_BYTES` constants
- `MediaUnderstandingCapabilitiesSchema` (line 405 of the same file) already accepts `["image", "audio", "video"]`
- `media-understanding/providers/google` declares `capabilities: ["image", "audio", "video"]`
## Tests
Three new test cases in `config-misc.test.ts` verify that text/image inputs are accepted, that audio/video inputs are accepted, and that unknown modalities are rejected.
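The widened union and the three tested behaviors can be sketched as follows. This sketch makes one design choice that differs from the PR: the allowed values live in a single `as const` tuple, and the TypeScript type is derived from it, so the runtime check and the static type cannot drift apart. The PR instead edits a Zod union in `zod-schema.core.ts` and a separate type in `types.models.ts`. The names `MODEL_INPUT_MODALITIES`, `isModelInputModality`, and `firstInvalidInputIndex` are hypothetical.

```typescript
// Hypothetical sketch, not the project's code: one tuple is the single source
// of truth for both the runtime check and the static type.
const MODEL_INPUT_MODALITIES = ["text", "image", "audio", "video"] as const;

// "text" | "image" | "audio" | "video", derived from the tuple above.
type ModelInputModality = (typeof MODEL_INPUT_MODALITIES)[number];

// Runtime guard: narrows an arbitrary value to a supported modality.
function isModelInputModality(value: unknown): value is ModelInputModality {
  return (
    typeof value === "string" &&
    (MODEL_INPUT_MODALITIES as readonly string[]).includes(value)
  );
}

// Returns the index of the first unsupported entry, or -1 if all are valid.
function firstInvalidInputIndex(input: readonly unknown[]): number {
  return input.findIndex((value) => !isModelInputModality(value));
}

// The three cases described under "Tests".
console.log(firstInvalidInputIndex(["text", "image"]));          // → -1
console.log(firstInvalidInputIndex(["text", "audio", "video"])); // → -1
console.log(firstInvalidInputIndex(["text", "unknown"]));        // → 1
```

With Zod, a comparable single source of truth can be built from `z.enum([...])` and `z.infer<typeof schema>`.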
Fixes #20721
<h3>Greptile Summary</h3>
Extends `ModelDefinitionSchema.input` to accept `"audio"` and `"video"` modalities in addition to the existing `"text"` and `"image"` values. The change unblocks users from declaring native multimodal capabilities for providers like Gemini that support audio/video input.
- Updated Zod schema in `zod-schema.core.ts:41-44` to include audio and video literals in the union
- Updated TypeScript type in `types.models.ts:31` to match the schema
- Added 3 test cases covering text/image acceptance, audio/video acceptance, and rejection of unknown modalities
The runtime already supports these modalities: the `MAX_AUDIO_BYTES` and `MAX_VIDEO_BYTES` constants exist, `MediaUnderstandingCapabilitiesSchema` accepts `"image"`, `"audio"`, and `"video"`, and the Google provider declares `["image", "audio", "video"]` capabilities. The fix aligns the config schema with these existing runtime capabilities.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge: it is a simple schema extension that aligns with existing runtime capabilities.
- The change is minimal (one union widened in the Zod schema, with the matching TypeScript type updated), covered by 3 new test cases, and directly addresses a validation bug. The runtime already supports audio/video modalities through existing constants, providers, and capability schemas. No breaking changes or edge cases were identified.
- No files require special attention
<sub>Last reviewed commit: bb9b6d0</sub>
## Most Similar PRs
- #20878: fix: Widen models.input to accept "video" and "audio" modalities (by marcodelpin, 2026-02-19; 90.0% similar)
- #20867: fix: allow 'video' and 'audio' in models.input config (by pierreeurope, 2026-02-19; 89.4% similar)
- #21499: fix #20721: add video and audio to models.input type union (by neipor, 2026-02-20; 89.1% similar)
- #21298: fix(config): extend model input schema for video/audio modalities (by Alfa-ai-ccvs-tech, 2026-02-19; 84.5% similar)
- #23211: fix: include modelByChannel in allowed channels validator (by westerbamos, 2026-02-22; 73.3% similar)
- #10943: fix(config): resolve Control UI "Unsupported schema node" for confi... (by kraftbj, 2026-02-07; 73.1% similar)
- #22998: fix(config): add modelByChannel to allowed channels keys (by bbekdemir, 2026-02-21; 72.7% similar)
- #23155: fix: add modelByChannel to allowed channel config keys (by tiagocampo, 2026-02-22; 72.2% similar)
- #14640: feat(agents): support per-agent temperature and maxTokens in agents... (by lailoo, 2026-02-12; 71.8% similar)
- #16290: fix: add field-level validation for custom LLM provider config (by superlowburn, 2026-02-14; 71.6% similar)