#20878: fix: Widen models.input to accept "video" and "audio" modalities
commands
agents
size: XS
## Summary
Fixes #20721
The `models.input` type union only accepted `"text"` and `"image"`, causing zod validation to reject `"video"` and `"audio"` values. This blocked users from declaring Gemini native multimodal capabilities through the config system, even though the runtime already has infrastructure for video/audio processing (`MAX_VIDEO_BYTES`, `MAX_AUDIO_BYTES`, `MediaKind`).
## Root Cause
`ModelDefinitionSchema` in `zod-schema.core.ts` restricts the input array to:
```typescript
z.array(z.union([z.literal("text"), z.literal("image")]))
```
This same `"text" | "image"` constraint was duplicated across 9 files (types, model catalog, provider discovery, etc.).
## Changes (9 files, +22/-16)
**Config layer (user-facing):**
- `src/config/zod-schema.core.ts` — widened zod union to include `"video"` and `"audio"`
- `src/config/types.models.ts` — widened `ModelDefinitionConfig.input` type
**Provider discovery:**
- `src/agents/bedrock-discovery.ts` — `mapInputModalities()` now passes through `"video"` and `"audio"` from Bedrock model summaries
- `src/agents/model-catalog.ts` — widened `ModelCatalogEntry` and `DiscoveredModel` input types
- `src/agents/cloudflare-ai-gateway.ts` — widened param type
- `src/agents/huggingface-models.ts` — widened local variable type
- `src/commands/onboard-auth.config-litellm.ts` — widened return type
**SDK boundary:**
- `src/agents/model-scan.ts` — `parseModality()` kept narrow (`"text" | "image"`) for SDK `Model<Api>` compatibility; added comment explaining the constraint
- `src/agents/pi-embedded-runner/model.test.ts` — loosened test helper type
## Backward Compatibility
Fully backward compatible:
- Existing configs without `"video"` / `"audio"` are unaffected
- All downstream consumers only check `.includes("image")` — extra values pass through harmlessly
- Display strings (e.g., `list.registry.ts:206`) use `.join("+")` which handles any value
## Test Plan
- [x] 444 existing tests pass (1 pre-existing Windows symlink failure unrelated)
- [x] TypeScript type check passes (no new errors)
- [x] Zero runtime logic changes needed — consumers are agnostic to extra input values
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Extends `models.input` type union from `"text" | "image"` to include `"video"` and `"audio"` modalities, enabling users to declare Gemini and other multimodal capabilities through the config system. The change is applied consistently across the type system (Zod schema, TypeScript types, provider discovery, model catalog).
- Core schema and types updated in `zod-schema.core.ts` and `types.models.ts`
- Provider discovery updated to pass through video/audio from Bedrock model summaries
- SDK boundary (`model-scan.ts`) intentionally kept narrow for `Model<Api>` compatibility, with clear documentation
- Fully backward compatible: existing configs unaffected, consumers only check `.includes("image")`, display uses `.join("+")` which handles any value
- One issue found: HuggingFace discovery widened the type but didn't update the logic to detect video/audio from API responses
<h3>Confidence Score: 4/5</h3>
- Safe to merge with one fix: HuggingFace discovery logic needs updating
- The PR is well-structured and backward compatible, with consistent type changes across 9 files. The approach is sound: widening the config layer to accept video/audio while keeping the SDK boundary narrow. However, one logical error was found in `huggingface-models.ts` where the type was widened but the detection logic wasn't updated to actually extract video/audio modalities from API responses. This is a straightforward fix that won't affect existing behavior.
- Pay attention to `src/agents/huggingface-models.ts` - the logic needs updating to match the widened type
<sub>Last reviewed commit: f8cb9f8</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Testing: fully tested (automated + manual verification)
I understand and can explain all changes in this PR.
Most Similar PRs
#20738: Fix model input schema to accept audio and video modalities
by Clawborn · 2026-02-19
90.0%
#21298: fix(config): extend model input schema for video/audio modalities
by Alfa-ai-ccvs-tech · 2026-02-19
89.7%
#21499: fix #20721: add video and audio to models.input type union
by neipor · 2026-02-20
88.0%
#20867: fix: allow 'video' and 'audio' in models.input config
by pierreeurope · 2026-02-19
87.8%
#10943: fix(config): resolve Control UI "Unsupported schema node" for confi...
by kraftbj · 2026-02-07
77.4%
#23211: fix: include modelByChannel in allowed channels validator
by westerbamos · 2026-02-22
76.0%
#16290: fix: add field-level validation for custom LLM provider config
by superlowburn · 2026-02-14
75.8%
#14640: feat(agents): support per-agent temperature and maxTokens in agents...
by lailoo · 2026-02-12
75.6%
#19020: bugfix(gateway): Handle invalid model provider API config gracefully\…
by funkyjonx · 2026-02-17
75.4%
#19326: Agents: improve z.ai GLM-5 integration and failover
by gabrielespinheira · 2026-02-17
74.7%