#21298: fix(config): extend model input schema for video/audio modalities
docs
commands
agents
size: XL
## Summary
- **Fixes gateway startup crash** when `openclaw.json` declares `"video"` or `"audio"` as input modalities (e.g. `gemini-3.1-pro-preview`)
- **Extends Zod validation** from `"text" | "image"` to `"text" | "image" | "video" | "audio"` — purely additive, fully backward-compatible
- **Adds `modelSupportsVideo()` / `modelSupportsAudio()` helpers** and native skip logic in the media-understanding runner (mirrors the existing image skip pattern)
## Root Cause
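The entry `models.providers.google.models[1].input = ["text", "image", "video", "audio"]` in `~/.openclaw/openclaw.json` fails Zod validation at `src/config/zod-schema.core.ts:41`, which only allows `"text" | "image"`. Because the config is validated during gateway startup, the rejected entry crashes the gateway.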
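To make the failure concrete, here is a minimal sketch of the check before and after the change. It is not the project's code: the real schema is a Zod union of literals, while this stand-in uses plain arrays so it runs without dependencies. The names `MODALITIES_BEFORE`, `MODALITIES_AFTER`, and `findInvalidModalities` are hypothetical.

```typescript
// Hypothetical stand-ins for the union in src/config/zod-schema.core.ts.
// The real schema uses Zod literals; plain arrays keep this sketch dependency-free.
const MODALITIES_BEFORE: readonly string[] = ["text", "image"];
const MODALITIES_AFTER: readonly string[] = ["text", "image", "video", "audio"];

// Returns the entries of `input` that are not in the allowed list.
function findInvalidModalities(
  input: readonly string[],
  allowed: readonly string[],
): string[] {
  return input.filter((entry) => !allowed.includes(entry));
}

// The value from the failing path models.providers.google.models[1].input.
const geminiInput = ["text", "image", "video", "audio"];

const rejectedBefore = findInvalidModalities(geminiInput, MODALITIES_BEFORE);
const rejectedAfter = findInvalidModalities(geminiInput, MODALITIES_AFTER);

console.log(rejectedBefore); // ["video", "audio"]: the old union rejects these
console.log(rejectedAfter); // []: the widened union accepts the whole list
```

Since the new list is a superset of the old one, any input that passed before still passes, which is the sense in which the change is additive.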
## Changes (8 source files + 2 docs)
### Part 1: Extend input type union (7 files)
| File | Change |
|------|--------|
| `src/config/zod-schema.core.ts` | Add `z.literal("video")`, `z.literal("audio")` to union |
| `src/config/types.models.ts` | Widen `ModelDefinitionConfig.input` type |
| `src/agents/model-catalog.ts` | Widen `ModelCatalogEntry.input` and `DiscoveredModel.input` |
| `src/agents/model-scan.ts` | Extend `parseModality()` to detect video/audio |
| `src/agents/huggingface-models.ts` | Detect video/audio in `architecture.input_modalities` |
| `src/commands/onboard-auth.config-litellm.ts` | Widen local type annotation |
| `src/agents/cloudflare-ai-gateway.ts` | Widen parameter type annotation |
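As an illustration of the `parseModality()` row, the sketch below shows one way such a function could detect the new modalities. The PR does not show the real implementation, so the signature, the return type, and the assumed `"text+image->text"` string shape are all assumptions made for this example.

```typescript
type InputModality = "text" | "image" | "video" | "audio";

// Hypothetical sketch: collect every known input modality mentioned in a
// modality string. "text" is always included as the baseline.
function parseModality(modality: string | undefined): InputModality[] {
  const normalized = (modality ?? "").toLowerCase();
  // Assumed format: inputs before "->", outputs after, e.g. "text+image->text".
  const [inputPart = ""] = normalized.split("->");
  const found: InputModality[] = ["text"];
  for (const candidate of ["image", "video", "audio"] as const) {
    if (inputPart.includes(candidate)) {
      found.push(candidate);
    }
  }
  return found;
}

console.log(parseModality("text+image+video->text")); // ["text", "image", "video"]
console.log(parseModality("text->audio")); // ["text"]: audio is an output here
console.log(parseModality(undefined)); // ["text"]
```

Typing the result as `InputModality[]` rather than a narrower `("text" | "image")[]` matters: a cast back to the narrow type downstream would compile but hide the new values from callers.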
### Part 2: Capability helpers (1 file)
- `src/agents/model-catalog.ts` — Added `modelSupportsVideo()` and `modelSupportsAudio()` next to existing `modelSupportsVision()`
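A minimal sketch of what these helpers might look like, assuming they mirror `modelSupportsVision()` by checking the entry's `input` list. The trimmed `ModelCatalogEntry` shape and the shared `supportsModality` helper are hypothetical; the actual signatures in `model-catalog.ts` may differ.

```typescript
type InputModality = "text" | "image" | "video" | "audio";

// Hypothetical, trimmed-down shape; the real ModelCatalogEntry has more fields.
interface ModelCatalogEntry {
  id: string;
  input?: InputModality[];
}

// Shared check (an assumption of this sketch): a missing entry or a missing
// `input` list means the modality is not supported.
function supportsModality(
  entry: ModelCatalogEntry | undefined,
  modality: InputModality,
): boolean {
  return entry?.input?.includes(modality) ?? false;
}

function modelSupportsVision(entry: ModelCatalogEntry | undefined): boolean {
  return supportsModality(entry, "image");
}

function modelSupportsVideo(entry: ModelCatalogEntry | undefined): boolean {
  return supportsModality(entry, "video");
}

function modelSupportsAudio(entry: ModelCatalogEntry | undefined): boolean {
  return supportsModality(entry, "audio");
}

const gemini: ModelCatalogEntry = {
  id: "gemini-3.1-pro-preview",
  input: ["text", "image", "video", "audio"],
};
const textOnly: ModelCatalogEntry = { id: "text-only-model" };

console.log(modelSupportsVideo(gemini)); // true
console.log(modelSupportsAudio(textOnly)); // false
```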
### Part 3: Runner skip logic (1 file)
- `src/media-understanding/runner.ts` — When the primary model natively supports video/audio, skip the separate understanding pipeline (same pattern as existing image skip)
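The skip decision can be sketched roughly as follows. This is an illustration of the pattern only: `planAttachments`, `PrimaryModel`, and `MediaKind` are hypothetical names, and the real runner has more branches than shown here.

```typescript
type InputModality = "text" | "image" | "video" | "audio";
type MediaKind = "image" | "video" | "audio";

// Hypothetical, trimmed-down view of the primary model.
interface PrimaryModel {
  id: string;
  input?: InputModality[];
}

// True when the primary model can consume this kind of media directly,
// so the separate understanding pipeline can be skipped for it.
function shouldSkipUnderstanding(
  kind: MediaKind,
  primary: PrimaryModel | undefined,
): boolean {
  return primary?.input?.includes(kind) ?? false;
}

// Splits attachments into those passed straight to the model and those
// routed through the understanding pipeline first.
function planAttachments(
  kinds: MediaKind[],
  primary: PrimaryModel | undefined,
): { passthrough: MediaKind[]; understand: MediaKind[] } {
  const passthrough: MediaKind[] = [];
  const understand: MediaKind[] = [];
  for (const kind of kinds) {
    (shouldSkipUnderstanding(kind, primary) ? passthrough : understand).push(kind);
  }
  return { passthrough, understand };
}

const visionOnly: PrimaryModel = { id: "vision-model", input: ["text", "image"] };
const plan = planAttachments(["image", "video", "audio"], visionOnly);

console.log(plan.passthrough); // ["image"]
console.log(plan.understand); // ["video", "audio"]
```

With a model whose `input` lists only `"text"` and `"image"`, video and audio still go through the understanding pipeline, so existing setups behave as before.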
### Documentation
- [`docs/MODIFICATION-CARDS-video-audio-input.md`](docs/MODIFICATION-CARDS-video-audio-input.md) — 9 detailed modification cards with before/after code, rationale, risk assessment
- [`docs/TECH-GUIDE-video-audio-input.md`](docs/TECH-GUIDE-video-audio-input.md) — Full architecture overview, change layers, external dependency notes, verification steps
## Upstream Compatibility
Fetched `upstream/main` (`6cdcb5904`, 2026-02-19). Of the 8 modified source files, only `runner.ts` was also changed upstream, and the two sets of changes touch different regions: upstream edits the imports and adds a new function at lines 79-101, while this PR adds skip blocks at lines 721-790. **No merge conflicts expected.**
## Verification
- `npx tsc --noEmit` passes with **0 errors**
- All type widening is additive — existing `["text", "image"]` configs are unaffected
- Default values remain `["text"]` everywhere
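The backward-compatibility claims above can be illustrated at the type level. In this sketch the trimmed `ModelDefinitionConfig` shape, `DEFAULT_INPUT`, and `resolveInput` are hypothetical; the point is only that a legacy value still type-checks against the widened union and that an omitted `input` still resolves to `["text"]`.

```typescript
type InputModality = "text" | "image" | "video" | "audio";

// Hypothetical, trimmed-down shape of a model definition.
interface ModelDefinitionConfig {
  id: string;
  input?: InputModality[];
}

const DEFAULT_INPUT: readonly InputModality[] = ["text"];

// Falls back to a copy of the default when the config omits `input`.
function resolveInput(model: ModelDefinitionConfig): InputModality[] {
  return model.input ?? [...DEFAULT_INPUT];
}

// A pre-existing config value: still assignable to the widened type.
const legacy: ModelDefinitionConfig = { id: "legacy-model", input: ["text", "image"] };
// A config that omits `input` entirely.
const bare: ModelDefinitionConfig = { id: "bare-model" };

console.log(resolveInput(legacy)); // ["text", "image"]
console.log(resolveInput(bare)); // ["text"]
```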
## Test plan
- [ ] `npx tsc --noEmit` — no type errors
- [ ] `node openclaw.mjs gateway run --port 18789` — clean startup with Gemini config including video/audio
- [ ] `ss -tlnp | grep :18789` — confirm port is listening
- [ ] `npx vitest run --config vitest.unit.config.ts` — no regressions
- [ ] Verify that existing configs declaring only `["text", "image"]` still load and behave unchanged
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR extends input modality validation from `"text" | "image"` to include `"video"` and `"audio"`, fixing gateway crashes when models declare these modalities in config. The changes are backward-compatible and follow existing patterns.
**Key changes:**
- Extends Zod schema and TypeScript types across 7 config/agent files
- Adds `modelSupportsVideo()` and `modelSupportsAudio()` helper functions
- Implements native skip logic in media-understanding runner (mirrors existing image skip pattern)
- Includes comprehensive documentation in `docs/refactor/`
**Observations:**
- Core schema changes are clean and consistent
- Runner skip blocks follow the existing `modelSupportsVision()` pattern correctly
- Type narrowing in `model-scan.ts:483` may silently drop video/audio modalities for OpenRouter models
- Most file additions are custom skills being restored after upstream reset (as noted in PR description)
- Upstream changes in `runner.ts` are in different regions (imports/removed functions vs new skip blocks) - no merge conflicts expected
<h3>Confidence Score: 4/5</h3>
- Safe to merge with minor attention to type narrowing in model scanning
- The PR implements a straightforward additive schema extension following established patterns. All changes are backward-compatible since video/audio are added to an existing union type. The media-understanding skip logic correctly mirrors the existing image handling. One type narrowing cast in `model-scan.ts` could silently drop modalities, but this affects OpenRouter model scanning only and likely needs external dependency updates. No breaking changes or runtime risks in the core functionality.
- Pay attention to `src/agents/model-scan.ts` line 483 where type narrowing may drop video/audio modalities
<sub>Last reviewed commit: 43936fa</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
| PR | Title | Author | Date | Similarity |
|----|-------|--------|------|------------|
| #20878 | fix: Widen models.input to accept "video" and "audio" modalities | marcodelpin | 2026-02-19 | 89.7% |
| #20738 | Fix model input schema to accept audio and video modalities | Clawborn | 2026-02-19 | 84.5% |
| #20867 | fix: allow 'video' and 'audio' in models.input config | pierreeurope | 2026-02-19 | 83.3% |
| #21499 | fix #20721: add video and audio to models.input type union | neipor | 2026-02-20 | 79.8% |
| #12191 | fix: guard against undefined model.input in display and scan layers | mcaxtr | 2026-02-09 | 76.9% |
| #10943 | fix(config): resolve Control UI "Unsupported schema node" for confi... | kraftbj | 2026-02-07 | 76.6% |
| #14640 | feat(agents): support per-agent temperature and maxTokens in agents... | lailoo | 2026-02-12 | 76.3% |
| #19326 | Agents: improve z.ai GLM-5 integration and failover | gabrielespinheira | 2026-02-17 | 76.1% |
| #19020 | bugfix(gateway): Handle invalid model provider API config gracefully\… | funkyjonx | 2026-02-17 | 76.0% |
| #6673 | fix: preserve allowAny flag in createModelSelectionState for custom... | tenor0 | 2026-02-01 | 75.6% |