
#11877: feat(ollama): auto-detect vision capability via /api/show

by Nina-VanKhan · open · 2026-02-08 13:25
Labels: agents, stale
## Summary

For Ollama models not explicitly configured in `openclaw.json`, the `resolveModel()` function falls back with `input: ["text"]`, i.e. no image support. This causes pi-ai to filter out all image content before sending to Ollama, even when the underlying model supports vision. This PR adds automatic vision capability detection by querying Ollama's `/api/show` endpoint.

## Changes

- Added a `queryOllamaVisionCapability()` helper that queries Ollama's `/api/show` endpoint to check whether a model lists `"vision"` in its capabilities
- Made `resolveModel()` async to support the API query
- Updated all callers (`run.ts`, `compact.ts`, `tts.ts`) to await the function
- Updated tests to be async

## How It Works

When falling back for Ollama models not in the explicit config:

1. Query `http://ollama-host/api/show` with the model name
2. Check if the `capabilities` array includes `"vision"`
3. If yes, set `input: ["text", "image"]` instead of just `["text"]`

## Test Plan

1. Configure the Ollama provider in `openclaw.json` without listing a vision-capable model
2. Use `/model ollama/llava` (or another vision model)
3. Send an image via the dashboard chat
4. Verify the model receives and processes the image

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>

This PR makes `resolveModel()` asynchronous so it can query Ollama's `/api/show` endpoint when falling back to an unconfigured Ollama model, enabling automatic detection of the `vision` capability (and therefore allowing `input: ["text", "image"]` in the fallback model). Call sites in the embedded runner (`run.ts`, `compact.ts`) and TTS summarization (`tts.ts`) were updated to `await` the model resolution, and unit tests were adjusted accordingly.

The main concern is the new outbound HTTP call for capability detection: it currently uses raw `fetch()` rather than the project's guarded fetch utilities/SSRF policy layer. If the Ollama `baseUrl` is configurable in environments where configs are untrusted, this could allow requests to arbitrary/private hosts.

<h3>Confidence Score: 3/5</h3>

- This PR is likely mergeable after addressing a security-relevant fetch/SSRF policy gap in the new Ollama capability probe.
- Core logic changes are localized, and call sites were updated to await the new async `resolveModel()`. However, the new `/api/show` probe uses raw `fetch()` and may bypass the SSRF/network policy protections applied elsewhere to configurable base URLs, which is important to fix before merging.
- `src/agents/pi-embedded-runner/model.ts`

<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
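For reference, the probe described above can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: only `queryOllamaVisionCapability()` is named in the PR, the `hasVisionCapability` helper and the type are hypothetical, and the request/response shape assumes Ollama's documented `/api/show` behavior (recent Ollama versions return a top-level `capabilities` array).

```typescript
// Shape of the /api/show response fields we care about (assumption: newer
// Ollama versions include `capabilities`, e.g. ["completion", "vision"]).
type OllamaShowResponse = {
  capabilities?: string[];
};

// Pure check, kept separate from I/O so it is trivially unit-testable.
export function hasVisionCapability(show: OllamaShowResponse): boolean {
  return Array.isArray(show.capabilities) && show.capabilities.includes("vision");
}

// Queries Ollama's /api/show for the given model and reports whether it
// advertises vision support. Fails closed (text-only) on any error so an
// unreachable Ollama host never breaks model resolution.
export async function queryOllamaVisionCapability(
  baseUrl: string,
  model: string,
): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/api/show`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model }),
    });
    if (!res.ok) return false;
    return hasVisionCapability((await res.json()) as OllamaShowResponse);
  } catch {
    return false;
  }
}

// The async fallback in resolveModel() would then look something like:
//   const vision = await queryOllamaVisionCapability(baseUrl, modelName);
//   const input = vision ? ["text", "image"] : ["text"];
```

Note the raw `fetch()` here is exactly the pattern the review flags: in the real codebase this call should go through the project's guarded fetch/SSRF policy layer rather than hitting the configurable `baseUrl` directly.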
