#8062: feat: add image pre-analysis with imageModel for non-vision models

by mylukin · open · 2026-02-03 14:38
agents size: M
## Why This Feature Matters

Many users rely on cost-effective or high-performance models that don't have native vision capabilities. This feature bridges that gap without requiring users to switch to more expensive vision models for every request.

## Summary

When `agents.defaults.imageModel` is configured, images in user messages are first analyzed by the configured imageModel, and the resulting text analysis is then passed to the main model. This lets models without native vision capabilities (e.g., MiniMax M2.1, GLM) understand image content through a vision-capable model (e.g., Gemini Flash, GPT-5).

## How it works

**Before (current behavior):**

```
User message (with image) → Main model → Response
                                 ↑
        (if model supports images, they're passed directly;
         otherwise images are ignored)
```

**After (with this PR):**

```
User message (with image) → imageModel configured?
    ├─ Yes → imageModel analyzes image
    │        ├─ Success → Analysis text + prompt → Main model → Response
    │        └─ Failed → Fallback to main model (if it supports images)
    └─ No → Main model handles directly (existing behavior)
```

## Key Behavior

1. **imageModel takes priority**: When configured, imageModel is always used first for image analysis.
2. **Graceful fallback**: If imageModel fails and the main model supports images, the images are passed directly to the main model instead.
3. **Backward compatible**: Without imageModel configured, behavior is unchanged.

## Configuration Example

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "minimax/MiniMax-M2.1",
        "fallbacks": ["anthropic/claude-3-opus"]
      },
      "imageModel": {
        "primary": "gemini-crs/gemini-3-flash-preview",
        "fallbacks": ["openai/gpt-4o"]
      }
    }
  }
}
```

## Changes

| File | Description |
|------|-------------|
| `src/agents/pi-embedded-runner/run/image-pre-analysis.ts` | New module with `shouldUseImagePreAnalysis()` and `analyzeImagesWithImageModel()` functions |
| `src/agents/pi-embedded-runner/run/image-pre-analysis.test.ts` | Unit tests for the new module (10 tests) |
| `src/agents/pi-embedded-runner/run/attempt.ts` | Integrated image pre-analysis into the prompt flow |

## Test Results

```
✓ src/agents/pi-embedded-runner/run/image-pre-analysis.test.ts (10 tests) 3ms

Test Files  1 passed (1)
     Tests  10 passed (10)
```

## Manual Testing

- [x] Configured `imageModel` with gemini-flash
- [x] Configured main `model` with opus (supports images)
- [x] Sent image via Feishu
- [x] Verified image was analyzed by imageModel first
- [x] Verified analysis text was passed to main model

---

**Note**: This is a re-submission of #4802, which was automatically closed. All feedback has been addressed, and the branch has been rebased onto the latest main.
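For reviewers, here is a minimal TypeScript sketch of the routing described under "How it works" and "Key Behavior". It is not the code in `image-pre-analysis.ts`. Every name in it (`prepareInput`, `shouldPreAnalyze`, `mainModelFallback`, `AnalyzeFn`, and the types) is hypothetical, and the vision call is injected as a callback rather than tied to any real provider SDK. Two details are assumptions not stated in the PR: that `imageModel.fallbacks` are tried in order after `primary`, and the `[Image analysis]` format used to prepend the analysis text to the user's prompt.

```typescript
// Hypothetical sketch of image pre-analysis routing; not the PR's actual code.

interface ModelConfig {
  primary: string;
  fallbacks?: string[];
}

interface AgentDefaults {
  model: ModelConfig;
  imageModel?: ModelConfig;
}

interface ImageInput {
  mimeType: string;
  base64: string;
}

// Caller-supplied vision call: resolves to analysis text, rejects on failure.
type AnalyzeFn = (modelId: string, images: ImageInput[]) => Promise<string>;

type PreparedInput =
  | { mode: "pre-analyzed"; prompt: string; analyzedBy: string }
  | { mode: "direct"; prompt: string; images: ImageInput[] }
  | { mode: "text-only"; prompt: string };

// Pre-analysis applies only when images are present and imageModel is configured.
function shouldPreAnalyze(defaults: AgentDefaults, images: ImageInput[]): boolean {
  return images.length > 0 && defaults.imageModel !== undefined;
}

// Existing behavior: pass images directly if the main model accepts them,
// otherwise the images are ignored and only the text prompt is sent.
function mainModelFallback(
  prompt: string,
  images: ImageInput[],
  mainSupportsImages: boolean,
): PreparedInput {
  if (images.length > 0 && mainSupportsImages) {
    return { mode: "direct", prompt, images };
  }
  return { mode: "text-only", prompt };
}

async function prepareInput(
  defaults: AgentDefaults,
  prompt: string,
  images: ImageInput[],
  mainSupportsImages: boolean,
  analyze: AnalyzeFn,
): Promise<PreparedInput> {
  if (shouldPreAnalyze(defaults, images)) {
    const cfg = defaults.imageModel!; // defined, checked by shouldPreAnalyze
    // Assumption: imageModel fallbacks are tried in order after primary.
    for (const modelId of [cfg.primary, ...(cfg.fallbacks ?? [])]) {
      try {
        const analysis = await analyze(modelId, images);
        // Assumed merge format: analysis text placed before the user's prompt.
        return {
          mode: "pre-analyzed",
          prompt: `[Image analysis]\n${analysis}\n\n${prompt}`,
          analyzedBy: modelId,
        };
      } catch {
        // This image model failed; try the next candidate.
      }
    }
  }
  // No imageModel configured, or every image model failed.
  return mainModelFallback(prompt, images, mainSupportsImages);
}
```

Injecting `analyze` keeps the routing testable without network calls: a stub that rejects for `primary` and resolves for a fallback exercises the imageModel fallback path, while a stub that always rejects exercises the fallback to the main model.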
