#18587: fix(ollama): improve timeout handling and cooldown logic for local providers
Labels: agents · size: XS · Cluster: Ollama Model Enhancements
# Fix Ollama Integration: Timeouts and Cooldown Logic
## Summary
This PR addresses multiple critical issues in the OpenClaw-Ollama integration that cause silent failures, excessive timeouts, and inappropriate cooldown behavior for local model providers.
## Issues Fixed
- **Silent failures** after switching to Ollama models via `/model` commands
- **5-second timeout** too short for Ollama model discovery during cold starts
- **1-hour cooldowns** applied to local providers that don't have rate limits
- **Poor error visibility** making debugging difficult
## Root Causes Identified
1. **Discovery timeout**: Hardcoded 5s timeout insufficient for cold model loading (15-30s typical)
2. **Cooldown logic**: Same exponential backoff used for both cloud APIs and local providers
3. **Error handling**: Failures silently propagated without detailed logging
## Changes Made
### 1. Extended Ollama Discovery Timeout
- **File**: `src/agents/models-config.providers.ts`
- **Change**: Increase timeout from 5s → 30s for `/api/tags` discovery
- **Rationale**: Cold start model loading commonly takes 15-30s
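The timeout extension described above might look roughly like the following sketch. All identifiers here (`discoveryTimeoutMs`, `discoverModels`, the constant names) are illustrative assumptions, not the actual OpenClaw API:

```typescript
// Hypothetical sketch of a per-provider discovery timeout.
// Constant names and function signatures are illustrative only.
const DEFAULT_DISCOVERY_TIMEOUT_MS = 5_000;
const OLLAMA_DISCOVERY_TIMEOUT_MS = 30_000; // cold model loads commonly take 15-30s

function discoveryTimeoutMs(provider: string): number {
  return provider === "ollama"
    ? OLLAMA_DISCOVERY_TIMEOUT_MS
    : DEFAULT_DISCOVERY_TIMEOUT_MS;
}

// Abort the /api/tags discovery request after the provider-specific budget.
async function discoverModels(baseUrl: string, provider: string): Promise<unknown> {
  const res = await fetch(`${baseUrl}/api/tags`, {
    signal: AbortSignal.timeout(discoveryTimeoutMs(provider)),
  });
  if (!res.ok) throw new Error(`discovery failed: HTTP ${res.status}`);
  return res.json();
}
```

Using `AbortSignal.timeout` keeps the request cancellable without a hand-rolled timer, and scoping the longer budget to Ollama avoids slowing discovery for cloud providers.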
### 2. Local Provider Cooldown Differentiation
- **File**: `src/agents/auth-profiles/usage.ts`
- **Changes**:
- Add `isLocalProvider()` detection for Ollama, vLLM, LocalAI, etc.
- Implement shorter cooldown progression for local providers:
- **Local**: 30s → 1m → 2m → 4m → 5m (max)
- **Cloud**: 5m → 25m → 125m → 1h (unchanged)
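The cooldown split could be sketched as below. The provider set, constant names, and function signatures are assumptions for illustration; the step values are taken directly from the progressions listed above:

```typescript
// Hypothetical sketch of local-vs-cloud cooldown differentiation.
// Identifiers are illustrative; step values mirror this PR's description.
const LOCAL_PROVIDERS = new Set(["ollama", "vllm", "localai"]);

const LOCAL_COOLDOWNS_MS = [30, 60, 120, 240, 300].map((s) => s * 1_000); // 30s → 1m → 2m → 4m → 5m (max)
const CLOUD_COOLDOWNS_MS = [5, 25, 125, 60].map((m) => m * 60_000);       // 5m → 25m → 125m → 1h (unchanged)

function isLocalProvider(provider: string): boolean {
  return LOCAL_PROVIDERS.has(provider.toLowerCase());
}

function cooldownMs(provider: string, consecutiveFailures: number): number {
  const steps = isLocalProvider(provider) ? LOCAL_COOLDOWNS_MS : CLOUD_COOLDOWNS_MS;
  // Clamp to the final step so further failures reuse the last cooldown.
  return steps[Math.min(consecutiveFailures, steps.length - 1)];
}
```

An explicit step table keeps the documented progression and the code trivially in sync, at the cost of the hardcoded provider list the review notes could become stale.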
### 3. Enhanced Error Logging
- **File**: `src/agents/ollama-stream.ts`
- **Changes**:
- Log request URL, model context, and request body on failures
- Improve debugging visibility for integration issues
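The richer failure log might be shaped like this sketch. The context interface and function name are hypothetical, not the actual `ollama-stream.ts` code:

```typescript
// Hypothetical sketch of the detailed failure log (shape is illustrative).
interface OllamaFailureContext {
  url: string;
  model: string;
  requestBody: unknown;
}

function formatOllamaFailure(ctx: OllamaFailureContext, err: unknown): string {
  const reason = err instanceof Error ? err.message : String(err);
  // Include URL, model, and request body so integration failures are
  // debuggable instead of being silently dropped.
  return [
    `ollama request failed: ${reason}`,
    `  url: ${ctx.url}`,
    `  model: ${ctx.model}`,
    `  body: ${JSON.stringify(ctx.requestBody)}`,
  ].join("\n");
}
```

Logging the full request context on failure is what turns "silent drop" reports into actionable bug reports.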
## Testing
✅ **Direct Ollama API** (baseline): works consistently (0.5-12s response times)
❌ **OpenClaw → Ollama** (pre-fix): intermittent failures, timeouts, silent drops
✅ **OpenClaw → Ollama** (post-fix): local providers recover faster from transient issues
## Validation
- **Environment**: macOS arm64, 16GB RAM, Ollama v0.15.5+
- **Models tested**: `tinyllama:1.1b`, `llama3.1:8b`
- **Channels affected**: Telegram, webchat, CLI (all channels route through same integration layer)
- **Reproduction**: Switch to local model → send message → observe timeout/silence
## Related Issues
- Fixes #18576 (created by reporter)
- Related to #14380 (qwen3:8b connection errors)
- Related to #13336 (incorrect cooldown for local providers)
- Related to #7791 (no response with local Ollama models)
## Impact
- **Improves reliability** of local model switching via `/model` commands
- **Reduces user frustration** from mysterious 1-hour lockouts
- **Better debugging experience** through detailed error logs
- **No breaking changes** to existing cloud provider behavior
---
**Note**: This fix addresses the integration layer between OpenClaw and Ollama. Direct Ollama usage remains unaffected. The changes are conservative and maintain backward compatibility while providing better behavior for local infrastructure.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR fixes three critical issues affecting Ollama and other local model provider integrations: insufficient discovery timeouts, inappropriate cooldown behavior, and poor error visibility. The changes appropriately differentiate local providers from cloud APIs by implementing shorter cooldown progressions (30s → 1m → 2m → 4m → 5m max) instead of the cloud exponential backoff (5m → 25m → 125m → 1h). The discovery timeout is increased from 5s to 30s to accommodate cold model loading, and enhanced error logging provides better debugging context. The implementation correctly addresses the scope issue with `body` that was previously identified, and the cooldown math properly produces the documented progression.
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge with minimal risk - the changes are well-scoped to local provider behavior and maintain backward compatibility with cloud providers.
- Score reflects that the changes are conservative, targeted, and address real user-facing issues without breaking existing functionality. The local provider detection is explicit and well-documented. The timeout increase is reasonable for cold starts. The previously identified scope bug with `body` has been correctly fixed. Minor deduction due to the hardcoded list of local providers which could become stale, but this is acceptable given the targeted nature of the fix.
- No files require special attention - all changes are straightforward and correctly implemented.
<sub>Last reviewed commit: c7ed081</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs

| PR | Author | Date | Similarity |
|---|---|---|---|
| #4782: fix: Auto-discover Ollama models without requiring explicit API key | spiceoogway | 2026-01-30 | 85.8% |
| #7278: feat(ollama): optimize local LLM support with auto-discovery and ti... | alltomatos | 2026-02-02 | 85.3% |
| #21977: Preserve provider API for discovered Ollama models | graysurf | 2026-02-20 | 82.4% |
| #19612: feat(onboarding): add Ollama to onboarding provider list | ParthSareen | 2026-02-18 | 81.7% |
| #13661: fix(auth): skip timeout cooldown for local/self-hosted providers | omair445 | 2026-02-10 | 80.6% |
| #16098: fix: omit tools param for models without tool support, surface erro... | claw-sylphx | 2026-02-14 | 80.5% |
| #21199: Models: suppress repeated vLLM/Ollama discovery warnings (#21037) | itsishant | 2026-02-19 | 79.1% |
| #9822: fix: allow local/custom model providers for sub-agent inference | stammtobias91 | 2026-02-05 | 78.5% |
| #5115: fix: guard against undefined model.name in Ollama discovery (#5062) | TheWildHustle | 2026-01-31 | 78.4% |
| #22368: fix: first-token timeout + provider-level skip for model fallback | 88plug | 2026-02-21 | 78.1% |