#5031: fix: add network connection error codes to failover classifier
agents
## Summary
Fix LLM provider failover not triggering on ECONNREFUSED and other network connection errors.
## Problem
When the primary LLM provider is unreachable (e.g., Ollama server is stopped), users see connection errors instead of automatic failover to backup providers. The failover classifier did not recognize common network connectivity error codes like `ECONNREFUSED`, `ENOTFOUND`, `ENETUNREACH`, and `EHOSTUNREACH`.
## Solution
Add these network connection error codes to the failover classifier in `resolveFailoverReasonFromError()`. They are classified under the `timeout` reason (provider unreachable), which triggers failover to the next configured provider.
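The classification described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/agents/failover-error.ts`: the function name `sketchResolveFailoverReason`, the `FailoverReason` type, and the `TIMEOUT_CODES` set are hypothetical, and the real `resolveFailoverReasonFromError()` likely handles more reasons and more error shapes.

```typescript
// Hypothetical sketch; names and structure are assumptions, not the repo's code.
type FailoverReason = "timeout";

// Codes named in this PR, plus the pre-existing timeout-like codes
// (ETIMEDOUT, ECONNRESET) mentioned in the review summary.
const TIMEOUT_CODES = new Set<string>([
  "ETIMEDOUT",
  "ECONNRESET",
  "ECONNREFUSED",
  "ENOTFOUND",
  "ENETUNREACH",
  "EHOSTUNREACH",
]);

// Returns "timeout" when the error carries a recognized network code,
// otherwise null (meaning this sketch has no opinion about the error).
function sketchResolveFailoverReason(err: unknown): FailoverReason | null {
  if (typeof err !== "object" || err === null) return null;
  const code = (err as { code?: unknown }).code;
  if (typeof code !== "string") return null;
  // Normalize case before the lookup, as the existing codes are.
  return TIMEOUT_CODES.has(code.toUpperCase()) ? "timeout" : null;
}
```

Keeping the codes in a single set means a new code is a one-line change, and an unrecognized code falls through to `null` rather than being misclassified.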
## Changes
- **`src/agents/failover-error.ts`**: Added `ECONNREFUSED`, `ENOTFOUND`, `ENETUNREACH`, and `EHOSTUNREACH` to the list of error codes that return the `timeout` failover reason
- **`src/agents/failover-error.test.ts`**: Added test for network connection error codes
## Testing
- All existing tests pass
- Added new test case: "infers timeout from network connection error codes"
- Type checking passes
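
For illustration, the shape of the new test case might look like the sketch below. It is written without a test framework so that it stands alone; the helper names `makeCodedError` and `findUnclassifiedCodes` and the `Classifier` type are hypothetical, and the actual test in `src/agents/failover-error.test.ts` may be structured differently.

```typescript
// Hypothetical helpers; not taken from the repo's test file.
type Classifier = (err: unknown) => string | null;

// The four codes this PR adds to the classifier.
const NETWORK_CODES = ["ECONNREFUSED", "ENOTFOUND", "ENETUNREACH", "EHOSTUNREACH"];

// Build an Error carrying a Node-style `code` property.
function makeCodedError(code: string): Error & { code: string } {
  return Object.assign(new Error(`simulated ${code}`), { code });
}

// Return every code the given classifier fails to map to "timeout".
// An empty array means all four network codes are covered.
function findUnclassifiedCodes(classify: Classifier): string[] {
  return NETWORK_CODES.filter((code) => classify(makeCodedError(code)) !== "timeout");
}
```

Passing the classifier in as a parameter keeps the check independent of how the real function is exported; in the repo, the argument would presumably be `resolveFailoverReasonFromError` or a thin wrapper around it.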
Closes #4921
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR extends the failover classifier (`resolveFailoverReasonFromError`) to treat common Node/network connection failures (`ECONNREFUSED`, `ENOTFOUND`, `ENETUNREACH`, `EHOSTUNREACH`) as a `timeout` failover reason, aligning them with existing timeout-ish codes like `ETIMEDOUT`/`ECONNRESET`. It also adds a unit test to ensure these codes trigger the expected failover behavior, improving provider failover when the primary endpoint is unreachable (e.g., local Ollama down).
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk.
- Change is a small, targeted extension of an existing error-code allowlist plus a corresponding unit test; it doesn’t alter control flow outside the classifier and uses the same normalization (`toUpperCase`) as existing codes.
- No files require special attention
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->