#19077: fix(agents): trigger model failover on connection-refused and network-unreachable errors
Labels: `agents`, `size: S`
## Summary
Closes #18868
Network connection errors (`ECONNREFUSED`, `ENETUNREACH`, `EHOSTUNREACH`, `ENETRESET`, `EAI_AGAIN`) were not recognized as failover-worthy errors, so the fallback chain was never advanced when the primary provider was unreachable due to a network outage or a stopped local server.
This is especially impactful when a **local fallback model** (e.g. Ollama on localhost) is configured alongside a remote primary: if the network goes down, the gateway should seamlessly fall back to the local model instead of surfacing an error to the user.
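
The fallback behaviour described above can be pictured as a loop over candidate models that moves to the next candidate only when the error classifier says the failure is failover-worthy. This is a hypothetical sketch. `runWithFallback`, `shouldFailover`, and the model ids are illustrative names, not the gateway's actual API. The provider call is kept synchronous for brevity; a real one would be `await`ed.

```typescript
// Hypothetical sketch of a fallback chain; not the gateway's actual implementation.
// `call` stands in for a provider request; `shouldFailover` stands in for the error classifier.
function runWithFallback<T>(
  models: readonly string[],
  call: (model: string) => T,
  shouldFailover: (err: unknown) => boolean,
): { model: string; result: T } {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      return { model, result: call(model) };
    } catch (err) {
      // Errors the classifier does not recognize are surfaced to the user as-is.
      if (!shouldFailover(err)) throw err;
      // Recognized errors advance the chain to the next candidate.
      lastError = err;
    }
  }
  // Every candidate failed with a failover-worthy error: surface the last one.
  throw lastError;
}

// Usage: the remote primary is unreachable, so the (hypothetical) local Ollama model answers.
const refused = Object.assign(new Error("connect ECONNREFUSED"), { code: "ECONNREFUSED" });
const outcome = runWithFallback(
  ["remote/primary", "ollama/llama3"],
  (model) => {
    if (model === "remote/primary") throw refused;
    return `reply from ${model}`;
  },
  (err) => (err as { code?: unknown } | null)?.code === "ECONNREFUSED",
);
console.log(outcome.model); // → ollama/llama3
```

If the classifier returns `false` for `ECONNREFUSED`, as it effectively did before this change, the first `throw` propagates straight to the user and the local model is never tried.
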
### Changes
- **`src/agents/failover-error.ts`** — Added `ECONNREFUSED`, `ENETUNREACH`, `EHOSTUNREACH`, `ENETRESET`, and `EAI_AGAIN` to the list of error codes that `resolveFailoverReasonFromError()` treats as failover-worthy. They are categorized as `"timeout"`, consistent with the existing `ETIMEDOUT` / `ECONNRESET` / `ECONNABORTED` handling.
- **`src/agents/model-fallback.e2e.test.ts`** — Added 4 e2e tests covering `ECONNREFUSED`, `ENETUNREACH`, `EHOSTUNREACH`, and `EAI_AGAIN` (`ENETRESET` does not yet have a dedicated test).
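
A minimal sketch of the classification change, assuming the error carries Node's string `code` property directly. `classifyNetworkError` is a hypothetical stand-in for the relevant branch of `resolveFailoverReasonFromError()`, not its actual source.

```typescript
// Hypothetical sketch; not the actual body of resolveFailoverReasonFromError().
type FailoverReason = "timeout";

// Network codes that were already mapped to "timeout" before this PR.
const EXISTING_CODES = ["ETIMEDOUT", "ECONNRESET", "ECONNABORTED"];
// Codes this PR adds to the same category.
const ADDED_CODES = ["ECONNREFUSED", "ENETUNREACH", "EHOSTUNREACH", "ENETRESET", "EAI_AGAIN"];

const TIMEOUT_CODES: ReadonlySet<string> = new Set([...EXISTING_CODES, ...ADDED_CODES]);

function classifyNetworkError(err: unknown): FailoverReason | null {
  if (typeof err !== "object" || err === null) return null;
  const code = (err as { code?: unknown }).code;
  // Errors without a recognized string code return null; other checks (not shown) would handle them.
  return typeof code === "string" && TIMEOUT_CODES.has(code) ? "timeout" : null;
}

const dnsError = Object.assign(new Error("getaddrinfo EAI_AGAIN api.example.com"), { code: "EAI_AGAIN" });
console.log(classifyNetworkError(dnsError)); // → timeout
console.log(classifyNetworkError(new Error("invalid request"))); // → null
```

Because this sketch only reads the top-level `code`, an error that nests the code under `cause` would return `null` here.
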
### Why these specific codes
| Code | When it fires |
|------|--------------|
| `ECONNREFUSED` | Target port not listening (e.g. Ollama/vLLM stopped, or remote server down) |
| `ENETUNREACH` | No route to network (e.g. Wi-Fi/Ethernet disconnected) |
| `EHOSTUNREACH` | Specific host unreachable (e.g. VPN down, firewall block) |
| `ENETRESET` | Connection reset by network (e.g. NAT timeout, ISP reset) |
| `EAI_AGAIN` | DNS resolution temporarily failed (e.g. DNS server unreachable) |
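
In practice these codes are not always on the top-level error. Node's built-in `fetch`, for example, rejects with `TypeError: fetch failed` and attaches the underlying socket or DNS error (carrying a `code` such as `ECONNREFUSED` or `EAI_AGAIN`) as `err.cause`. This PR does not say whether `resolveFailoverReasonFromError()` unwraps `cause`; a classifier that needs to could use something like the hypothetical helper `findErrorCode`.

```typescript
// Hypothetical helper: follows err.cause until it finds a string `code`.
// The depth limit also guards against cyclic cause chains.
function findErrorCode(err: unknown, maxDepth = 5): string | undefined {
  let current: unknown = err;
  for (let depth = 0; depth < maxDepth && typeof current === "object" && current !== null; depth++) {
    const { code, cause } = current as { code?: unknown; cause?: unknown };
    if (typeof code === "string") return code;
    current = cause;
  }
  return undefined;
}

// Shape of the rejection from Node's fetch when a local server on Ollama's default port is stopped.
const socketError = Object.assign(new Error("connect ECONNREFUSED 127.0.0.1:11434"), { code: "ECONNREFUSED" });
const fetchError = Object.assign(new TypeError("fetch failed"), { cause: socketError });
console.log(findErrorCode(fetchError)); // → ECONNREFUSED
```
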
## Test plan
- [x] All 4 new e2e tests pass (`pnpm test:e2e -- src/agents/model-fallback.e2e.test.ts`)
- [x] Existing failover tests unaffected; the only failure is the pre-existing flaky test `skips providers when all profiles are in cooldown`, which also fails on `main`
- [x] `oxfmt` and `oxlint` pass on changed files
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Adds network connection error codes to the failover mechanism so the agent can gracefully fall back to alternative models when the primary provider is unreachable due to network issues.
- Extended the error code list in `resolveFailoverReasonFromError()` to include `ECONNREFUSED`, `ENETUNREACH`, `EHOSTUNREACH`, `ENETRESET`, and `EAI_AGAIN`, categorized as timeout errors
- Added 4 e2e tests covering the new error scenarios (missing test for `ENETRESET`)
- Consistent with existing network error handling in `src/telegram/network-errors.ts`
- Enables seamless failover to local models (e.g., Ollama) when remote providers are down
<h3>Confidence Score: 4/5</h3>
- Safe to merge; adding one test case is recommended
- The change is well-implemented and consistent with existing patterns in the codebase. The new error codes are already recognized elsewhere (Telegram network errors), and categorizing them as timeout errors makes sense. Test coverage is good but lacks a case for `ENETRESET`. The impact is limited to failover logic and clearly improves resilience.
- Add test case for `ENETRESET` in `model-fallback.e2e.test.ts` for complete coverage
<sub>Last reviewed commit: 43c1c9a</sub>
<!-- /greptile_comment -->
### Most Similar PRs

| PR | Title | Author | Date | Similarity |
|----|-------|--------|------|------------|
| #5031 | fix: add network connection error codes to failover classifier | shayan919293 | 2026-01-30 | 88.0% |
| #15815 | Fallback LLM doesn't trigger if primary model is local | shihanqu | 2026-02-13 | 85.9% |
| #21033 | fix(failover): classify connection errors as timeout for model fail... | zerone0x | 2026-02-19 | 85.4% |
| #7229 | fix: add network error resilience to agentic loop failover | ai-fanatic | 2026-02-02 | 83.0% |
| #21152 | fix(agents): throw FailoverError for unknown model so fallback chai... | Mellowambience | 2026-02-19 | 82.7% |
| #22064 | fix(failover): bypass models allowlist for configured fallback models | winston-bepresent | 2026-02-20 | 82.1% |
| #19252 | fix(agents): continue model fallback on failover text payloads | mahsumaktas | 2026-02-17 | 81.8% |
| #9427 | fix: trigger model fallback on all 4xx HTTP errors | dbottme | 2026-02-05 | 81.6% |
| #21516 | fix: classify connection errors as timeout for model failover (#20931) | echoVic | 2026-02-20 | 80.6% |
| #12314 | fix: treat HTTP 5xx server errors as failover-worthy | hsssgdtc | 2026-02-09 | 80.5% |