#15815: Fallback LLM doesn't trigger if primary model is local
Labels: agents, stale, size: XS
## Summary
- classify plain local/network connection failures as failover-eligible timeout errors
- include common local transport phrases (for example, `Connection error.` and `Failed to connect`) in timeout failover patterns
- extend failover tests to cover these connection-failure cases
## Problem
When a primary local model endpoint is down, the embedded runner can emit `Connection error.`. That message was not treated as a failover reason, so model fallback was not triggered.
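The failure mode can be illustrated with a minimal sketch. Every name here (`runWithFallback`, `classifyBeforeFix`, `classifyAfterFix`, the model ids) is hypothetical and stands in for the real runner and classifier, which this description does not show. The sketch assumes the fallback loop only advances to the next model when the error is classified as a failover reason.

```typescript
// Hypothetical sketch; not the project's actual runner or classifier.
type FailoverReason = "timeout" | null;

// Stand-in for the pre-fix behavior: only explicit timeout text is recognized.
function classifyBeforeFix(message: string): FailoverReason {
  return /timeout|timed out/i.test(message) ? "timeout" : null;
}

// Stand-in for the post-fix behavior: plain connection failures also count.
function classifyAfterFix(message: string): FailoverReason {
  if (/connection error|failed to connect/i.test(message)) return "timeout";
  return classifyBeforeFix(message);
}

// Try each model in order; advance only when the error is failover-eligible.
function runWithFallback(
  models: string[],
  call: (model: string) => string,
  classify: (message: string) => FailoverReason,
): string {
  let lastError: unknown;
  for (const model of models) {
    try {
      return call(model);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (classify(message) === null) throw err; // unclassified: no fallback
      lastError = err;
    }
  }
  throw lastError;
}

// A local primary whose endpoint is down, and a fallback that responds.
function call(model: string): string {
  if (model === "local-primary") throw new Error("Connection error.");
  return `ok from ${model}`;
}
```

With `classifyBeforeFix`, calling `runWithFallback(["local-primary", "remote-fallback"], call, classifyBeforeFix)` rethrows `Connection error.` and never reaches the fallback. With `classifyAfterFix`, the same call returns the fallback model's response.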
## Fix
- `src/agents/pi-embedded-helpers/errors.ts`
  - expanded timeout failover patterns to include common connection-failure text and transport error codes
- `src/agents/failover-error.ts`
  - expanded timeout code mapping to include common network/host resolution failures
- updated tests:
  - `src/agents/pi-embedded-helpers.classifyfailoverreason.e2e.test.ts`
  - `src/agents/failover-error.e2e.test.ts`
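As a rough sketch of the two kinds of change listed above, the following shows one way text patterns and an error-code set could both map to the `timeout` reason. The identifiers (`TIMEOUT_TEXT_PATTERNS`, `TIMEOUT_ERROR_CODES`, `reasonFromText`, `reasonFromError`) are illustrative assumptions, not the names used in the changed files. The lists contain only the phrases and codes mentioned in this PR, plus `ETIMEDOUT` and a generic timeout pattern, which are assumed here to have been covered already.

```typescript
// Illustrative only: names and exact lists are assumptions, not the PR's code.
type FailoverReason = "timeout";

// Phrases named in the PR summary, plus a generic timeout pattern (assumed).
const TIMEOUT_TEXT_PATTERNS: RegExp[] = [
  /timeout|timed out/i,
  /connection error/i,
  /failed to connect/i,
];

// ECONNREFUSED and ENOTFOUND are named in the review; ETIMEDOUT is assumed.
const TIMEOUT_ERROR_CODES = new Set(["ETIMEDOUT", "ECONNREFUSED", "ENOTFOUND"]);

// Classify a plain message by matching it against the text patterns.
function reasonFromText(message: string): FailoverReason | null {
  return TIMEOUT_TEXT_PATTERNS.some((p) => p.test(message)) ? "timeout" : null;
}

// Classify an arbitrary thrown value: check `code` first, then the message.
function reasonFromError(err: unknown): FailoverReason | null {
  if (typeof err === "object" && err !== null) {
    const { code, message } = err as { code?: unknown; message?: unknown };
    if (typeof code === "string" && TIMEOUT_ERROR_CODES.has(code.toUpperCase())) {
      return "timeout";
    }
    if (typeof message === "string") return reasonFromText(message);
  }
  return typeof err === "string" ? reasonFromText(err) : null;
}
```

Checking the structured `code` before the message text keeps the match stable when a transport library rewords its messages. The text patterns cover runners that surface only a string such as `Connection error.`.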
## Validation
- `pnpm vitest run --config vitest.e2e.config.ts src/agents/pi-embedded-helpers.classifyfailoverreason.e2e.test.ts src/agents/failover-error.e2e.test.ts`
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR broadens failover eligibility for local transport failures by:
- Mapping additional common network error codes (e.g., `ECONNREFUSED`, `ENOTFOUND`) to the `timeout` failover reason in `resolveFailoverReasonFromError`.
- Expanding embedded helper timeout text patterns to include connection/transport phrasing and common errno-like tokens.
- Extending the e2e test suites to cover these newly recognized failure shapes.
These changes integrate into the existing model fallback flow via `coerceToFailoverError` → `resolveFailoverReasonFromError` → `classifyFailoverReason`, allowing the fallback runner to treat local endpoint outages as retry/fallback-eligible errors.
<h3>Confidence Score: 5/5</h3>
- This PR looks safe to merge with low risk of regressions.
- Changes are narrowly scoped to failover classification heuristics and are covered by updated e2e tests asserting the new error shapes. I traced the fallback path to confirm these signals are used for model fallback, and the mappings/patterns align with the stated goal (trigger fallback when local endpoints are unreachable).
- src/agents/pi-embedded-helpers/errors.ts (classification patterns), src/agents/failover-error.ts (error-code mapping)
<sub>Last reviewed commit: 2c15d25</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #19077: fix(agents): trigger model failover on connection-refused and netwo... (by ayanesakura · 2026-02-17 · 85.9%)
- #5031: fix: add network connection error codes to failover classifier (by shayan919293 · 2026-01-30 · 85.0%)
- #21033: fix(failover): classify connection errors as timeout for model fail... (by zerone0x · 2026-02-19 · 84.2%)
- #19252: fix(agents): continue model fallback on failover text payloads (by mahsumaktas · 2026-02-17 · 83.6%)
- #9427: fix: trigger model fallback on all 4xx HTTP errors (by dbottme · 2026-02-05 · 83.6%)
- #22064: fix(failover): bypass models allowlist for configured fallback models (by winston-bepresent · 2026-02-20 · 83.3%)
- #12314: fix: treat HTTP 5xx server errors as failover-worthy (by hsssgdtc · 2026-02-09 · 83.2%)
- #21152: fix(agents): throw FailoverError for unknown model so fallback chai... (by Mellowambience · 2026-02-19 · 82.2%)
- #21049: fix(failover): treat HTTP 5xx as rate-limit for model fallback (by maximalmargin · 2026-02-19 · 81.8%)
- #10178: fix: trigger fallback when model resolution fails with unknown model (by Yida-Dev · 2026-02-06 · 81.8%)