#21049: fix(failover): treat HTTP 5xx as rate-limit for model fallback
agents
size: XS
## Summary
Treat HTTP 502/503/504 as failover-eligible (`rate_limit` reason) so that configured model fallbacks trigger when the primary provider is overloaded or temporarily unavailable.
## Changes
- Added handling for status codes 502, 503, 504 in `resolveFailoverReasonFromError()`
- Treats these as `rate_limit` failures to enable existing fallback/cooldown behavior
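A minimal sketch of the classification change follows. It assumes that errors expose the HTTP response status as a numeric `status` property. The real body of `resolveFailoverReasonFromError()` is not shown in this PR, so `failoverReasonFromStatus`, `resolveReasonSketch`, and the `HttpLikeError` shape are hypothetical. Only the 502/503/504 → `rate_limit` mapping comes from the change itself.

```typescript
// Hypothetical sketch: resolveFailoverReasonFromError()'s real body is not shown in this PR.
// Only the 502/503/504 -> "rate_limit" mapping comes from the change; other names are assumed.
type FailoverReason = "rate_limit" | "timeout";

// Assumed error shape: the HTTP client attaches the response status as `status`.
interface HttpLikeError {
  status?: unknown;
}

// Gateway/availability errors that this PR makes failover-eligible.
const OVERLOADED_STATUSES: ReadonlySet<number> = new Set([502, 503, 504]);

function failoverReasonFromStatus(status: number): FailoverReason | null {
  // Reuse the rate_limit reason so the existing fallback/cooldown path applies.
  if (OVERLOADED_STATUSES.has(status)) return "rate_limit";
  // Anything else is left to the rest of the (unshown) classification logic.
  return null;
}

function resolveReasonSketch(err: unknown): FailoverReason | null {
  if (typeof err !== "object" || err === null) return null;
  const { status } = err as HttpLikeError;
  return typeof status === "number" ? failoverReasonFromStatus(status) : null;
}
```

Reusing `rate_limit` instead of introducing a new reason means nothing downstream has to change. The trade-off is that, at the reason level, an overloaded provider looks the same as a genuine rate limit. #21017 (listed under the similar PRs) maps the same codes to `timeout` instead.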
## Fixes
Closes #20999
<h3>Greptile Summary</h3>
Adds handling for HTTP 502 (Bad Gateway), 503 (Service Unavailable), and 504 (Gateway Timeout) status codes in `resolveFailoverReasonFromError()`, treating them as `rate_limit` failures to enable model fallback when the primary provider is overloaded or temporarily unavailable.
- Maps 502/503/504 errors to `rate_limit` reason, which triggers the existing failover/cooldown behavior in `runWithModelFallback()`
- Is consistent in spirit with the existing `isTransientHttpError()` logic in `pi-embedded-helpers/errors.ts`, which treats 500, 502, 503, and Cloudflare 5xx codes as transient failures, although that helper maps them to `timeout` rather than `rate_limit`
- Simple, focused change that enables configured model fallbacks to activate on server-side failures
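The effect on fallback can be illustrated with a simplified loop. This is not the real `runWithModelFallback()`, whose signature is not shown here. `runWithFallbackSketch`, `isFailoverEligible`, `statusOf`, the in-memory cooldown map, and the 60-second default cooldown are all hypothetical and are used for illustration only.

```typescript
// Hypothetical sketch, not the real runWithModelFallback(): names and cooldown policy are assumed.

// Mirrors this PR's change: 502/503/504 count as failover-eligible.
function isFailoverEligible(status: number | undefined): boolean {
  return status === 502 || status === 503 || status === 504;
}

// Reads a numeric `status` from an unknown error, if present (assumed error shape).
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Tries each model in order; eligible failures start a cooldown and move on to the next model.
async function runWithFallbackSketch<T>(
  models: string[],
  call: (model: string) => Promise<T>,
  cooldownUntil: Map<string, number>,
  cooldownMs = 60000,
): Promise<{ model: string; result: T }> {
  let lastError: unknown = new Error("no model available (all cooling down)");
  for (const model of models) {
    // Skip models still cooling down from an earlier eligible failure.
    if ((cooldownUntil.get(model) ?? 0) > Date.now()) continue;
    try {
      return { model, result: await call(model) };
    } catch (err) {
      // Non-eligible errors (e.g. a 400) surface immediately instead of falling back.
      if (!isFailoverEligible(statusOf(err))) throw err;
      cooldownUntil.set(model, Date.now() + cooldownMs);
      lastError = err;
    }
  }
  // Every candidate either failed over or was cooling down.
  throw lastError;
}
```

Consider `models = ["primary", "backup"]`. A 503 from `primary` puts it on cooldown and returns the `backup` result, and later calls skip `primary` until the cooldown expires. A 400 from `primary` is rethrown without trying `backup`.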
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk
- The change is a straightforward, logical extension of error classification that handles server-side errors. It maps 502/503/504 status codes to the `rate_limit` reason, which is semantically appropriate for temporary unavailability. The code follows existing patterns, has clear comments, and plugs into the existing failover infrastructure without adding dependencies or complex logic.
- No files require special attention
<sub>Last reviewed commit: b09e1b0</sub>
## Most Similar PRs
- #21017: fix: treat HTTP 502/503/504 as failover-eligible (timeout reason), by taw0002 · 2026-02-19 · 88.0%
- #9427: fix: trigger model fallback on all 4xx HTTP errors, by dbottme · 2026-02-05 · 84.7%
- #12314: fix: treat HTTP 5xx server errors as failover-worthy, by hsssgdtc · 2026-02-09 · 83.6%
- #15815: Fallback LLM doesn't trigger if primary model is local, by shihanqu · 2026-02-13 · 81.8%
- #22064: fix(failover): bypass models allowlist for configured fallback models, by winston-bepresent · 2026-02-20 · 81.3%
- #19252: fix(agents): continue model fallback on failover text payloads, by mahsumaktas · 2026-02-17 · 80.2%
- #21152: fix(agents): throw FailoverError for unknown model so fallback chai..., by Mellowambience · 2026-02-19 · 79.9%
- #21033: fix(failover): classify connection errors as timeout for model fail..., by zerone0x · 2026-02-19 · 78.5%
- #13658: fix: silent model failover with fallback notification, by taw0002 · 2026-02-10 · 77.7%
- #19077: fix(agents): trigger model failover on connection-refused and netwo..., by ayanesakura · 2026-02-17 · 77.7%