#23210: fix: avoid cooldown on timeout/unknown failovers
agents
size: XS
Cluster: Rate Limit Management Enhancements
## Summary
- only mark auth profile failures for provider-confirmed failover reasons (exclude `timeout` and `unknown`)
- keep failover behavior intact so retries and model fallback still happen for timeout/unknown paths
- prevent timeout/unknown paths from poisoning shared provider cooldown state across channels
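Below is a minimal sketch of the gating described above. It assumes a single cooldown store that is keyed by auth profile and shared by all channels. `FailoverReason`, `ProfileState`, `CooldownStore`, `recordProfileFailure`, `isInCooldown`, the 60-second window and the profile id are all hypothetical. Only the reason strings and the `cooldownUntil` field come from this PR.

```typescript
// All names here are hypothetical (FailoverReason, ProfileState,
// CooldownStore, recordProfileFailure, isInCooldown); only the reason
// strings and the cooldownUntil field come from the PR text.
type FailoverReason =
  | "auth"
  | "format"
  | "rate_limit"
  | "billing"
  | "timeout"
  | "unknown";

// Reasons the provider itself confirmed; only these may start a cooldown.
const PROVIDER_CONFIRMED = new Set<FailoverReason>([
  "auth",
  "format",
  "rate_limit",
  "billing",
]);

interface ProfileState {
  errorCount: number;
  cooldownUntil?: number; // epoch ms; undefined means no active cooldown
}

// One store shared by every channel: a write here affects all of them.
type CooldownStore = Map<string, ProfileState>;

// Returns true if the failure was recorded (and a cooldown started).
function recordProfileFailure(
  store: CooldownStore,
  profileId: string,
  reason: FailoverReason,
  now: number,
  cooldownMs = 60_000, // assumed window, for illustration only
): boolean {
  if (!PROVIDER_CONFIRMED.has(reason)) {
    // timeout/unknown: leave shared state untouched; failover still happens upstream.
    return false;
  }
  const state: ProfileState = store.get(profileId) ?? { errorCount: 0 };
  state.errorCount += 1;
  state.cooldownUntil = now + cooldownMs;
  store.set(profileId, state);
  return true;
}

function isInCooldown(store: CooldownStore, profileId: string, now: number): boolean {
  const until = store.get(profileId)?.cooldownUntil;
  return until !== undefined && until > now;
}

// A timeout seen by one channel leaves the profile usable for the others.
const store: CooldownStore = new Map();
recordProfileFailure(store, "profile-1", "timeout", 0);
console.log(isInCooldown(store, "profile-1", 1_000)); // false
recordProfileFailure(store, "profile-1", "rate_limit", 0);
console.log(isInCooldown(store, "profile-1", 1_000)); // true
```

The store is shared, so writing `cooldownUntil` on a timeout would block every channel that uses that profile. Skipping the write keeps an ambiguous failure local to the request that saw it.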
## Test plan
- [x] `pnpm vitest run src/agents/pi-embedded-helpers.classifyfailoverreason.test.ts src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.test.ts` (validated in local workspace before isolated commit)
- [ ] validate on VPS runtime by reproducing a timeout in one channel and confirming no `cooldownUntil` is written for timeout/unknown
<h3>Greptile Summary</h3>
Prevents timeout and unknown failover reasons from triggering auth profile cooldowns while preserving failover/retry behavior. The PR addresses a type mismatch where `"unknown"` was being passed to `markAuthProfileFailure`, but `AuthProfileFailureReason` only accepts `"auth" | "format" | "rate_limit" | "billing" | "timeout"` (not `"unknown"`).
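A type guard is one way to make the compiler enforce this exclusion. The PR itself may simply use inline comparisons. The sketch assumes the failover classifier returns a union that adds `"unknown"` to `AuthProfileFailureReason`. `FailoverReason`, `CooldownReason`, `isCooldownReason` and the one-argument `markAuthProfileFailure` stub are hypothetical, because the real function's signature is not shown here.

```typescript
// AuthProfileFailureReason mirrors the union quoted above; FailoverReason,
// CooldownReason, isCooldownReason and this markAuthProfileFailure stub are
// hypothetical illustrations, not the repository's definitions.
type AuthProfileFailureReason = "auth" | "format" | "rate_limit" | "billing" | "timeout";
type FailoverReason = AuthProfileFailureReason | "unknown";

// The subset allowed to start a cooldown: auth, format, rate_limit, billing.
type CooldownReason = Exclude<FailoverReason, "timeout" | "unknown">;

const COOLDOWN_REASONS: readonly CooldownReason[] = ["auth", "format", "rate_limit", "billing"];

// Type guard: narrows FailoverReason so "unknown" (and "timeout") never reach the marker.
function isCooldownReason(reason: FailoverReason): reason is CooldownReason {
  return (COOLDOWN_REASONS as readonly FailoverReason[]).includes(reason);
}

// Hypothetical stand-in that records what it was called with.
const marked: AuthProfileFailureReason[] = [];
function markAuthProfileFailure(reason: AuthProfileFailureReason): void {
  marked.push(reason);
}

function onFailover(reason: FailoverReason): void {
  // markAuthProfileFailure(reason); // compile error: "unknown" is not assignable
  if (isCooldownReason(reason)) {
    markAuthProfileFailure(reason); // OK: reason is narrowed to CooldownReason
  }
}

for (const r of ["timeout", "unknown", "rate_limit"] as const) onFailover(r);
console.log(marked); // [ 'rate_limit' ]
```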
Key changes:
- Added an `"unknown"` exclusion alongside the existing `"timeout"` exclusion when marking profile failures (lines 514-519, 604-616)
- Only provider-confirmed failure reasons (`"auth"`, `"rate_limit"`, `"billing"`, `"format"`) now trigger cooldowns
- Timeout/unknown errors still trigger failover and model fallback behavior, but don't poison shared provider cooldown state across channels
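The following sketch shows how the cooldown decision can be decoupled from failover. It assumes a runner that tries candidates (profile/model pairs) in order. `Candidate`, `AttemptResult` and `runWithFallback` are hypothetical names, not the runner's actual API.

```typescript
// Sketch only: Candidate, AttemptResult and runWithFallback are hypothetical;
// the real runner's retry/rotation logic is not shown in this PR description.
type FailoverReason = "auth" | "format" | "rate_limit" | "billing" | "timeout" | "unknown";

interface Candidate {
  profileId: string;
  model: string;
}

type AttemptResult = { ok: true; text: string } | { ok: false; reason: FailoverReason };

const PROVIDER_CONFIRMED: ReadonlySet<FailoverReason> = new Set<FailoverReason>([
  "auth",
  "format",
  "rate_limit",
  "billing",
]);

function runWithFallback(
  candidates: Candidate[],
  attempt: (c: Candidate) => AttemptResult,
  markFailure: (profileId: string, reason: FailoverReason) => void,
): string {
  for (const candidate of candidates) {
    const result = attempt(candidate);
    if (result.ok) return result.text;
    // The cooldown write is gated on the reason...
    if (PROVIDER_CONFIRMED.has(result.reason)) {
      markFailure(candidate.profileId, result.reason);
    }
    // ...but failover is not: every failure moves on to the next candidate.
  }
  throw new Error("all candidates failed");
}

// Usage: the primary model times out, the fallback succeeds, nothing is marked.
const marks: string[] = [];
const answer = runWithFallback(
  [
    { profileId: "p1", model: "primary" },
    { profileId: "p1", model: "fallback" },
  ],
  (c): AttemptResult =>
    c.model === "primary" ? { ok: false, reason: "timeout" } : { ok: true, text: "done" },
  (id, reason) => marks.push(`${id}:${reason}`),
);
console.log(answer, marks); // done []
```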
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk
- The changes fix a type correctness issue and align the code with the existing test suite. The logic is straightforward: it adds `"unknown"` to the existing exclusion list for timeout, preventing invalid values from being passed to `markAuthProfileFailure`. The PR maintains backward compatibility by preserving failover behavior while only preventing cooldown escalation for ambiguous failure reasons.
- No files require special attention
<sub>Last reviewed commit: a16fc90</sub>
## Most Similar PRs
- #20946: fix: skip auth cooldown on timeout (not an auth failure) · by austenstone · 2026-02-19 · 84.5%
- #14914: fix: resolve actual failure reason for cooldown-skipped providers · by mcaxtr · 2026-02-12 · 83.1%
- #19267: fix: derive failover reason from timedOut flag to prevent unknown c... · by austenstone · 2026-02-17 · 83.0%
- #14824: fix: do not trigger provider cooldown on LLM request timeouts · by CyberSinister · 2026-02-12 · 82.7%
- #14368: fix: skip auth profile cooldown on format errors to prevent provide... · by koatora20 · 2026-02-12 · 79.9%
- #14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on... · by JamesEBall · 2026-02-12 · 78.9%
- #18902: fix: exempt format errors from auth profile cooldown · by tag-assistant · 2026-02-17 · 78.7%
- #10178: fix: trigger fallback when model resolution fails with unknown model · by Yida-Dev · 2026-02-06 · 77.9%
- #4462: fix: prevent gateway crash when all auth profiles are in cooldown · by garnetlyx · 2026-01-30 · 77.9%
- #22359: fix(agents): classify overloaded service errors as timeout · by AIflow-Labs · 2026-02-21 · 77.5%