
#23210: fix: avoid cooldown on timeout/unknown failovers

by nydamon open 2026-02-22 03:42
agents size: XS
## Summary

- only mark auth profile failures for provider-confirmed failover reasons (exclude `timeout` and `unknown`)
- keep failover behavior intact so retries and model fallback still happen for timeout/unknown paths
- prevent timeout/unknown paths from poisoning shared provider cooldown state across channels

## Test plan

- [x] `pnpm vitest run src/agents/pi-embedded-helpers.classifyfailoverreason.test.ts src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.test.ts` (validated in local workspace before isolated commit)
- [ ] validate on VPS runtime by reproducing timeout in one channel and confirming no `cooldownUntil` is written for timeout/unknown

Made with [Cursor](https://cursor.com)

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

Prevents timeout and unknown failover reasons from triggering auth profile cooldowns while preserving failover/retry behavior. The PR addresses a type mismatch where `"unknown"` was being passed to `markAuthProfileFailure`, but `AuthProfileFailureReason` only accepts `"auth" | "format" | "rate_limit" | "billing" | "timeout"` (not `"unknown"`).

Key changes:

- Added `"unknown"` exclusion alongside existing `"timeout"` exclusion when marking profile failures (lines 514-519, 604-616)
- Only provider-confirmed failure reasons (`"auth"`, `"rate_limit"`, `"billing"`, `"format"`) now trigger cooldowns
- Timeout/unknown errors still trigger failover and model fallback behavior, but don't poison shared provider cooldown state across channels

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk
- The changes fix a type correctness issue and align the code with the existing test suite. The logic is straightforward: it adds `"unknown"` to the existing exclusion list for timeout, preventing invalid values from being passed to `markAuthProfileFailure`. The PR maintains backward compatibility by preserving failover behavior while only preventing cooldown escalation for ambiguous failure reasons.
- No files require special attention

<sub>Last reviewed commit: a16fc90</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
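The gating described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: only the reason strings and the names `markAuthProfileFailure` and `AuthProfileFailureReason` come from the description, while `FailoverReason`, `isCooldownReason`, `handleFailover`, `cooldownLog`, and the function signatures are hypothetical stand-ins.

```typescript
// Union as quoted in the Greptile summary.
type AuthProfileFailureReason = "auth" | "format" | "rate_limit" | "billing" | "timeout";

// Hypothetical: failover classification can additionally yield "unknown".
type FailoverReason = AuthProfileFailureReason | "unknown";

// Provider-confirmed reasons; only these should start a cooldown.
type CooldownReason = Exclude<AuthProfileFailureReason, "timeout">;
const COOLDOWN_REASONS = new Set<FailoverReason>(["auth", "format", "rate_limit", "billing"]);

// Type guard: narrows away "timeout" and "unknown".
function isCooldownReason(reason: FailoverReason): reason is CooldownReason {
  return COOLDOWN_REASONS.has(reason);
}

// Stand-in for the real markAuthProfileFailure (its signature is an assumption).
const cooldownLog: string[] = [];
function markAuthProfileFailure(profileId: string, reason: AuthProfileFailureReason): void {
  cooldownLog.push(`${profileId}:${reason}`);
}

// Hypothetical failover handler: cooldown marking is conditional,
// while retry / model fallback proceeds for every reason.
function handleFailover(profileId: string, reason: FailoverReason): { failover: boolean } {
  if (isCooldownReason(reason)) {
    markAuthProfileFailure(profileId, reason);
  }
  return { failover: true };
}
```

Under these assumptions, `handleFailover("p1", "timeout")` still reports a failover but leaves `cooldownLog` empty, whereas `handleFailover("p1", "rate_limit")` records a cooldown entry. Using a type guard also means `"unknown"` can never reach `markAuthProfileFailure` at compile time.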
