#11371: Auth: cap rate-limit cooldown at 5 minutes; add maxCooldownMinutes config (#11352)
`agents` · `stale` · Cluster: Rate Limit Management Enhancements
## Fix
Auth profile cooldown backoff was too aggressive for transient rate limits: a brief burst of 429s could escalate into a lockout of 25 minutes or more, because the exponential backoff was capped only at 1 hour.
**Root cause:** `calculateAuthProfileCooldownMs` grew the cooldown fivefold per error (`1 min × 5^(errorCount − 1)`, i.e. `1min → 5min → 25min → 60min`) with a 1-hour cap. When multiple profiles failed in sequence, the accumulated `errorCount` quickly pushed the cooldown to that cap.
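A minimal sketch of the pre-fix calculation, assuming the cooldown is `1 min × 5^(errorCount − 1)` capped at 1 hour, as the sequence above implies. `oldCooldownMs` is a hypothetical name for illustration, not the actual function in `usage.ts`:

```typescript
const MINUTE_MS = 60_000;
const OLD_MAX_COOLDOWN_MS = 60 * MINUTE_MS; // previous 1-hour cap

// Hypothetical reconstruction of the pre-fix backoff:
// errorCount 1 -> 1 min, 2 -> 5 min, 3 -> 25 min, 4+ -> 60 min (cap).
function oldCooldownMs(errorCount: number): number {
  const n = Math.max(1, errorCount); // treat 0 or negative counts as the first failure
  return Math.min(OLD_MAX_COOLDOWN_MS, MINUTE_MS * 5 ** (n - 1));
}

for (const count of [1, 2, 3, 4, 15]) {
  console.log(count, oldCooldownMs(count) / MINUTE_MS, "min");
}
```

Only three consecutive errors are needed to reach a 25-minute cooldown, which is why accumulated `errorCount` across profiles hurt so much.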
## Changes
- `src/agents/auth-profiles/usage.ts`: Lower default max cooldown from 1h to **5 minutes**; accept optional `maxMs` parameter; pass config-resolved cap through `computeNextProfileUsageStats`
- `src/config/types.auth.ts`: Add `maxCooldownMinutes` to `AuthConfig.cooldowns`
- `src/config/zod-schema.ts`: Add `maxCooldownMinutes` to zod validation
- `src/config/schema.ts`: Add label + description for `auth.cooldowns.maxCooldownMinutes`
- `src/agents/auth-profiles.auth-profile-cooldowns.test.ts`: Update tests for new 5min cap + add test for custom `maxMs`
- `CHANGELOG.md`: Add entry
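The `usage.ts` change can be sketched as the same fivefold growth with a caller-supplied cap that defaults to 5 minutes. This assumes the optional `maxMs` parameter behaves like a plain default argument; `cooldownMsSketch` and `DEFAULT_MAX_COOLDOWN_MS` are hypothetical names, and the real signature of `calculateAuthProfileCooldownMs` may differ:

```typescript
const MINUTE_MS = 60_000;
const DEFAULT_MAX_COOLDOWN_MS = 5 * MINUTE_MS; // new default cap

// Hypothetical sketch: same 5x growth per error, but capped at maxMs
// (default 5 minutes) instead of a hard-coded hour.
function cooldownMsSketch(
  errorCount: number,
  maxMs: number = DEFAULT_MAX_COOLDOWN_MS,
): number {
  const n = Math.max(1, errorCount);
  return Math.min(maxMs, MINUTE_MS * 5 ** (n - 1));
}

// Default cap: 1, 5, 5, 5 minutes.
const defaults = [1, 2, 3, 4].map((c) => cooldownMsSketch(c) / MINUTE_MS);
// Custom 30-minute cap: 1, 5, 25, 30 minutes.
const custom = [1, 2, 3, 4].map(
  (c) => cooldownMsSketch(c, 30 * MINUTE_MS) / MINUTE_MS,
);
console.log(defaults, custom);
```

Keeping the growth factor unchanged and only lowering the cap means the first two steps behave exactly as before; only long failure streaks are affected.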
## New backoff sequence (default)
| errorCount | Before (1h cap) | After (5min cap) |
|---|---|---|
| 1 | 1 min | 1 min |
| 2 | 5 min | 5 min |
| 3 | 25 min | **5 min** |
| 4+ | 60 min | **5 min** |
Users can raise or lower the cap via `auth.cooldowns.maxCooldownMinutes` in config.
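Resolving the override can be sketched as converting minutes to milliseconds and falling back to the 5-minute default when the key is absent. `AuthConfigSketch` mirrors only the `auth.cooldowns.maxCooldownMinutes` path named in this PR; the surrounding shape and `resolveMaxCooldownMs` are assumptions, not the project's actual types:

```typescript
const MINUTE_MS = 60_000;
const DEFAULT_MAX_COOLDOWN_MINUTES = 5;

// Assumed minimal shape: only the path named in this PR.
interface AuthConfigSketch {
  auth?: {
    cooldowns?: {
      maxCooldownMinutes?: number;
    };
  };
}

// Hypothetical resolver: read auth.cooldowns.maxCooldownMinutes,
// fall back to 5 minutes when unset, and return milliseconds.
function resolveMaxCooldownMs(config: AuthConfigSketch): number {
  const minutes =
    config.auth?.cooldowns?.maxCooldownMinutes ?? DEFAULT_MAX_COOLDOWN_MINUTES;
  return minutes * MINUTE_MS;
}

// Example override: allow cooldowns to grow to 15 minutes.
const config: AuthConfigSketch = { auth: { cooldowns: { maxCooldownMinutes: 15 } } };
console.log(resolveMaxCooldownMs(config)); // 900000
console.log(resolveMaxCooldownMs({})); // 300000
```

Range checks on the configured value are assumed to live in the zod schema rather than in this resolver.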
## Verification
- `pnpm build` passes
- All 5 auth profile cooldown + failure tests pass
- Local reproduction confirms 15 errors → 5min max (was 60min)
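The local reproduction can be approximated with a loop that records 15 consecutive failures and measures the resulting cooldown under each cap. `recordFailure` and the `UsageStatsSketch` fields are hypothetical stand-ins for what `computeNextProfileUsageStats` tracks, and the `5^(errorCount − 1)` growth is assumed from the backoff table:

```typescript
const MINUTE_MS = 60_000;

interface UsageStatsSketch {
  errorCount: number;
  cooldownUntil: number; // epoch ms; profile is skipped until then
}

function cooldownMs(errorCount: number, maxMs: number): number {
  const n = Math.max(1, errorCount);
  return Math.min(maxMs, MINUTE_MS * 5 ** (n - 1));
}

// Hypothetical stats update: bump errorCount, then set cooldownUntil
// using the config-resolved cap passed in by the caller.
function recordFailure(
  stats: UsageStatsSketch,
  now: number,
  maxMs: number,
): UsageStatsSketch {
  const errorCount = stats.errorCount + 1;
  return { errorCount, cooldownUntil: now + cooldownMs(errorCount, maxMs) };
}

function cooldownAfterFailures(failures: number, maxMs: number): number {
  const now = 0; // fixed clock so the result is the cooldown length itself
  let stats: UsageStatsSketch = { errorCount: 0, cooldownUntil: 0 };
  for (let i = 0; i < failures; i++) {
    stats = recordFailure(stats, now, maxMs);
  }
  return stats.cooldownUntil - now;
}

console.log(cooldownAfterFailures(15, 60 * MINUTE_MS) / MINUTE_MS); // 60 (old cap)
console.log(cooldownAfterFailures(15, 5 * MINUTE_MS) / MINUTE_MS); // 5 (new default)
```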
Fixes #11352
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR reduces auth profile rate-limit/transient failure cooldown backoff by capping the exponential cooldown at 5 minutes by default (previously effectively up to 1 hour), and introduces a new config knob `auth.cooldowns.maxCooldownMinutes` that is wired through types, zod validation, and config UI hints. The cooldown computation path in `src/agents/auth-profiles/usage.ts` now resolves the cap from config and passes it into `calculateAuthProfileCooldownMs`, and the associated cooldown unit tests and changelog entry were updated accordingly.
<h3>Confidence Score: 4/5</h3>
- This PR is close to safe to merge, with one exported-helper edge case that can break cooldown enforcement if misused.
- The functional change is small and well-covered by updated tests and config schema wiring; the only notable risk is that the newly-added optional `maxMs` parameter is not validated in the exported helper, so incorrect values from future call sites could silently disable cooldowns.
- Files needing attention: `src/agents/auth-profiles/usage.ts`
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
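Greptile's remaining concern is that the exported helper does not validate `maxMs`. Because `Math.min` returns `0` for a zero cap and `NaN` for a `NaN` cap, such values would likely make a cooldown end immediately or never compare as active. One possible guard, sketched with hypothetical names (`sanitizeMaxMs`, `guardedCooldownMs`), falls back to the default for anything that is not a finite positive number:

```typescript
const MINUTE_MS = 60_000;
const DEFAULT_MAX_COOLDOWN_MS = 5 * MINUTE_MS;

// Hypothetical guard: accept only finite, positive caps; otherwise use the default.
function sanitizeMaxMs(maxMs: number | undefined): number {
  return typeof maxMs === "number" && Number.isFinite(maxMs) && maxMs > 0
    ? maxMs
    : DEFAULT_MAX_COOLDOWN_MS;
}

function guardedCooldownMs(errorCount: number, maxMs?: number): number {
  const n = Math.max(1, errorCount);
  return Math.min(sanitizeMaxMs(maxMs), MINUTE_MS * 5 ** (n - 1));
}

console.log(guardedCooldownMs(4, 0) / MINUTE_MS); // 5 (0 rejected)
console.log(guardedCooldownMs(4, NaN) / MINUTE_MS); // 5 (NaN rejected)
console.log(guardedCooldownMs(4, 10 * MINUTE_MS) / MINUTE_MS); // 10
```

Rejecting `Infinity` is a design choice in this sketch; a caller wanting an effectively unbounded cap would need to pass an explicit large value.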
## Most Similar PRs

| PR | Title | Author | Date | Similarity |
|---|---|---|---|---|
| #14574 | fix: gentler rate-limit cooldown backoff + clear stale cooldowns on... | JamesEBall | 2026-02-12 | 81.3% |
| #14824 | fix: do not trigger provider cooldown on LLM request timeouts | CyberSinister | 2026-02-12 | 79.5% |
| #20946 | fix: skip auth cooldown on timeout (not an auth failure) | austenstone | 2026-02-19 | 79.3% |
| #14914 | fix: resolve actual failure reason for cooldown-skipped providers | mcaxtr | 2026-02-12 | 77.8% |
| #23210 | fix: avoid cooldown on timeout/unknown failovers | nydamon | 2026-02-22 | 76.6% |
| #14368 | fix: skip auth profile cooldown on format errors to prevent provide... | koatora20 | 2026-02-12 | 76.6% |
| #23564 | feat(auth): add timeout retry before auth profile rotation | echoVic | 2026-02-22 | 75.4% |
| #23341 | feat: prioritize lastGood auth profile over round-robin ordering | kevin930321 | 2026-02-22 | 74.4% |
| #19267 | fix: derive failover reason from timedOut flag to prevent unknown c... | austenstone | 2026-02-17 | 74.2% |
| #18902 | fix: exempt format errors from auth profile cooldown | tag-assistant | 2026-02-17 | 74.1% |