#14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on restart
Labels: gateway, agents, stale, size: S
Cluster: Rate Limit Management Enhancements
## Summary
- **Rate-limit cooldowns use per-reason counts with gentler backoff** (30s → 60s → 120s → max 5min) instead of sharing the total `errorCount` with other failure types (which escalated to 1hr after just a few cascading failures)
- **Stale rate-limit cooldowns are cleared on gateway restart** so providers that have recovered aren't blocked by cooldown timestamps persisted to `auth-profiles.json`
- **Billing-disabled profiles are preserved** across restarts (only rate-limit/timeout/unknown cooldowns are cleared)
## Problem
When running with OAuth tokens (e.g., Claude Code) and multiple fallback models, a single rate-limit event cascades through all fallback models. Each failure increments the shared `errorCount`, so the exponential backoff (`60s * 5^(errorCount-1)`) quickly maxes out at 1 hour for every provider simultaneously. Because the cooldowns are persisted to disk, `gateway restart` doesn't help: the bot stays locked out for hours, long after the rate-limit window has reset.
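To make the escalation concrete, here is a minimal sketch of the shared backoff formula quoted above. The function name `legacyCooldownMs` and the constants are hypothetical; only the formula `60s * 5^(errorCount-1)` and the 1-hour ceiling come from the description, so the real implementation may differ in detail.

```typescript
// Hypothetical reconstruction of the shared backoff described above:
// 60s * 5^(errorCount - 1), capped at 1 hour.
const LEGACY_BASE_MS = 60_000;
const LEGACY_MAX_MS = 60 * 60_000;

function legacyCooldownMs(errorCount: number): number {
  // Treat anything below 1 as the first failure.
  const n = Math.max(1, Math.floor(errorCount));
  return Math.min(LEGACY_MAX_MS, LEGACY_BASE_MS * 5 ** (n - 1));
}

// errorCount 1 -> 1 min, 2 -> 5 min, 3 -> 25 min, 4 and above -> 1 hour (capped)
```

Under this formula the fourth counted failure already hits the ceiling, which is why a handful of cascading fallback failures is enough to lock a provider out for an hour.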
## Root cause
`computeNextProfileUsageStats` used `nextErrorCount` (total errors across all failure types) for rate-limit backoff calculation. With 6 fallback models cascading, `errorCount` reaches 3+ per provider in seconds, pushing all cooldowns to max (1 hour).
## Changes
- `src/agents/auth-profiles/usage.ts`:
  - Added `calculateRateLimitCooldownMs()`: gentler backoff capped at 5 minutes
  - `computeNextProfileUsageStats()` now uses per-reason count (`failureCounts.rate_limit`) for rate-limit backoff instead of total `errorCount`
  - Added `clearAllRateLimitCooldowns()` to reset non-billing cooldowns
- `src/gateway/server-startup.ts`:
  - Call `clearAllRateLimitCooldowns()` at gateway startup
- `src/agents/auth-profiles.ts`:
  - Export new functions
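The backoff change listed above can be sketched roughly as follows. This is an illustration, not the code in `usage.ts`: the names `sketchRateLimitCooldownMs` and `recordRateLimitFailure`, the `UsageStats` shape, and the `cooldownUntil` field are assumptions, and the doubling schedule is inferred from the stated 30s → 60s → 120s → max 5min progression.

```typescript
// Assumed shapes; the real types in usage.ts may differ.
type FailureReason = "rate_limit" | "timeout" | "billing" | "unknown";

interface UsageStats {
  errorCount: number;
  failureCounts: Partial<Record<FailureReason, number>>;
  cooldownUntil?: number; // hypothetical field name, epoch milliseconds
}

const RATE_LIMIT_BASE_MS = 30_000;
const RATE_LIMIT_MAX_MS = 5 * 60_000;

// Assumed doubling schedule: 30s, 60s, 120s, 240s, then capped at 5 minutes.
function sketchRateLimitCooldownMs(rateLimitCount: number): number {
  const n = Math.max(1, Math.floor(rateLimitCount));
  // Clamp the exponent so very large counts cannot overflow.
  const exponent = Math.min(n - 1, 10);
  return Math.min(RATE_LIMIT_MAX_MS, RATE_LIMIT_BASE_MS * 2 ** exponent);
}

// Records one rate-limit failure. The cooldown is derived from the
// per-reason count, not from the shared errorCount.
function recordRateLimitFailure(stats: UsageStats, now: number): UsageStats {
  const rateLimitCount = (stats.failureCounts.rate_limit ?? 0) + 1;
  return {
    errorCount: stats.errorCount + 1,
    failureCounts: { ...stats.failureCounts, rate_limit: rateLimitCount },
    cooldownUntil: now + sketchRateLimitCooldownMs(rateLimitCount),
  };
}
```

With this shape, a profile that has already accumulated three timeout errors still gets only a 30-second cooldown on its first rate-limit failure, because the timeouts no longer feed into the rate-limit exponent.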
## Test plan
- [x] New test: rate_limit failures cap at 5 minutes even after 5 consecutive failures
- [x] New test: `clearAllRateLimitCooldowns` clears rate-limit cooldowns but preserves billing-disabled profiles
- [ ] Existing tests still pass
- [ ] Manual test with OAuth token + concurrent subagents hitting rate limits
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adjusts auth-profile cooldown behavior to prevent rate-limit cascades across fallback models. It introduces a rate_limit-specific cooldown backoff (30s → 60s → 120s, capped at 5 minutes), switches rate-limit cooldown calculation to use per-reason counts rather than the shared `errorCount`, and clears persisted non-billing cooldowns on gateway startup so stale cooldown timestamps don’t block recovered providers after restart.
Key touched areas are `src/agents/auth-profiles/usage.ts` (cooldown computation + new clearing routine), `src/gateway/server-startup.ts` (startup hook to clear cooldowns), and tests validating the new backoff and clearing behavior.
<h3>Confidence Score: 4/5</h3>
- This PR is close to safe to merge, but it has one behavior-changing bug around clearing persisted failure history on restart.
- Core rate-limit backoff changes look consistent and are covered by new tests. The main concern is `clearAllRateLimitCooldowns` resetting `failureCounts` wholesale for non-billing-disabled profiles, which unintentionally resets stored billing failure history and can reduce future billing backoff after a restart.
- File needing the most attention: `src/agents/auth-profiles/usage.ts`
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
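One way to address the review concern about `failureCounts` being reset wholesale is to clear only the transient state and carry the billing count forward. The sketch below is hypothetical: `clearTransientCooldowns`, the `ProfileStats` shape, and the `cooldownUntil` / `disabledUntil` field names are assumptions made for illustration, not the PR's actual code or the layout of `auth-profiles.json`.

```typescript
type FailureReason = "rate_limit" | "timeout" | "billing" | "unknown";

// Assumed per-profile shape; field names other than failureCounts are hypothetical.
interface ProfileStats {
  cooldownUntil?: number; // transient cooldown (rate limit, timeout, unknown)
  disabledUntil?: number; // billing disable window
  failureCounts?: Partial<Record<FailureReason, number>>;
}

function clearTransientCooldowns(
  profiles: Record<string, ProfileStats>,
  now: number,
): Record<string, ProfileStats> {
  const result: Record<string, ProfileStats> = {};
  for (const [id, stats] of Object.entries(profiles)) {
    // Billing-disabled profiles are left exactly as persisted.
    if (stats.disabledUntil !== undefined && stats.disabledUntil > now) {
      result[id] = stats;
      continue;
    }
    const next: ProfileStats = { ...stats };
    delete next.cooldownUntil;
    // Drop transient counts but keep the billing history, so a later
    // billing failure still escalates from where it left off.
    const billing = stats.failureCounts?.billing;
    next.failureCounts = billing !== undefined ? { billing } : {};
    result[id] = next;
  }
  return result;
}
```

The function returns a new map rather than mutating its input, which keeps it easy to test; whether the real store is updated in place is not stated in the PR.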
## Most Similar PRs
- #14824: fix: do not trigger provider cooldown on LLM request timeouts (by CyberSinister · 2026-02-12 · 85.8%)
- #14914: fix: resolve actual failure reason for cooldown-skipped providers (by mcaxtr · 2026-02-12 · 85.7%)
- #19267: fix: derive failover reason from timedOut flag to prevent unknown c... (by austenstone · 2026-02-17 · 84.4%)
- #4462: fix: prevent gateway crash when all auth profiles are in cooldown (by garnetlyx · 2026-01-30 · 82.2%)
- #18902: fix: exempt format errors from auth profile cooldown (by tag-assistant · 2026-02-17 · 82.2%)
- #11371: Auth: cap rate-limit cooldown at 5 minutes; add maxCooldownMinutes ... (by lailoo · 2026-02-07 · 81.3%)
- #13658: fix: silent model failover with fallback notification (by taw0002 · 2026-02-10 · 81.1%)
- #14368: fix: skip auth profile cooldown on format errors to prevent provide... (by koatora20 · 2026-02-12 · 81.0%)
- #16797: fix(auth-profiles): implement per-model rate limit cooldown tracking (by mulhamna · 2026-02-15 · 80.5%)
- #23816: fix(agents): model fallback skipped during session overrides and pr... (by ramezgaberiel · 2026-02-22 · 79.5%)