#14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on restart

by JamesEBall open 2026-02-12 09:43

gateway agents stale size: S

Cluster: Rate Limit Management Enhancements

## Summary

- **Rate-limit cooldowns use per-reason counts with gentler backoff** (30s → 60s → 120s → max 5min) instead of sharing the total `errorCount` with other failure types (which escalated to 1hr after just a few cascading failures)
- **Stale rate-limit cooldowns are cleared on gateway restart** so providers that have recovered aren't blocked by cooldown timestamps persisted to `auth-profiles.json`
- **Billing-disabled profiles are preserved** across restarts (only rate-limit/timeout/unknown cooldowns are cleared)

## Problem

When running with OAuth tokens (e.g., Claude Code) and multiple fallback models, a single rate-limit event cascades through all fallback models. Each failure increments the shared `errorCount`, so the exponential backoff (`60s * 5^(errorCount-1)`) quickly maxes out at 1 hour for every provider simultaneously. The cooldowns are persisted to disk, so even `gateway restart` doesn't help: the bot stays locked out for hours even after the rate limit window resets.

## Root cause

`computeNextProfileUsageStats` used `nextErrorCount` (total errors across all failure types) for rate-limit backoff calculation. With 6 fallback models cascading, `errorCount` reaches 3+ per provider in seconds, pushing all cooldowns to max (1 hour).
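To make the difference between the two schedules concrete, here is a minimal sketch of both. It is not the code from `src/agents/auth-profiles/usage.ts`: the function names `sharedBackoffMs` and `rateLimitBackoffMs` and the constants are hypothetical. The doubling step is also an assumption, inferred from the 30s → 60s → 120s → max 5min sequence in the summary.

```typescript
// Hypothetical names and constants, for illustration only.
const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;

// Previous behaviour as described in the Problem section:
// 60s * 5^(errorCount - 1), capped at 1 hour, driven by the shared errorCount.
function sharedBackoffMs(errorCount: number): number {
  const n = Math.max(1, errorCount);
  return Math.min(HOUR_MS, MINUTE_MS * 5 ** (n - 1));
}

// Gentler per-reason schedule: 30s, 60s, 120s, ... capped at 5 minutes.
// Assumption: the delay doubles per rate-limit failure until it hits the cap.
function rateLimitBackoffMs(rateLimitCount: number): number {
  const n = Math.max(1, rateLimitCount);
  return Math.min(5 * MINUTE_MS, 30_000 * 2 ** (n - 1));
}
```

Under these assumptions, a fourth failure already hits the 1 hour cap with the shared formula, while the per-reason schedule yields 4 minutes and never exceeds 5 minutes.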
## Changes

- `src/agents/auth-profiles/usage.ts`:
  - Added `calculateRateLimitCooldownMs()`: gentler backoff capped at 5 minutes
  - `computeNextProfileUsageStats()` now uses per-reason count (`failureCounts.rate_limit`) for rate-limit backoff instead of total `errorCount`
  - Added `clearAllRateLimitCooldowns()` to reset non-billing cooldowns
- `src/gateway/server-startup.ts`:
  - Call `clearAllRateLimitCooldowns()` at gateway startup
- `src/agents/auth-profiles.ts`:
  - Export new functions

## Test plan

- [x] New test: rate_limit failures cap at 5 minutes even after 5 consecutive failures
- [x] New test: `clearAllRateLimitCooldowns` clears rate-limit cooldowns but preserves billing-disabled profiles
- [ ] Existing tests still pass
- [ ] Manual test with OAuth token + concurrent subagents hitting rate limits

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adjusts auth-profile cooldown behavior to prevent rate-limit cascades across fallback models. It introduces a rate_limit-specific cooldown backoff (30s → 60s → 120s, capped at 5 minutes), switches rate-limit cooldown calculation to use per-reason counts rather than the shared `errorCount`, and clears persisted non-billing cooldowns on gateway startup so stale cooldown timestamps don’t block recovered providers after restart.

Key touched areas are `src/agents/auth-profiles/usage.ts` (cooldown computation + new clearing routine), `src/gateway/server-startup.ts` (startup hook to clear cooldowns), and tests validating the new backoff and clearing behavior.

<h3>Confidence Score: 4/5</h3>

- This PR is close to safe to merge, but it has one behavior-changing bug around clearing persisted failure history on restart.
- Core rate-limit backoff changes look consistent and are covered by new tests.
The main concern is `clearAllRateLimitCooldowns` resetting `failureCounts` wholesale for non-billing-disabled profiles, which unintentionally resets stored billing failure history and can reduce future billing backoff after a restart.
- src/agents/auth-profiles/usage.ts
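
One possible way to address the concern above is to clear only the non-billing entries instead of resetting `failureCounts` wholesale. The sketch below is an illustration under assumptions, not the PR's implementation: the `ProfileUsageStats` shape, its field names (`cooldownUntil`, `disabledUntil`, `disabledReason`) and the function name `clearNonBillingCooldowns` are hypothetical, and handling of the shared `errorCount` is deliberately left out.

```typescript
// Hypothetical types; the real shapes in usage.ts may differ.
type FailureReason = "rate_limit" | "timeout" | "unknown" | "billing";

interface ProfileUsageStats {
  cooldownUntil?: number;
  disabledUntil?: number;
  disabledReason?: FailureReason;
  failureCounts?: Partial<Record<FailureReason, number>>;
}

// Returns a new map with non-billing cooldowns cleared.
// Billing-disabled profiles pass through untouched, and any stored billing
// failure count is kept so later billing backoff is not weakened by a restart.
function clearNonBillingCooldowns(
  stats: Record<string, ProfileUsageStats>,
): Record<string, ProfileUsageStats> {
  const next: Record<string, ProfileUsageStats> = {};
  for (const [id, s] of Object.entries(stats)) {
    if (s.disabledReason === "billing") {
      next[id] = s;
      continue;
    }
    const copy: ProfileUsageStats = { ...s };
    delete copy.cooldownUntil;
    const billing = s.failureCounts?.billing;
    // Drop rate_limit/timeout/unknown counts, keep only the billing history.
    copy.failureCounts = billing !== undefined ? { billing } : {};
    next[id] = copy;
  }
  return next;
}
```

Returning a new map rather than mutating in place keeps the sketch easy to test; whether the real store is updated in place or rewritten is not stated in the PR.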