#7941: fix: scope rate-limit cooldowns per-model instead of per-provider
## Summary
When a model hits a 429 rate limit, OpenClaw currently places the **entire auth profile** (effectively, the whole provider) in cooldown. This blocks all other models on the same provider, even though providers enforce separate per-model quotas.
This PR scopes rate-limit cooldowns to the specific model that was rate-limited, while keeping account-level failures (billing, auth) as profile-wide cooldowns.
Fixes #5744
## Problem
As described in #5744, when `gemini-3-flash` exceeds its TPM limit, the entire `google:default` profile enters cooldown. This prevents fallback to `gemini-3-pro` or `gemini-2.5-flash-lite`, which have completely separate and unused quotas.
The same issue affects Anthropic users: a 429 on `claude-opus-4-5` blocks `claude-sonnet-4-5`, despite independent per-model rate limits.
**Root cause:** `markAuthProfileFailure()` and `isProfileInCooldown()` operate at the profile level regardless of failure reason. Rate-limit errors are treated the same as billing/auth errors.
## Solution
**Key insight:** `rate_limit` errors are model-scoped (providers enforce per-model RPM/TPM), while `billing`/`auth` errors are account-scoped.
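A minimal TypeScript sketch of that classification follows. The `FailureReason` union and the set-based lookup are assumptions for illustration only; the real reason type in `src/agents/auth-profiles/types.ts` may have different members and the actual helper may be written differently.

```typescript
// Hypothetical failure-reason union; the real OpenClaw type may differ.
type FailureReason = "rate_limit" | "billing" | "auth" | "timeout" | "unknown";

// Reasons whose quota is enforced per model rather than per account.
// A set keeps future model-scoped reasons in one place.
const MODEL_SCOPED_REASONS: ReadonlySet<FailureReason> = new Set<FailureReason>(["rate_limit"]);

// Sketch of an isModelScopedFailure()-style helper.
function isModelScopedFailure(reason: FailureReason): boolean {
  return MODEL_SCOPED_REASONS.has(reason);
}
```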
Changes:
1. **New `ModelCooldownStats` type** — tracks `cooldownUntil`, `errorCount`, `lastFailureAt` per model
2. **`isModelScopedFailure()` helper** — returns `true` for `rate_limit`, extensible for future reasons
3. **`modelCooldowns` field** on `ProfileUsageStats` — `Record<string, ModelCooldownStats>`
4. **`isProfileInCooldown(store, profileId, modelId?)`** — checks profile-level first (always takes precedence), then model-level if `modelId` provided
5. **`markAuthProfileFailure({...modelId?})`** — routes `rate_limit` to per-model tracking when `modelId` is provided; all other reasons use existing profile-level logic
6. **`markAuthProfileUsed({...modelId?})`** — clears model-specific cooldown on successful use
7. **`model-fallback.ts`** — passes `candidate.model` to cooldown check
8. **`pi-embedded-runner/run.ts`** — passes `modelId` to failure/success tracking calls
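The routing described in items 1 to 6 can be sketched roughly as below. This is a simplified in-memory illustration, not the PR's code: the `UsageStore` shape, the explicit `now` parameter, and `cooldownMs()` (a hypothetical stand-in for `calculateAuthProfileCooldownMs()`, whose real formula is not shown here) are assumptions. Persistence and the existing profile-level success bookkeeping are omitted.

```typescript
// Hypothetical, simplified shapes; the real types live in src/agents/auth-profiles/types.ts.
type FailureReason = "rate_limit" | "billing" | "auth" | "unknown";

interface ModelCooldownStats {
  cooldownUntil?: number; // epoch ms
  errorCount: number;
  lastFailureAt?: number;
}

interface ProfileUsageStats {
  cooldownUntil?: number;
  errorCount?: number;
  lastFailureAt?: number;
  modelCooldowns?: Record<string, ModelCooldownStats>; // optional: older data lacks it
}

interface UsageStore {
  usageStats: Record<string, ProfileUsageStats>;
}

const isModelScopedFailure = (reason: FailureReason): boolean => reason === "rate_limit";

// Hypothetical backoff: 1 min, 5 min, 25 min, capped at 1 h. The real formula may differ.
function cooldownMs(errorCount: number): number {
  return Math.min(60_000 * 5 ** Math.max(errorCount - 1, 0), 3_600_000);
}

function isProfileInCooldown(store: UsageStore, profileId: string, modelId?: string, now = Date.now()): boolean {
  const stats = store.usageStats[profileId];
  if (!stats) return false;
  // Profile-level cooldown (billing/auth, or callers without modelId) always wins.
  if ((stats.cooldownUntil ?? 0) > now) return true;
  if (!modelId) return false;
  return (stats.modelCooldowns?.[modelId]?.cooldownUntil ?? 0) > now;
}

function markAuthProfileFailure(params: {
  store: UsageStore;
  profileId: string;
  reason: FailureReason;
  modelId?: string;
  now?: number;
}): void {
  const { store, profileId, reason, modelId, now = Date.now() } = params;
  const stats = (store.usageStats[profileId] ??= {});
  if (modelId && isModelScopedFailure(reason)) {
    // Rate limit on a known model: only that model cools down.
    const models = (stats.modelCooldowns ??= {});
    const errorCount = (models[modelId]?.errorCount ?? 0) + 1;
    models[modelId] = { errorCount, lastFailureAt: now, cooldownUntil: now + cooldownMs(errorCount) };
    return;
  }
  // Any other reason, or no modelId: profile-wide cooldown, as before.
  const errorCount = (stats.errorCount ?? 0) + 1;
  Object.assign(stats, { errorCount, lastFailureAt: now, cooldownUntil: now + cooldownMs(errorCount) });
}

function markAuthProfileUsed(params: { store: UsageStore; profileId: string; modelId?: string }): void {
  const { store, profileId, modelId } = params;
  const models = store.usageStats[profileId]?.modelCooldowns;
  // Success clears only this model's cooldown in this sketch.
  if (models && modelId) delete models[modelId];
}
```

Nesting the per-model map under each profile means the profile-wide cooldown is read first with a single field access, which is what lets billing and auth failures take precedence over any model-level state.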
## Backward Compatibility
Fully backward compatible:
- `modelId` parameter is optional everywhere — when omitted, behavior is identical to current code
- Profile-level cooldown always takes precedence over model-level
- Existing callers that don't pass `modelId` get the old (profile-wide) behavior
- `modelCooldowns` field is optional on `ProfileUsageStats` — existing stored data works unchanged
- Exponential backoff reuses the existing `calculateAuthProfileCooldownMs()` function
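Because `modelCooldowns` is optional and the model-level check runs only when a `modelId` is passed, records persisted before this change stay valid. The sketch below uses a hypothetical `isCoolingDown()` helper with simplified types (not the PR's actual function) to show a legacy record with no `modelCooldowns` key behaving exactly as before, and a caller that omits `modelId` ignoring model-level state.

```typescript
// Simplified, hypothetical shapes; only the fields needed for the check are modeled.
interface ModelCooldownStats {
  cooldownUntil?: number;
  errorCount: number;
  lastFailureAt?: number;
}

interface ProfileUsageStats {
  cooldownUntil?: number;
  modelCooldowns?: Record<string, ModelCooldownStats>; // absent in pre-existing data
}

// Hypothetical helper mirroring the described isProfileInCooldown() ordering.
function isCoolingDown(stats: ProfileUsageStats | undefined, modelId: string | undefined, now: number): boolean {
  if (!stats) return false;
  // 1. Profile-level cooldown always takes precedence.
  if ((stats.cooldownUntil ?? 0) > now) return true;
  // 2. Callers that omit modelId get the old, profile-only answer.
  if (modelId === undefined) return false;
  // 3. A missing modelCooldowns map simply means no model is cooling down.
  return (stats.modelCooldowns?.[modelId]?.cooldownUntil ?? 0) > now;
}

// A record persisted before this change: there is no modelCooldowns key at all.
const legacy: ProfileUsageStats = JSON.parse('{"cooldownUntil": 2000}');

// A record written after the change, with one model rate-limited.
const current: ProfileUsageStats = {
  modelCooldowns: { "gemini-3-flash": { cooldownUntil: 5000, errorCount: 1, lastFailureAt: 1000 } },
};
```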
## Files Changed
- **`src/agents/auth-profiles/types.ts`** — New `ModelCooldownStats` type, `isModelScopedFailure()` helper, `modelCooldowns` field
- **`src/agents/auth-profiles/usage.ts`** — Per-model routing in `isProfileInCooldown`, `markAuthProfileFailure`, `markAuthProfileUsed`, `clearAuthProfileCooldown`
- **`src/agents/model-fallback.ts`** — Pass `candidate.model` to `isProfileInCooldown()`
- **`src/agents/pi-embedded-runner/run.ts`** — Pass `modelId` to `markAuthProfileFailure()`, `markAuthProfileUsed()`, and `isProfileInCooldown()`
- **`src/agents/auth-profiles.ts`** — Re-export new types
- **`src/agents/auth-profiles.per-model-cooldown.test.ts`** — 9 new tests
## Tests
9 new tests covering:
- ✅ Rate limit on model A does NOT block model B on same profile
- ✅ Billing failure blocks ALL models (profile-wide)
- ✅ Auth failure blocks ALL models (profile-wide)
- ✅ Backward compat: no `modelId` = profile-level cooldown (old behavior)
- ✅ Successful use clears model-specific cooldown
- ✅ Profile-level cooldown takes precedence over model-level
- ✅ Works for Google models (provider-agnostic)
- ✅ Exponential backoff applies to model-level cooldowns
- ✅ Multiple models can have independent cooldowns on same profile
All 49 existing tests pass unchanged.
```
pnpm check ✅ (tsgo + oxlint 0 errors on 3084 files + oxfmt on 3922 files)
pnpm test ✅ (58/58 tests pass — 49 existing + 9 new)
```
## 🤖 AI-Assisted
This PR was written with AI assistance (Claude, via OpenClaw). The code has been:
- [x] Fully tested (58/58 tests pass, including 9 new)
- [x] Linted and formatted with project tooling (`oxlint`, `oxfmt`)
- [x] Type-checked with `tsc --noEmit`
- [x] Reviewed against existing code patterns and conventions
- [x] Human-reviewed and understood before submission
The fix was motivated by hitting this exact issue in production: an Opus 429 blocking Sonnet fallback, forcing an unnecessary jump to a different provider entirely.