#7941: fix: scope rate-limit cooldowns per-model instead of per-provider

by adrrr open 2026-02-03 10:59

agents stale

Cluster: Rate Limit Management Enhancements

## Summary

When a model hits a 429 rate limit, OpenClaw currently places the **entire auth profile** (= provider) in cooldown. This blocks all other models on the same provider, even though providers enforce separate per-model quotas.

This PR scopes rate-limit cooldowns to the specific model that was rate-limited, while keeping account-level failures (billing, auth) as profile-wide cooldowns.

Fixes #5744

## Problem

As described in #5744, when `gemini-3-flash` exceeds its TPM limit, the entire `google:default` profile enters cooldown. This prevents fallback to `gemini-3-pro` or `gemini-2.5-flash-lite`, which have completely separate and unused quotas.

The same issue affects Anthropic users: a 429 on `claude-opus-4-5` blocks `claude-sonnet-4-5`, despite independent per-model rate limits.

**Root cause:** `markAuthProfileFailure()` and `isProfileInCooldown()` operate at the profile level regardless of failure reason. Rate-limit errors are treated the same as billing/auth errors.

## Solution

**Key insight:** `rate_limit` errors are model-scoped (providers enforce per-model RPM/TPM), while `billing`/`auth` errors are account-scoped.

Changes:

1. **New `ModelCooldownStats` type**: tracks `cooldownUntil`, `errorCount`, `lastFailureAt` per model
2. **`isModelScopedFailure()` helper**: returns `true` for `rate_limit`, extensible for future reasons
3. **`modelCooldowns` field** on `ProfileUsageStats`: `Record<string, ModelCooldownStats>`
4. **`isProfileInCooldown(store, profileId, modelId?)`**: checks profile-level first (always takes precedence), then model-level if `modelId` is provided
5. **`markAuthProfileFailure({...modelId?})`**: routes `rate_limit` to per-model tracking when `modelId` is provided; all other reasons use the existing profile-level logic
6. **`markAuthProfileUsed({...modelId?})`**: clears the model-specific cooldown on successful use
7. **`model-fallback.ts`**: passes `candidate.model` to the cooldown check
8. **`pi-embedded-runner/run.ts`**: passes `modelId` to failure/success tracking calls

## Backward Compatibility

Fully backward compatible:

- `modelId` parameter is optional everywhere; when omitted, behavior is identical to current code
- Profile-level cooldown always takes precedence over model-level
- Existing callers that don't pass `modelId` get the old (profile-wide) behavior
- `modelCooldowns` field is optional on `ProfileUsageStats`, so existing stored data works unchanged
- Exponential backoff reuses the existing `calculateAuthProfileCooldownMs()` function

## Files Changed

- **`src/agents/auth-profiles/types.ts`**: New `ModelCooldownStats` type, `isModelScopedFailure()` helper, `modelCooldowns` field
- **`src/agents/auth-profiles/usage.ts`**: Per-model routing in `isProfileInCooldown`, `markAuthProfileFailure`, `markAuthProfileUsed`, `clearAuthProfileCooldown`
- **`src/agents/model-fallback.ts`**: Pass `candidate.model` to `isProfileInCooldown()`
- **`src/agents/pi-embedded-runner/run.ts`**: Pass `modelId` to `markAuthProfileFailure()`, `markAuthProfileUsed()`, and `isProfileInCooldown()`
- **`src/agents/auth-profiles.ts`**: Re-export new types
- **`src/agents/auth-profiles.per-model-cooldown.test.ts`**: 9 new tests

## Tests

9 new tests covering:

- ✅ Rate limit on model A does NOT block model B on same profile
- ✅ Billing failure blocks ALL models (profile-wide)
- ✅ Auth failure blocks ALL models (profile-wide)
- ✅ Backward compat: no `modelId` = profile-level cooldown (old behavior)
- ✅ Successful use clears model-specific cooldown
- ✅ Profile-level cooldown takes precedence over model-level
- ✅ Works for Google models (provider-agnostic)
- ✅ Exponential backoff applies to model-level cooldowns
- ✅ Multiple models can have independent cooldowns on same profile

All 49 existing tests pass unchanged.

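
For reviewers who want the routing rule in one place, here is a minimal sketch of the behavior described under Solution. It is not the code in this PR: the type and field names follow the description, but `markFailure`, `isInCooldown`, `markUsed`, the explicit `now` parameter, and the backoff constants in `cooldownMs` are assumptions made for illustration (the real code reuses `calculateAuthProfileCooldownMs()`).

```typescript
// Illustrative sketch only; bodies and helper names are assumptions.
type FailureReason = "rate_limit" | "billing" | "auth";

interface ModelCooldownStats {
  cooldownUntil?: number; // epoch milliseconds
  errorCount?: number;
  lastFailureAt?: number;
}

interface ProfileUsageStats {
  cooldownUntil?: number;
  errorCount?: number;
  modelCooldowns?: Record<string, ModelCooldownStats>; // optional, so old data still loads
}

type Store = Record<string, ProfileUsageStats | undefined>;

// Only rate limits are model-scoped; billing and auth stay account-scoped.
function isModelScopedFailure(reason: FailureReason): boolean {
  return reason === "rate_limit";
}

// Assumed backoff: 1 minute, doubling per consecutive error, capped at 1 hour.
function cooldownMs(errorCount: number): number {
  return Math.min(60_000 * 2 ** (errorCount - 1), 3_600_000);
}

function markFailure(
  store: Store, profileId: string, reason: FailureReason, now: number, modelId?: string,
): void {
  const profile = (store[profileId] ??= {});
  if (modelId !== undefined && isModelScopedFailure(reason)) {
    // Per-model path: only this model is put in cooldown.
    const models = (profile.modelCooldowns ??= {});
    const stats = (models[modelId] ??= {});
    stats.errorCount = (stats.errorCount ?? 0) + 1;
    stats.lastFailureAt = now;
    stats.cooldownUntil = now + cooldownMs(stats.errorCount);
    return;
  }
  // Profile-wide path: no modelId, or an account-scoped reason.
  profile.errorCount = (profile.errorCount ?? 0) + 1;
  profile.cooldownUntil = now + cooldownMs(profile.errorCount);
}

function isInCooldown(store: Store, profileId: string, now: number, modelId?: string): boolean {
  const profile = store[profileId];
  if (!profile) return false;
  if ((profile.cooldownUntil ?? 0) > now) return true; // profile-level takes precedence
  if (modelId === undefined) return false;
  return (profile.modelCooldowns?.[modelId]?.cooldownUntil ?? 0) > now;
}

// A successful call clears the cooldown entry for that model only.
function markUsed(store: Store, profileId: string, modelId?: string): void {
  const models = store[profileId]?.modelCooldowns;
  if (models && modelId !== undefined) delete models[modelId];
}

const store: Store = {};
markFailure(store, "anthropic:default", "rate_limit", 0, "claude-opus-4-5");
console.log(isInCooldown(store, "anthropic:default", 1, "claude-opus-4-5"));   // true
console.log(isInCooldown(store, "anthropic:default", 1, "claude-sonnet-4-5")); // false
```

Checking the profile-level cooldown first means a billing or auth failure still blocks every model on the profile, whatever the per-model state says.
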
```
pnpm check ✅ (tsgo + oxlint 0 errors on 3084 files + oxfmt on 3922 files)
pnpm test  ✅ (58/58 tests pass: 49 existing + 9 new)
```

## 🤖 AI-Assisted

This PR was written with AI assistance (Claude, via OpenClaw). The code has been:

- [x] Fully tested (58/58 tests pass, including 9 new)
- [x] Linted and formatted with project tooling (`oxlint`, `oxfmt`)
- [x] Type-checked with `tsc --noEmit`
- [x] Reviewed against existing code patterns and conventions
- [x] Human-reviewed and understood before submission

The fix was motivated by hitting this exact issue in production: an Opus 429 blocking Sonnet fallback, forcing an unnecessary jump to a different provider entirely.
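
The production scenario can be illustrated with a small fallback-selection sketch. Everything here is hypothetical: the `Candidate` shape, the `pickCandidate` helper, the set-based cooldown state, and the fallback chain itself (including the choice of a Google model as the cross-provider fallback) are assumptions for illustration, not the logic in `model-fallback.ts`.

```typescript
// Hypothetical sketch of fallback selection; not the actual model-fallback.ts code.
interface Candidate {
  profileId: string;
  model: string;
}

// Cooldown state reduced to two sets for illustration.
interface CooldownState {
  profiles: Set<string>; // profile-wide cooldowns (billing, auth, or legacy behavior)
  models: Set<string>;   // "profileId/model" keys for per-model rate limits
}

// Return the first candidate that is blocked at neither the profile nor the model level.
function pickCandidate(candidates: Candidate[], state: CooldownState): Candidate | undefined {
  return candidates.find(
    (c) => !state.profiles.has(c.profileId) && !state.models.has(`${c.profileId}/${c.model}`),
  );
}

// Assumed fallback chain: two Anthropic models, then a different provider.
const chain: Candidate[] = [
  { profileId: "anthropic:default", model: "claude-opus-4-5" },
  { profileId: "anthropic:default", model: "claude-sonnet-4-5" },
  { profileId: "google:default", model: "gemini-3-pro" },
];

// Before: an Opus 429 puts the whole profile in cooldown, so Sonnet is skipped too.
const before = pickCandidate(chain, {
  profiles: new Set<string>(["anthropic:default"]),
  models: new Set<string>(),
});

// After: only Opus is in cooldown, so Sonnet on the same profile is selected.
const after = pickCandidate(chain, {
  profiles: new Set<string>(),
  models: new Set<string>(["anthropic:default/claude-opus-4-5"]),
});

console.log(before?.model); // gemini-3-pro
console.log(after?.model);  // claude-sonnet-4-5
```
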