#16797: fix(auth-profiles): implement per-model rate limit cooldown tracking
Labels: agents, stale, size: XL
Cluster: Rate Limit Management Enhancements
**FIX ISSUE #5744**
Previously, when one Google model (e.g., gemini-3-flash) hit an HTTP 429 rate limit, OpenClaw put the ENTIRE "google" provider in cooldown, blocking all other Google models even when they still had quota available.
This change implements per-model cooldown tracking:
- Rate limit (429) errors now apply cooldown to the specific model only
- Other models from the same provider remain usable
- Auth/billing errors still trigger full provider cooldown (as expected)
- Parse Google's `quotaDimensions.model` from the 429 response for accurate tracking
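A minimal sketch of how the model name could be pulled out of a Google 429 body. It assumes the response follows Google's `google.rpc` error layout, where a `QuotaFailure` detail carries `violations[].quotaDimensions.model`. The function below is a hypothetical stand-in for `extractRateLimitedModel()`; its string-in, `undefined`-out signature is an assumption, not the PR's actual one.

```typescript
// Hypothetical stand-in for extractRateLimitedModel(); the real signature may differ.
// Assumed input: the raw 429 body in Google's error format, e.g.
// { "error": { "details": [ { "@type": "type.googleapis.com/google.rpc.QuotaFailure",
//   "violations": [ { "quotaDimensions": { "model": "gemini-3-flash" } } ] } ] } }
interface QuotaViolation {
  quotaMetric?: string;
  quotaDimensions?: Record<string, string>;
}

interface ErrorDetail {
  "@type"?: string;
  violations?: QuotaViolation[];
}

function extractRateLimitedModel(body: string): string | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return undefined; // not JSON: caller falls back to a profile-level cooldown
  }
  const details = (parsed as { error?: { details?: unknown } } | null)?.error?.details;
  if (!Array.isArray(details)) return undefined;
  for (const detail of details as (ErrorDetail | null)[]) {
    const violations = detail?.violations;
    if (!Array.isArray(violations)) continue;
    for (const violation of violations) {
      const model = violation?.quotaDimensions?.model;
      // First non-empty model dimension wins.
      if (typeof model === "string" && model.length > 0) return model;
    }
  }
  return undefined;
}
```

Returning `undefined` instead of throwing keeps the backwards-compatible path simple: no model means the caller applies a profile-level cooldown.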
---
## Summary
- Implement per-model rate limit cooldown tracking instead of full provider cooldown
- Add `extractRateLimitedModel()` helper to parse model from Google 429 responses
- Update `isProfileInCooldown()`, `markAuthProfileFailure()`, and related functions to support model-level cooldowns
- Modify model routing to check cooldowns at the model level
## Problem
When one Google model (e.g., `gemini-3-flash`) hit its per-model rate limit (HTTP 429), OpenClaw put the **entire provider** in cooldown. This blocked all other Google models (`gemini-2.5-flash-lite`, `gemini-3-pro-preview`, etc.) even though they still had available quota.
## Solution
- **Per-model cooldown tracking**: Added `modelCooldowns`, `modelErrorCounts`, and `modelLastFailureAt` fields to `ProfileUsageStats`
- **Smart 429 parsing**: Extract model info from Google's `quotaDimensions.model` in 429 responses
- **Scoped cooldowns**: Rate limit errors now only affect the specific model that hit its limit
- **Preserved provider-wide cooldowns**: Auth and billing errors still disable the entire profile (correct behavior)
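One way the new fields could fit together, assuming epoch-millisecond timestamps. `cooldownUntil` and `disabledUntil` are assumed names for the existing profile-wide fields, and this `isProfileInCooldown` is a simplified hypothetical version rather than the PR's code. Profile-wide state blocks every model; a model entry blocks only that model.

```typescript
// Assumed shape: the three model-level fields come from the PR description;
// cooldownUntil/disabledUntil and epoch-ms values are assumptions.
interface ProfileUsageStats {
  cooldownUntil?: number; // profile-wide: auth errors, or a 429 without model info
  disabledUntil?: number; // profile-wide: billing errors
  errorCount?: number;
  modelCooldowns?: Record<string, number>; // model id -> cooldown expiry
  modelErrorCounts?: Record<string, number>; // model id -> consecutive 429s
  modelLastFailureAt?: Record<string, number>; // model id -> time of last 429
}

// Simplified, hypothetical version of the model-aware check.
function isProfileInCooldown(
  stats: ProfileUsageStats,
  model?: string,
  now: number = Date.now(),
): boolean {
  // Profile-wide blocks apply to every model on the profile.
  if ((stats.cooldownUntil ?? 0) > now || (stats.disabledUntil ?? 0) > now) {
    return true;
  }
  // A model-scoped cooldown applies only when the caller names that model.
  return model !== undefined && (stats.modelCooldowns?.[model] ?? 0) > now;
}
```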
## Changes
| File | Change |
|------|--------|
| `auth-profiles/types.ts` | Add model-level cooldown fields to `ProfileUsageStats` |
| `auth-profiles/usage.ts` | Update cooldown functions to support model parameter |
| `auth-profiles/order.ts` | Add model parameter to profile ordering logic |
| `pi-embedded-helpers/errors.ts` | Add `extractRateLimitedModel()` helper |
| `model-fallback.ts` | Pass model to cooldown checks in fallback routing |
| `pi-embedded-runner/run.ts` | Pass model info when marking failures |
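For the ordering and fallback changes, a sketch of model-aware candidate ordering. `orderProfilesForModel` and `effectiveCooldownUntil` are hypothetical helpers (the real `resolveAuthProfileOrder` has its own signature and rules). The assumed policy keeps usable profiles in configured order and moves cooling-down ones to the end, soonest expiry first.

```typescript
// Hypothetical sketch; stats field names are assumptions mirroring types.ts.
interface UsageStats {
  cooldownUntil?: number;
  modelCooldowns?: Record<string, number>;
}

// Effective expiry for this profile and model: the later of the
// profile-wide and model-scoped cooldowns.
function effectiveCooldownUntil(stats: UsageStats | undefined, model?: string): number {
  const profileWide = stats?.cooldownUntil ?? 0;
  const modelScoped = model !== undefined ? (stats?.modelCooldowns?.[model] ?? 0) : 0;
  return Math.max(profileWide, modelScoped);
}

function orderProfilesForModel(
  profileIds: string[],
  usage: Record<string, UsageStats | undefined>,
  model: string | undefined,
  now: number,
): string[] {
  const ready: string[] = [];
  const cooling: { id: string; until: number }[] = [];
  for (const id of profileIds) {
    const until = effectiveCooldownUntil(usage[id], model);
    if (until > now) cooling.push({ id, until });
    else ready.push(id); // usable profiles keep their configured order
  }
  // Cooling profiles go last, soonest-to-expire first.
  cooling.sort((a, b) => a.until - b.until);
  return [...ready, ...cooling.map((c) => c.id)];
}
```

With `model` passed, a profile that is rate-limited for `gemini-3-flash` drops to the end only for that model and stays first for the others.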
## Test Plan
- [x] Model A rate limit → only Model A in cooldown
- [x] Model B from same provider still accessible
- [x] Auth error → full profile cooldown (all models blocked)
- [x] Billing error → full profile disabled (all models blocked)
- [x] Per-model exponential backoff works correctly
- [x] Backwards compatible: rate_limit without model info falls back to profile-level cooldown
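The backoff and fallback cases in the test plan can be sketched as one update function. The reason names, the schedule (1 min, 5 min, 25 min, then capped at 1 h), and billing reusing that same schedule are all assumptions for illustration; the PR's actual constants and billing handling may differ.

```typescript
// Hypothetical sketch; reasons, constants, and field names are assumptions.
type FailureReason = "rate_limit" | "auth" | "billing";

interface Stats {
  cooldownUntil?: number;
  disabledUntil?: number;
  errorCount?: number;
  modelCooldowns?: Record<string, number>;
  modelErrorCounts?: Record<string, number>;
  modelLastFailureAt?: Record<string, number>;
}

const BASE_MS = 60_000; // assumed: 1 minute after the first failure
const MAX_MS = 60 * 60_000; // assumed: 1 hour cap

// Exponential backoff: 1 min, 5 min, 25 min, then capped at 1 h.
function backoffMs(errorCount: number): number {
  return Math.min(MAX_MS, BASE_MS * 5 ** Math.min(errorCount - 1, 3));
}

function markFailure(stats: Stats, reason: FailureReason, model: string | undefined, now: number): Stats {
  if (reason === "rate_limit" && model !== undefined) {
    // Scoped: only this model's counter, timestamp, and expiry change.
    const count = (stats.modelErrorCounts?.[model] ?? 0) + 1;
    return {
      ...stats,
      modelErrorCounts: { ...stats.modelErrorCounts, [model]: count },
      modelLastFailureAt: { ...stats.modelLastFailureAt, [model]: now },
      modelCooldowns: { ...stats.modelCooldowns, [model]: now + backoffMs(count) },
    };
  }
  // Auth/billing errors, or a 429 with no model info: profile-wide block.
  const count = (stats.errorCount ?? 0) + 1;
  const until = now + backoffMs(count);
  return reason === "billing"
    ? { ...stats, errorCount: count, disabledUntil: until }
    : { ...stats, errorCount: count, cooldownUntil: until };
}
```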
## New Tests
- `auth-profiles.per-model-cooldown.test.ts` (10 test cases)
- `pi-embedded-helpers.extractratelimitedmodel.test.ts` (10 test cases)
<h3>Greptile Summary</h3>
Implements per-model rate limit cooldown tracking so that when one Google model (e.g., `gemini-3-flash`) hits HTTP 429, only that model is put in cooldown — other models on the same provider profile remain usable. Auth/billing errors continue to apply full provider-wide cooldowns.
- Adds `modelCooldowns`, `modelErrorCounts`, and `modelLastFailureAt` fields to `ProfileUsageStats` for model-level tracking
- New `extractRateLimitedModel()` parses Google's `quotaDimensions.model` from 429 responses to identify which model hit its limit
- `model-fallback.ts` correctly passes model to both profile ordering and cooldown checks
- `pi-embedded-runner/run.ts` passes model info when marking failures and successes
- Good test coverage: 10 tests for per-model cooldown behavior, 10 tests for `extractRateLimitedModel`
**Issues found:**
- The `resolveAuthProfileOrder` call in `pi-embedded-runner/run.ts:258` doesn't pass the `model` parameter, so the initial profile selection in the embedded runner doesn't benefit from per-model cooldowns — a rate-limited profile for the current model may still be tried first
- `clearAuthProfileCooldown` with a model parameter doesn't clear `modelLastFailureAt[model]`, unlike `markAuthProfileUsed` which clears all three model-level fields — creating an inconsistency in state cleanup
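For the second issue, a hypothetical helper that removes a model from all three per-model maps, so that clearing a model's cooldown and recording a success would leave the same state behind. `clearModelState` and `ModelStats` are illustrative names, not code from the PR.

```typescript
// Hypothetical sketch; field names mirror the PR's ProfileUsageStats additions.
interface ModelStats {
  cooldownUntil?: number;
  modelCooldowns?: Record<string, number>;
  modelErrorCounts?: Record<string, number>;
  modelLastFailureAt?: Record<string, number>;
}

const MODEL_FIELDS = ["modelCooldowns", "modelErrorCounts", "modelLastFailureAt"] as const;

// Returns a copy with `model` removed from every per-model map.
function clearModelState(stats: ModelStats, model: string): ModelStats {
  const next: ModelStats = { ...stats };
  for (const field of MODEL_FIELDS) {
    const map = stats[field];
    if (!map || !(model in map)) continue;
    const rest = { ...map }; // copy so the caller's object is not mutated
    delete rest[model];
    if (Object.keys(rest).length > 0) next[field] = rest;
    else delete next[field]; // drop maps that became empty
  }
  return next;
}
```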
<h3>Confidence Score: 3/5</h3>
- Generally safe but has an incomplete integration that partially undermines the feature's effectiveness
- The core per-model cooldown logic is well-implemented and well-tested. However, the missing model parameter in the initial `resolveAuthProfileOrder` call in `pi-embedded-runner/run.ts` means per-model cooldowns won't be considered when selecting the first profile candidate, which is the primary code path. The `model-fallback.ts` integration is correct, but the embedded runner's initial ordering is incomplete. The `clearAuthProfileCooldown` inconsistency is a secondary concern.
- Pay close attention to `src/agents/pi-embedded-runner/run.ts` (missing model param in resolveAuthProfileOrder) and `src/agents/auth-profiles/usage.ts` (inconsistent state cleanup in clearAuthProfileCooldown)
<sub>Last reviewed commit: 2034fd5</sub>
Most Similar PRs
- #7941: fix: scope rate-limit cooldowns per-model instead of per-provider (by adrrr, 2026-02-03, 85.2%)
- #19267: fix: derive failover reason from timedOut flag to prevent unknown c... (by austenstone, 2026-02-17, 82.1%)
- #14824: fix: do not trigger provider cooldown on LLM request timeouts (by CyberSinister, 2026-02-12, 81.8%)
- #14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on... (by JamesEBall, 2026-02-12, 80.5%)
- #13077: fix: prevent cooldown pollution across different models on the same... (by magendary, 2026-02-10, 79.5%)
- #18902: fix: exempt format errors from auth profile cooldown (by tag-assistant, 2026-02-17, 79.2%)
- #20388: fix(failover): don't skip same-provider fallback models when cooldo... (by Limitless2023, 2026-02-18, 78.7%)
- #15881: fix(models): probe-safe cooldown handling and compatible fallback a... (by wboudy, 2026-02-14, 77.3%)
- #7570: fix: allow models from providers with auth profiles configured (by DonSqualo, 2026-02-03, 76.9%)
- #11693: Model Provider Failover for Default and Session Model When Rate Lim... (by synchronic1, 2026-02-08, 76.3%)