
#16797: fix(auth-profiles): implement per-model rate limit cooldown tracking

by mulhamna · open · 2026-02-15 04:14
**FIX ISSUE #5744**

Previously, when one Google model (e.g., gemini-3-flash) hit an HTTP 429 rate limit, OpenClaw would put the ENTIRE provider "google" in cooldown, blocking all other Google models even if they still had available quota. This change implements per-model cooldown tracking:

- Rate limit (429) errors now apply cooldown to the specific model only
- Other models from the same provider remain usable
- Auth/billing errors still trigger full provider cooldown (as expected)
- Parse Google's quotaDimensions.model from 429 responses for accurate tracking

---

PR Summary

## Summary

- Implement per-model rate limit cooldown tracking instead of full provider cooldown
- Add `extractRateLimitedModel()` helper to parse the model from Google 429 responses
- Update `isProfileInCooldown()`, `markAuthProfileFailure()`, and related functions to support model-level cooldowns
- Modify model routing to check cooldowns at the model level

## Problem

When one Google model (e.g., `gemini-3-flash`) hits its per-model rate limit (HTTP 429), OpenClaw was putting the **entire provider** in cooldown. This blocked all other Google models (`gemini-2.5-flash-lite`, `gemini-3-pro-preview`, etc.) even though they still had available quota.
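The 429-parsing idea can be sketched as follows. The payload shape assumed here (`error.details[].violations[].quotaDimensions.model`) is based on Google's `google.rpc.QuotaFailure` error detail format, not on the PR diff, so treat the exact field paths as an assumption:

```typescript
// Hypothetical sketch of the extractRateLimitedModel() helper described in this PR.
// The response shape below is an assumption modeled on google.rpc.QuotaFailure.
interface QuotaViolation {
  quotaDimensions?: { model?: string };
}

interface GoogleErrorBody {
  error?: {
    code?: number;
    details?: Array<{ "@type"?: string; violations?: QuotaViolation[] }>;
  };
}

// Returns the rate-limited model name if the 429 body identifies one, else undefined.
function extractRateLimitedModel(body: GoogleErrorBody): string | undefined {
  for (const detail of body.error?.details ?? []) {
    for (const violation of detail.violations ?? []) {
      const model = violation.quotaDimensions?.model;
      if (model) return model;
    }
  }
  return undefined;
}
```

When no model can be extracted, the caller can fall back to the old profile-level cooldown, which matches the backwards-compatibility item in the test plan.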
## Solution

- **Per-model cooldown tracking**: Added `modelCooldowns`, `modelErrorCounts`, and `modelLastFailureAt` fields to `ProfileUsageStats`
- **Smart 429 parsing**: Extract model info from Google's `quotaDimensions.model` in 429 responses
- **Scoped cooldowns**: Rate limit errors now only affect the specific model that hit its limit
- **Preserved provider-wide cooldowns**: Auth and billing errors still disable the entire profile (correct behavior)

## Changes

| File | Change |
|------|--------|
| `auth-profiles/types.ts` | Add model-level cooldown fields to `ProfileUsageStats` |
| `auth-profiles/usage.ts` | Update cooldown functions to support a model parameter |
| `auth-profiles/order.ts` | Add model parameter to profile ordering logic |
| `pi-embedded-helpers/errors.ts` | Add `extractRateLimitedModel()` helper |
| `model-fallback.ts` | Pass model to cooldown checks in fallback routing |
| `pi-embedded-runner/run.ts` | Pass model info when marking failures |

## Test Plan

- [x] Model A rate limit → only Model A in cooldown
- [x] Model B from same provider still accessible
- [x] Auth error → full profile cooldown (all models blocked)
- [x] Billing error → full profile disabled (all models blocked)
- [x] Per-model exponential backoff works correctly
- [x] Backwards compatible: rate_limit without model info falls back to profile-level cooldown

## New Tests

- `auth-profiles.per-model-cooldown.test.ts` (10 test cases)
- `pi-embedded-helpers.extractratelimitedmodel.test.ts` (10 test cases)

### Greptile Summary

Implements per-model rate limit cooldown tracking so that when one Google model (e.g., `gemini-3-flash`) hits HTTP 429, only that model is put in cooldown — other models on the same provider profile remain usable. Auth/billing errors continue to apply full provider-wide cooldowns.
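A minimal sketch of how the model-scoped cooldown check might fit together, using the field names from the summary above (`modelCooldowns`, `cooldownUntil` as the pre-existing provider-wide field is assumed); the actual implementation in `auth-profiles/usage.ts` may differ:

```typescript
// Hypothetical sketch of model-scoped cooldown checking. Field names follow the
// PR summary; cooldownUntil as the provider-wide field is an assumption.
interface ProfileUsageStats {
  cooldownUntil?: number;                  // provider-wide cooldown (auth/billing errors)
  modelCooldowns?: Record<string, number>; // per-model cooldown expiry timestamps (429s)
  modelErrorCounts?: Record<string, number>;
  modelLastFailureAt?: Record<string, number>;
}

function isProfileInCooldown(
  stats: ProfileUsageStats,
  model?: string,
  now: number = Date.now(),
): boolean {
  // A provider-wide cooldown blocks every model on the profile.
  if (stats.cooldownUntil !== undefined && now < stats.cooldownUntil) return true;
  // A model-scoped cooldown blocks only the model that hit its limit.
  if (model !== undefined) {
    const until = stats.modelCooldowns?.[model];
    if (until !== undefined && now < until) return true;
  }
  return false;
}
```

With this shape, a 429 on `gemini-3-flash` sets `modelCooldowns["gemini-3-flash"]` only, so `gemini-2.5-flash-lite` on the same profile still passes the check.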
- Adds `modelCooldowns`, `modelErrorCounts`, and `modelLastFailureAt` fields to `ProfileUsageStats` for model-level tracking
- New `extractRateLimitedModel()` parses Google's `quotaDimensions.model` from 429 responses to identify which model hit its limit
- `model-fallback.ts` correctly passes model to both profile ordering and cooldown checks
- `pi-embedded-runner/run.ts` passes model info when marking failures and successes
- Good test coverage: 10 tests for per-model cooldown behavior, 10 tests for `extractRateLimitedModel`

**Issues found:**

- The `resolveAuthProfileOrder` call in `pi-embedded-runner/run.ts:258` doesn't pass the `model` parameter, so the initial profile selection in the embedded runner doesn't benefit from per-model cooldowns — a rate-limited profile for the current model may still be tried first
- `clearAuthProfileCooldown` with a model parameter doesn't clear `modelLastFailureAt[model]`, unlike `markAuthProfileUsed` which clears all three model-level fields — creating an inconsistency in state cleanup

### Confidence Score: 3/5

- Generally safe but has an incomplete integration that partially undermines the feature's effectiveness
- The core per-model cooldown logic is well-implemented and well-tested. However, the missing model parameter in the initial `resolveAuthProfileOrder` call in `pi-embedded-runner/run.ts` means per-model cooldowns won't be considered when selecting the first profile candidate — the primary code path. The `model-fallback.ts` integration is correct, but the embedded runner's initial ordering is incomplete. The `clearAuthProfileCooldown` inconsistency is a secondary concern.
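The `clearAuthProfileCooldown` inconsistency flagged in the review could be resolved by clearing all three model-level fields symmetrically. A hypothetical sketch (function and field names come from the PR text; the body is assumed, not the PR's actual code):

```typescript
// Hypothetical sketch of the symmetric cleanup the review asks for; not the PR's code.
interface ProfileUsageStats {
  cooldownUntil?: number;
  modelCooldowns?: Record<string, number>;
  modelErrorCounts?: Record<string, number>;
  modelLastFailureAt?: Record<string, number>;
}

function clearAuthProfileCooldown(stats: ProfileUsageStats, model?: string): void {
  if (model === undefined) {
    // No model given: clear the provider-wide cooldown.
    delete stats.cooldownUntil;
    return;
  }
  // Model given: clear all three model-level fields, mirroring markAuthProfileUsed.
  if (stats.modelCooldowns) delete stats.modelCooldowns[model];
  if (stats.modelErrorCounts) delete stats.modelErrorCounts[model];
  if (stats.modelLastFailureAt) delete stats.modelLastFailureAt[model]; // the field the review found missing
}
```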
- Pay close attention to `src/agents/pi-embedded-runner/run.ts` (missing model param in `resolveAuthProfileOrder`) and `src/agents/auth-profiles/usage.ts` (inconsistent state cleanup in `clearAuthProfileCooldown`)

<sub>Last reviewed commit: 2034fd5</sub>
