#20388: fix(failover): don't skip same-provider fallback models when cooldown is rate-limit only
agents
size: S
Cluster:
Model Cooldown Management
## Problem
When a model fails due to a rate limit or timeout, the auth profile for its
provider enters a short cooldown. Previously, this cooldown caused **ALL models
on that provider to be skipped** in `runWithModelFallback` — including fallback
models that may be completely unaffected.
### Affected scenarios
**Claude Max Plan (per-model token pools):**
```
primary: anthropic/claude-opus-4-6
fallbacks: [anthropic/claude-sonnet-4-6, ollama/qwen2.5-coder:14b]
```
Opus hits its monthly token pool limit → `anthropic:default` enters cooldown →
Sonnet is skipped (same provider!) → falls through to Ollama which can't handle
the full system prompt → silent failure / "How can I assist you today?" loop
**Multi-provider failover:**
Codex rate-limited → provider marked in cooldown → all Codex models AND
configured Anthropic backups skipped simultaneously → no capable model available
## Root Cause
`runWithModelFallback` used a single `isProfileInCooldown` check to gate **all**
candidates for a provider. But there are two fundamentally different reasons an
auth profile may be unavailable:
1. **Hard-disabled** (`disabledUntil` set) — billing or auth failure that means
the credentials themselves are broken → skip ALL models (correct behavior)
2. **Soft cooldown** (`cooldownUntil` set) — rate limit or timeout, often
model-specific → other models on the same provider may work fine
## Fix
Introduce `isProfileHardDisabled()` that returns `true` only when `disabledUntil`
is set. Update `runWithModelFallback` to:
- Hard-disabled → skip all models for this provider (unchanged behavior)
- Soft cooldown on **primary** model (i === 0) → existing probe logic unchanged
- Soft cooldown on **non-primary fallback** (i > 0) → **allow the attempt**;
the inner runner handles profile rotation from there
## Test scenario that now works
```
primary: anthropic/claude-opus-4-6 # exhausts Max Plan pool → rate_limit cooldown
fallbacks: [anthropic/claude-sonnet-4-6] # same provider, different model pool → ✅ now tried
```
Fixes #20316
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR introduces a distinction between "hard-disabled" (billing/auth failure) and "soft-cooldown" (rate limit/timeout) auth profiles to prevent `runWithModelFallback` from skipping same-provider fallback models unnecessarily. It also fixes a separate bug in `sessions-list-tool.ts` where transcript path resolution failed when `storePath` was absent or contained a non-path placeholder like `"(multiple)"`.
**Key changes:**
- New `isProfileHardDisabled()` function that checks only `disabledUntil` (billing failures), distinct from `isProfileInCooldown()` which checks both `cooldownUntil` and `disabledUntil`
- `runWithModelFallback` now skips all models on a provider only for hard-disabled profiles; non-primary fallback models bypass soft cooldown checks
- `sessions-list-tool.ts` no longer requires `storePath` to resolve transcript paths and validates it is a real absolute path before using it
**Issues found:**
- The existing probe test (`model-fallback.probe.test.ts`) does not mock `isProfileHardDisabled`, which will cause `TypeError` failures since the mock factory replaces the entire module
- The `.some()` check for hard-disabled profiles may over-aggressively skip models in multi-profile setups where only one profile has a billing failure
<h3>Confidence Score: 2/5</h3>
- This PR will break existing tests and has a potential logic issue in the core failover path
- The missing `isProfileHardDisabled` mock in `model-fallback.probe.test.ts` will cause test failures (TypeError at runtime). The `.some()` vs `.every()` semantics in the hard-disabled check could cause incorrect provider skipping in multi-profile setups. The core approach is sound but needs these issues resolved before merging.
- Pay close attention to `src/agents/model-fallback.ts` (core failover logic changes) and `src/agents/model-fallback.probe.test.ts` (broken mock)
<sub>Last reviewed commit: 6522b6c</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#13077: fix: prevent cooldown pollution across different models on the same...
by magendary · 2026-02-10
85.6%
#23816: fix(agents): model fallback skipped during session overrides and pr...
by ramezgaberiel · 2026-02-22
82.5%
#14914: fix: resolve actual failure reason for cooldown-skipped providers
by mcaxtr · 2026-02-12
81.4%
#19267: fix: derive failover reason from timedOut flag to prevent unknown c...
by austenstone · 2026-02-17
81.1%
#14824: fix: do not trigger provider cooldown on LLM request timeouts
by CyberSinister · 2026-02-12
80.3%
#18902: fix: exempt format errors from auth profile cooldown
by tag-assistant · 2026-02-17
80.2%
#15881: fix(models): probe-safe cooldown handling and compatible fallback a...
by wboudy · 2026-02-14
79.8%
#22064: fix(failover): bypass models allowlist for configured fallback models
by winston-bepresent · 2026-02-20
79.3%
#14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on...
by JamesEBall · 2026-02-12
78.8%
#16797: fix(auth-profiles): implement per-model rate limit cooldown tracking
by mulhamna · 2026-02-15
78.7%