#14824: fix: do not trigger provider cooldown on LLM request timeouts
agents
stale
size: XS
Cluster: Rate Limit Management Enhancements
## Problem
LLM request timeouts are misclassified as rate limits, triggering exponential provider-wide cooldown (1min → 5min → 25min → 1hour). A single slow request from one session blocks **all models** on that provider for **all sessions**.
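As a rough illustration of that schedule, the sketch below reproduces the 1min → 5min → 25min → 1hour progression. The base, multiplier, and cap are inferred from the sequence quoted above, and `cooldownMsForErrorCount` is a hypothetical name; the actual constants and function in `usage.ts` may differ.

```typescript
// Hypothetical reconstruction of the backoff schedule; constants are inferred
// from "1min → 5min → 25min → 1hour", not copied from usage.ts.
const BASE_COOLDOWN_MS = 60 * 1000; // 1 minute
const BACKOFF_MULTIPLIER = 5;
const MAX_COOLDOWN_MS = 60 * 60 * 1000; // capped at 1 hour

function cooldownMsForErrorCount(errorCount: number): number {
  const n = Math.max(1, errorCount);
  const raw = BASE_COOLDOWN_MS * Math.pow(BACKOFF_MULTIPLIER, n - 1);
  return Math.min(MAX_COOLDOWN_MS, raw);
}

// Cooldown in minutes after the 1st, 2nd, 3rd, and 4th consecutive failure.
const scheduleMinutes = [1, 2, 3, 4].map(
  (n) => cooldownMsForErrorCount(n) / (60 * 1000),
);
console.log(scheduleMinutes); // [ 1, 5, 25, 60 ]
```

Under these assumed constants, four consecutive timeouts on one profile are enough to reach the one-hour cap.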
### Reproduction
1. Run 3-4 concurrent sessions on Anthropic (Opus)
2. One request takes slightly too long and times out
3. The timeout triggers `markAuthProfileFailure` with `reason: "timeout"`
4. `usage.ts` applies exponential cooldown to the profile (same as rate_limit)
5. If both auth profiles time out, the entire provider enters cooldown
6. **All sessions** get: `"Provider anthropic is in cooldown (all profiles unavailable) (rate_limit)"`
7. No actual HTTP 429 was ever received
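The cascade in the steps above can be modeled with a small simulation. This is a simplified sketch, not the project's code: `Profile`, `markFailurePreFix`, and `isProviderInCooldown` are hypothetical names, and the one-minute first step is assumed from the backoff sequence in the Problem section.

```typescript
// Simplified, hypothetical model of the pre-fix behavior.
type Reason = "timeout" | "rate_limit";

interface Profile {
  id: string;
  cooldownUntil?: number; // epoch ms; undefined means the profile is available
}

const FIRST_COOLDOWN_MS = 60 * 1000; // assumed first backoff step (1 minute)

// Pre-fix: a timeout is handled exactly like a rate limit.
function markFailurePreFix(profile: Profile, _reason: Reason, nowMs: number): void {
  profile.cooldownUntil = nowMs + FIRST_COOLDOWN_MS;
}

// A provider is unavailable once every one of its profiles is cooling down.
function isProviderInCooldown(profiles: Profile[], nowMs: number): boolean {
  return profiles.every(
    (p) => p.cooldownUntil !== undefined && p.cooldownUntil > nowMs,
  );
}

const t0 = 0;
const anthropicProfiles: Profile[] = [{ id: "profile-a" }, { id: "profile-b" }];

markFailurePreFix(anthropicProfiles[0], "timeout", t0);
const afterOneTimeout = isProviderInCooldown(anthropicProfiles, t0 + 1);

markFailurePreFix(anthropicProfiles[1], "timeout", t0);
const afterTwoTimeouts = isProviderInCooldown(anthropicProfiles, t0 + 1);

console.log(afterOneTimeout, afterTwoTimeouts); // false true
```

In this model, two timeouts on different profiles lock the provider for every session, even though no HTTP 429 was involved.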
### Root Cause
Two issues in the codebase:
1. **`src/agents/auth-profiles/usage.ts`** — `computeUpdatedStatsOnFailure()` applies `cooldownUntil` to **all** non-billing failures, including timeouts. There is no distinction between a real rate limit (HTTP 429) and a request that simply took too long.
2. **`src/agents/pi-embedded-runner/run.ts`** — Line ~755 has a comment: *"Treat timeout as potential rate limit (Antigravity hangs on rate limit)"*. This was a workaround for the Antigravity proxy (which silently hangs instead of returning 429), but it applies to **all providers** — including direct Anthropic and Google APIs where timeouts are transient and unrelated to rate limiting.
## Fix
### `usage.ts`
Skip `cooldownUntil` for `reason === "timeout"`, matching the existing pattern for billing exclusion. The error count still increments for tracking purposes, but the profile is not put into cooldown.
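A minimal sketch of this change, assuming a simplified stats shape. `ProfileStats`, `statsAfterFailure`, and the fixed `COOLDOWN_MS` are illustrative placeholders; the real function in `usage.ts` has its own signature and backoff calculation, and any separate handling of billing failures is omitted here.

```typescript
// Hypothetical shapes and names, for illustration only.
type FailureReason = "timeout" | "rate_limit" | "billing";

interface ProfileStats {
  errorCount: number;
  cooldownUntil?: number; // epoch ms
}

const COOLDOWN_MS = 60 * 1000; // placeholder for the real backoff calculation

function statsAfterFailure(
  prev: ProfileStats,
  reason: FailureReason,
  nowMs: number,
): ProfileStats {
  // Every failure is still counted for tracking purposes.
  const next: ProfileStats = { ...prev, errorCount: prev.errorCount + 1 };

  // Billing was already excluded from cooldownUntil; the fix adds timeout
  // to the same exclusion.
  if (reason !== "billing" && reason !== "timeout") {
    next.cooldownUntil = nowMs + COOLDOWN_MS;
  }
  return next;
}

const afterTimeout = statsAfterFailure({ errorCount: 0 }, "timeout", 1000);
const afterRateLimit = statsAfterFailure({ errorCount: 0 }, "rate_limit", 1000);
console.log(afterTimeout, afterRateLimit);
```

Here `afterTimeout` carries an incremented `errorCount` with no `cooldownUntil`, while `afterRateLimit` still enters cooldown as before.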
### `run.ts`
Only rotate auth profiles on timeout for proxy providers (e.g., `google-antigravity`) where a hang genuinely indicates a silent rate limit. Direct providers (Anthropic, Google) time out transiently and should not trigger profile rotation or cooldown.
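One way to express this gate is an allowlist of providers whose timeouts are treated as silent rate limits. The set name and helper below are assumptions for illustration; the PR only names `google-antigravity`, and the actual check in `run.ts` may be written differently.

```typescript
// Hypothetical allowlist: providers where a hang is assumed to mean a silent
// rate limit. Only google-antigravity is named in this PR.
const TIMEOUT_ROTATION_PROVIDERS = new Set<string>(["google-antigravity"]);

// Returns true when a timeout on this provider should rotate the auth profile.
function shouldRotateProfileOnTimeout(provider: string): boolean {
  return TIMEOUT_ROTATION_PROVIDERS.has(provider);
}

console.log(shouldRotateProfileOnTimeout("google-antigravity")); // true
console.log(shouldRotateProfileOnTimeout("anthropic")); // false
```

Keeping the list explicit means a new proxy provider has to opt in to timeout-based rotation rather than inheriting it by default.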
## Impact
- **Before:** A single timeout cascades into 5-60 minute provider lockout across all sessions
- **After:** Timeouts are recorded but do not trigger cooldown. Provider remains available for other sessions.
## Related Issues
Fixes #10669 — Subagent timeouts should not trigger provider-level cooldown
Relates to:
- #11352 — Auth profile cooldown backoff too aggressive for transient rate limits
- #10375 — Profile-level cooldown blocks all models when only one model is rate-limited
- #13336 — Local Ollama provider incorrectly enters cooldown after timeout
- #13807 — Fallback models not attempted when primary provider enters rate_limit cooldown
## Testing
- TypeScript compilation passes (`tsc --noEmit`)
- Linting passes (`oxlint`)
- Changes are minimal and surgical (2 files, 14 lines added, 2 changed)
- Existing behavior preserved for actual rate limits and billing failures
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adjusts auth-profile cooldown/rotation logic so that LLM request timeouts no longer cascade into provider-wide cooldown.
- `src/agents/auth-profiles/usage.ts`: `computeNextProfileUsageStats` now treats `reason === "timeout"` as a tracked failure (increments `errorCount`/`failureCounts`) but **does not** set `cooldownUntil`, matching the existing special-casing for billing.
- `src/agents/pi-embedded-runner/run.ts`: profile rotation on timeout is now limited to the `google-antigravity` proxy provider (where a hang/timeout can indicate silent rate limiting). For direct providers, timeouts will no longer rotate auth profiles.
These changes fit into the existing failover system where `classifyFailoverReason` maps timeouts/transient 5xx errors to `"timeout"`, and where profile unavailability is determined by `cooldownUntil`/`disabledUntil` rather than raw `errorCount`.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk.
- Changes are narrowly scoped and consistent with existing provider IDs and cooldown semantics: timeouts no longer set `cooldownUntil`, and timeout-based rotation remains for the one proxy provider (`google-antigravity`) that is explicitly modeled in the repo. Profile cooldown checks rely on `cooldownUntil`/`disabledUntil`, so this won’t accidentally keep profiles unusable via `errorCount` alone.
- No files require special attention
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on... (by JamesEBall, 2026-02-12; similarity 85.8%)
- #20946: fix: skip auth cooldown on timeout (not an auth failure) (by austenstone, 2026-02-19; similarity 85.0%)
- #19267: fix: derive failover reason from timedOut flag to prevent unknown c... (by austenstone, 2026-02-17; similarity 83.8%)
- #14914: fix: resolve actual failure reason for cooldown-skipped providers (by mcaxtr, 2026-02-12; similarity 83.2%)
- #18902: fix: exempt format errors from auth profile cooldown (by tag-assistant, 2026-02-17; similarity 83.0%)
- #23210: fix: avoid cooldown on timeout/unknown failovers (by nydamon, 2026-02-22; similarity 82.7%)
- #16797: fix(auth-profiles): implement per-model rate limit cooldown tracking (by mulhamna, 2026-02-15; similarity 81.8%)
- #14368: fix: skip auth profile cooldown on format errors to prevent provide... (by koatora20, 2026-02-12; similarity 80.5%)
- #23564: feat(auth): add timeout retry before auth profile rotation (by echoVic, 2026-02-22; similarity 80.4%)
- #20388: fix(failover): don't skip same-provider fallback models when cooldo... (by Limitless2023, 2026-02-18; similarity 80.3%)