#14824: fix: do not trigger provider cooldown on LLM request timeouts

by CyberSinister · open · 2026-02-12 17:31

Labels: agents, stale, size: XS

Cluster: Rate Limit Management Enhancements

## Problem

LLM request timeouts are misclassified as rate limits, which triggers an exponential provider-wide cooldown (1 min → 5 min → 25 min → 1 hour). A single slow request from one session blocks **all models** on that provider for **all sessions**.

### Reproduction

1. Run 3-4 concurrent sessions on Anthropic (Opus).
2. One request takes slightly too long and times out.
3. The timeout triggers `markAuthProfileFailure` with `reason: "timeout"`.
4. `usage.ts` applies exponential cooldown to the profile (the same as for `rate_limit`).
5. If both auth profiles time out, the entire provider enters cooldown.
6. **All sessions** get: `"Provider anthropic is in cooldown (all profiles unavailable) (rate_limit)"`
7. No actual HTTP 429 was ever received.

### Root Cause

There are two issues in the codebase:

1. **`src/agents/auth-profiles/usage.ts`**: `computeUpdatedStatsOnFailure()` applies `cooldownUntil` to **all** non-billing failures, including timeouts. There is no distinction between a real rate limit (HTTP 429) and a request that simply took too long.
2. **`src/agents/pi-embedded-runner/run.ts`**: Around line 755, a comment reads *"Treat timeout as potential rate limit (Antigravity hangs on rate limit)"*. This was a workaround for the Antigravity proxy, which silently hangs instead of returning 429, but it applies to **all providers**, including the direct Anthropic and Google APIs, where timeouts are transient and unrelated to rate limiting.

## Fix

### `usage.ts`

Skip `cooldownUntil` for `reason === "timeout"`, matching the existing pattern for the billing exclusion. The error count still increments for tracking purposes, but the profile is not put into cooldown.

### `run.ts`

Only rotate auth profiles on timeout for proxy providers (e.g., `google-antigravity`), where a hang genuinely indicates a silent rate limit. Direct providers (Anthropic, Google) time out transiently and should not trigger profile rotation or cooldown.
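The two fixes can be sketched in TypeScript. This is a minimal illustration, not the PR's code: `FailureReason`, `ProfileUsageStats`, `nextStatsOnFailure`, `cooldownMs`, `SILENT_RATE_LIMIT_PROVIDERS`, and `shouldRotateProfileOnTimeout` are hypothetical names, and the backoff formula is an assumption reconstructed from the 1 min → 5 min → 25 min → 1 hour progression described in the problem statement.

```typescript
// Hypothetical types for illustration; the real ones in usage.ts may differ.
type FailureReason = "rate_limit" | "billing" | "timeout" | "auth" | "unknown";

interface ProfileUsageStats {
  errorCount: number;
  failureCounts: Partial<Record<FailureReason, number>>;
  cooldownUntil?: number; // epoch ms; the profile is skipped until then
}

const MINUTE_MS = 60_000;

// Assumed backoff: 1 min -> 5 min -> 25 min, capped at 1 hour.
function cooldownMs(errorCount: number): number {
  const raw = MINUTE_MS * 5 ** Math.max(0, errorCount - 1);
  return Math.min(raw, 60 * MINUTE_MS);
}

// Hypothetical analogue of the failure-stats update in usage.ts.
function nextStatsOnFailure(
  prev: ProfileUsageStats,
  reason: FailureReason,
  now: number,
): ProfileUsageStats {
  // Every failure, including a timeout, is still counted for tracking.
  const errorCount = prev.errorCount + 1;
  const failureCounts = { ...prev.failureCounts };
  failureCounts[reason] = (failureCounts[reason] ?? 0) + 1;

  // Billing was already excluded from cooldown; the fix adds "timeout".
  // An existing cooldown window (e.g. from an earlier 429) is left untouched.
  if (reason === "billing" || reason === "timeout") {
    return { errorCount, failureCounts, cooldownUntil: prev.cooldownUntil };
  }
  return { errorCount, failureCounts, cooldownUntil: now + cooldownMs(errorCount) };
}

// Hypothetical allow-list: providers known to hang instead of returning HTTP 429.
const SILENT_RATE_LIMIT_PROVIDERS: ReadonlySet<string> = new Set(["google-antigravity"]);

// Mirrors the run.ts change: rotate profiles on timeout only for such proxies.
function shouldRotateProfileOnTimeout(provider: string): boolean {
  return SILENT_RATE_LIMIT_PROVIDERS.has(provider);
}
```

In this sketch a run of timeouts only raises `errorCount` and `failureCounts.timeout`; a cooldown window opens only for non-excluded reasons such as `rate_limit`. Because the counter is shared, earlier timeouts still lengthen the next rate-limit cooldown, which matches the PR's note that the error count keeps incrementing for tracking.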
## Impact

- **Before:** A single timeout cascades into a 5-60 minute provider lockout across all sessions.
- **After:** Timeouts are recorded but do not trigger cooldown. The provider remains available for other sessions.

## Related Issues

Fixes #10669: Subagent timeouts should not trigger provider-level cooldown

Relates to:

- #11352: Auth profile cooldown backoff too aggressive for transient rate limits
- #10375: Profile-level cooldown blocks all models when only one model is rate-limited
- #13336: Local Ollama provider incorrectly enters cooldown after timeout
- #13807: Fallback models not attempted when primary provider enters rate_limit cooldown

## Testing

- TypeScript compilation passes (`tsc --noEmit`)
- Linting passes (`oxlint`)
- Changes are minimal and surgical (2 files, 14 lines added, 2 changed)
- Existing behavior is preserved for actual rate limits and billing failures

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adjusts auth-profile cooldown/rotation logic so that LLM request timeouts no longer cascade into a provider-wide cooldown.

- `src/agents/auth-profiles/usage.ts`: `computeNextProfileUsageStats` now treats `reason === "timeout"` as a tracked failure (increments `errorCount`/`failureCounts`) but **does not** set `cooldownUntil`, matching the existing special-casing for billing.
- `src/agents/pi-embedded-runner/run.ts`: profile rotation on timeout is now limited to the `google-antigravity` proxy provider (where a hang/timeout can indicate silent rate limiting). For direct providers, timeouts will no longer rotate auth profiles.

These changes fit into the existing failover system, where `classifyFailoverReason` maps timeouts/transient 5xx errors to `"timeout"`, and where profile unavailability is determined by `cooldownUntil`/`disabledUntil` rather than raw `errorCount`.

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk.
- Changes are narrowly scoped and consistent with existing provider IDs and cooldown semantics: timeouts no longer set `cooldownUntil`, and timeout-based rotation remains for the one proxy provider (`google-antigravity`) that is explicitly modeled in the repo. Profile cooldown checks rely on `cooldownUntil`/`disabledUntil`, so this won’t accidentally keep profiles unusable via `errorCount` alone.
- No files require special attention.
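The review's argument that `errorCount` alone cannot keep a profile unusable can be illustrated with a minimal TypeScript sketch. `ProfileState`, `isProfileAvailable`, and `isProviderInCooldown` are hypothetical names, not the repository's API; the sketch only assumes, as the review states, that availability is decided by `cooldownUntil`/`disabledUntil`.

```typescript
// Hypothetical shape for illustration; the repo's profile-state type may differ.
interface ProfileState {
  id: string;
  errorCount: number;
  cooldownUntil?: number; // epoch ms
  disabledUntil?: number; // epoch ms
}

// A profile is blocked only while a cooldown or disable window is active;
// errorCount by itself never makes it unavailable.
function isProfileAvailable(p: ProfileState, now: number): boolean {
  const blockedUntil = Math.max(p.cooldownUntil ?? 0, p.disabledUntil ?? 0);
  return blockedUntil <= now;
}

// The provider-wide "all profiles unavailable" state arises only when every
// profile is inside such a window at the same time.
function isProviderInCooldown(profiles: ProfileState[], now: number): boolean {
  return profiles.length > 0 && profiles.every((p) => !isProfileAvailable(p, now));
}
```

Under this model, a profile that has timed out several times but never received a cooldown window stays selectable, so the provider-wide error from the reproduction steps requires every profile to be inside an active window.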