#20911: fix: auto-reset session when all models time out

by tag-assistant open 2026-02-19 12:32

size: XS

Cluster: Rate Limit Management Enhancements

Closes #20910

## What

Adds a circuit breaker that auto-resets the session when all fallback models time out, breaking the death spiral on first occurrence.

## Why

When a session grows very large, image sanitization and serialization consume the entire 120s timeout budget. All fallback models fail. The error does not match isLikelyContextOverflowError, so no self-healing triggers and the gateway retries indefinitely.

## How

- Detect all-timed-out pattern via regex on FailoverError message
- Exclude rate-limit timeouts via negative lookahead
- Call existing resetSessionAfterCompactionFailure to clear the session
- Return user-facing message explaining the reset
- Uses existing didResetAfterCompactionFailure guard to prevent double-reset

<h3>Greptile Summary</h3>

adds circuit breaker to auto-reset sessions when all fallback models time out, preventing infinite retry loops. also introduces per-model cooldown tracking to prevent one slow model from blocking other models on the same auth profile.

**Key changes:**

- new timeout detection regex in `agent-runner-execution.ts` triggers session reset when all models time out
- per-model cooldown infrastructure added (`isProfileInCooldownForModel`, `modelCooldowns` field)
- improved failover reason classification to distinguish timeouts from unknown errors
- copilot SDK auth handling for env-based token exchange
- queue management utility `removeFollowupRunByMessageId` added
- unrelated: peon-sounds plugin extension (361 lines)

**Critical issue found:**

- per-model cooldown feature is incomplete - `computeNextProfileUsageStats` accepts `modelId` parameter but never sets `updatedStats.modelCooldowns[modelId]`, causing tests to fail and the feature to be non-functional

<h3>Confidence Score: 1/5</h3>

- critical bug blocks main feature - per-model cooldowns not implemented
- the core per-model cooldown feature has a logic bug where `modelCooldowns` is never populated, making it non-functional. tests expect this behavior but the implementation is incomplete in `computeNextProfileUsageStats`. the timeout detection logic appears sound but cannot work correctly without the cooldown mechanism
- `src/agents/auth-profiles/usage.ts` requires immediate fix to implement per-model cooldown logic in `computeNextProfileUsageStats` function

<sub>Last reviewed commit: bca661f</sub>
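
The detection and guard steps listed under "How" can be sketched roughly as follows. This is an illustration only, not the code from the PR: the actual FailoverError message text and regex are not shown here, so the pattern `ALL_TIMED_OUT`, the helper names `isAllModelsTimedOut` and `handleAllTimedOut`, the `ResetState` shape, and the synchronous `resetSession` callback (standing in for `resetSessionAfterCompactionFailure`) are all assumptions.

```typescript
// ASSUMPTION: the real FailoverError message format is not shown in the PR.
// The negative lookahead at the start rejects any message mentioning a rate
// limit; the rest requires "all models ... timed out". The `s` flag lets `.`
// span newlines in multi-line error messages.
const ALL_TIMED_OUT = /^(?!.*rate.?limit).*all models.*timed out/is;

function isAllModelsTimedOut(message: string): boolean {
  return ALL_TIMED_OUT.test(message);
}

// Hypothetical state holder for the existing double-reset guard.
interface ResetState {
  didResetAfterCompactionFailure: boolean;
}

// Returns a user-facing message if the session was reset, otherwise null.
// `resetSession` is a stand-in for the existing reset function.
function handleAllTimedOut(
  message: string,
  state: ResetState,
  resetSession: () => void,
): string | null {
  if (!isAllModelsTimedOut(message)) return null;
  // Guard: never reset twice for the same run.
  if (state.didResetAfterCompactionFailure) return null;
  state.didResetAfterCompactionFailure = true;
  resetSession();
  return "All models timed out, so the session was reset. Please resend your last message.";
}
```

Anchoring the lookahead at the start of the string means a rate-limit mention anywhere in the message suppresses the reset, which keeps ordinary rate-limit failures on their existing retry path.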
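
One possible shape for the missing per-model cooldown logic flagged above is sketched below. The real signatures in `src/agents/auth-profiles/usage.ts` are not shown in this PR page, so the `ProfileUsageStats` fields other than `modelCooldowns`, the parameter lists of `computeNextProfileUsageStats` and `isProfileInCooldownForModel`, and the use of epoch-millisecond timestamps are all assumptions made for illustration.

```typescript
// ASSUMPTION: simplified stats shape; only `modelCooldowns` is named in the review.
interface ProfileUsageStats {
  cooldownUntil?: number; // hypothetical profile-wide cooldown expiry (ms)
  modelCooldowns?: Record<string, number>; // per-model cooldown expiry (ms)
}

// Hypothetical signature: returns a new stats object without mutating the input.
function computeNextProfileUsageStats(
  stats: ProfileUsageStats,
  now: number,
  cooldownMs: number,
  modelId?: string,
): ProfileUsageStats {
  const modelCooldowns: Record<string, number> = { ...(stats.modelCooldowns ?? {}) };
  const updatedStats: ProfileUsageStats = { ...stats, modelCooldowns };
  if (modelId !== undefined) {
    // The step the review reports as missing: record the cooldown for this model only.
    modelCooldowns[modelId] = now + cooldownMs;
  } else {
    // No model given: fall back to a profile-wide cooldown.
    updatedStats.cooldownUntil = now + cooldownMs;
  }
  return updatedStats;
}

// Hypothetical signature: a model is cooling down if either the whole profile
// or that specific model has an unexpired cooldown.
function isProfileInCooldownForModel(
  stats: ProfileUsageStats,
  modelId: string,
  now: number,
): boolean {
  if ((stats.cooldownUntil ?? 0) > now) return true;
  return (stats.modelCooldowns?.[modelId] ?? 0) > now;
}
```

With this arrangement, a timeout on one model leaves other models on the same auth profile available, which is the behavior the summary describes as the goal of per-model tracking.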