#8677: fix: add retry logic to OAuth token refresh
agents
stale
Cluster:
Error Resilience and Retry Logic
## Summary
Adds retry logic with exponential backoff to the OAuth token refresh flow, preventing transient network/API failures from causing agent failures.
## Problem
When `getOAuthApiKey()` fails transiently (network blip, API timeout), the gateway immediately throws without retrying. This causes agents to fail even when the issue is temporary and a simple retry would succeed.
**Observed:**
```
08:14:05 [agents/auth-profiles] wrote refreshed credentials to claude cli file
08:14:11 [diagnostic] OAuth token refresh failed for anthropic
```
Token had 8 hours remaining - the refresh API call itself failed. Gateway restart fixed it immediately.
## Solution
Add `withRetry()` helper with exponential backoff (3 attempts: 1s, 2s, 3s delays) around the `getOAuthApiKey()` call.
## Changes
- Add `withRetry<T>()` generic helper function
- Wrap `getOAuthApiKey()` call in retry logic
- Log warnings on retry attempts for debugging
## Testing
The retry logic is straightforward and isolated. Existing tests should pass. The helper throws on final failure, preserving error semantics.
Fixes #8673
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
Adds a small `withRetry<T>()` helper in `src/agents/auth-profiles/oauth.ts` to retry OAuth API key refresh with linear backoff (1s, 2s, 3s) for up to 3 attempts. The refresh flow now wraps `getOAuthApiKey()` in this retry logic to reduce transient network/API failures from immediately failing agents, while preserving the final error behavior on exhaustion.
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge with low risk; it adds localized retry/backoff around token refresh.
- Change is confined to one file and wraps an existing network call without altering surrounding control flow; main concern is reduced observability in retry logs (stack/structured error details).
- src/agents/auth-profiles/oauth.ts
<!-- greptile_other_comments_section -->
<sub>(2/5) Greptile learns from your feedback when you react with thumbs up/down!</sub>
<!-- /greptile_comment -->
Most Similar PRs
#16239: fix: retry on transient API errors (overloaded, rate-limit, timeout)
by zerone0x · 2026-02-14
81.7%
#17001: fix: retry sub-agent announcements with backoff instead of silently...
by luisecab · 2026-02-15
77.5%
#8225: feat(auth): add forceRefresh option and invalidateOAuthToken for 40...
by arodundef · 2026-02-03
77.5%
#2123: fix(auth): sync from Claude CLI keychain before OAuth refresh
by jorge123255 · 2026-01-26
77.1%
#16913: fix(agent): increase transient HTTP retry from 1 to 3 with escalati...
by hou-rong · 2026-02-15
77.0%
#9025: Fix/automatic exponential backoff for LLM rate limits
by fotorpics · 2026-02-04
76.2%
#9232: Fix: Add automatic retry for network errors in message runs
by vishaltandale00 · 2026-02-05
75.8%
#11472: fix: retry media fetch on transient network errors
by openclaw-quenio · 2026-02-07
75.7%
#23497: feat(retry): add retryHttpAsync utility with comprehensive coverage
by thinstripe · 2026-02-22
75.7%
#5027: fix(auth): use correct OAuth credentials for google-gemini-cli refresh
by shayan919293 · 2026-01-30
75.6%