#12995: feat(infra): Add retry with exponential backoff for transient failures
channel: slack
app: web-ui
gateway
agents
stale
Cluster: Error Resilience and Retry Logic
## Summary
Adds a configurable retry mechanism with exponential backoff for handling transient API failures.
## Features
- Exponential backoff with jitter (default base delays: 1s → 2s → 4s → 8s)
- Retries on HTTP 429 and 5xx responses, timeouts, and `ECONNRESET`
- Does not retry non-retryable HTTP errors (400, 401, 403, 404)
- Attaches `retryMetadata` to the final thrown error for debugging
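
The behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/infra/retry-with-backoff.ts`: the option names (`maxRetries`, `initialDelayMs`, `factor`, `shouldRetry`, `sleep`), the `delayForRetry` helper, and the shape of `retryMetadata` are all assumptions made for the example, and jitter is left out to keep the schedule deterministic.

```typescript
// Hypothetical sketch: names, defaults, and the retryMetadata shape are assumptions.
type Sleep = (ms: number) => Promise<void>;

interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  initialDelayMs: number; // delay before the first retry
  factor: number; // multiplier applied on each further retry
  shouldRetry: (err: unknown) => boolean; // decides whether an error is transient
  sleep: Sleep; // injectable so tests do not wait on real timers
}

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry number `retry` (0-based): 1000, 2000, 4000, 8000 by default.
function delayForRetry(retry: number, initialDelayMs = 1000, factor = 2): number {
  return initialDelayMs * Math.pow(factor, retry);
}

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  options: Partial<RetryOptions> = {},
): Promise<T> {
  const opts: RetryOptions = {
    maxRetries: 4,
    initialDelayMs: 1000,
    factor: 2,
    shouldRetry: () => true,
    sleep: defaultSleep,
    ...options,
  };
  const delaysMs: number[] = [];
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up when retries are exhausted or the error is not transient.
      if (attempt >= opts.maxRetries || !opts.shouldRetry(err)) {
        if (err instanceof Error) {
          // Attach debugging info to the error that is finally thrown.
          Object.assign(err, { retryMetadata: { attempts: attempt + 1, delaysMs } });
        }
        throw err;
      }
      const delay = delayForRetry(attempt, opts.initialDelayMs, opts.factor);
      delaysMs.push(delay);
      await opts.sleep(delay);
    }
  }
}
```

Injecting `sleep` and `shouldRetry` is a design choice for this sketch: it keeps the loop testable without real timers and leaves error classification to a separate function.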
## Files
- src/infra/retry-with-backoff.ts
- src/infra/retry-with-backoff.test.ts (16 tests)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
Adds a new `src/infra/retry-with-backoff.ts` helper providing retry semantics with exponential backoff and jitter for transient failures, along with a dedicated test suite (`src/infra/retry-with-backoff.test.ts`). The helper classifies retryable errors using HTTP status, network error codes, and common timeout/network message patterns, and attaches `retryMetadata` to the final thrown error for debugging.
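
As a rough illustration of the classification described in the summary (HTTP status first, then network error codes, then message patterns), a helper might look like the sketch below. The `ErrorLike` shape, the `isRetryable` name, every code other than `ECONNRESET`, and the message regex are assumptions for the example; the PR's actual lists may differ.

```typescript
// Hypothetical sketch: the field names and the exact code/pattern lists are assumptions.
interface ErrorLike {
  status?: number; // HTTP status, if the error came from a response
  code?: string; // Node-style network error code
  message?: string;
}

const NON_RETRYABLE_STATUS = new Set([400, 401, 403, 404]);
// Only ECONNRESET is named in the PR; the other codes are illustrative additions.
const RETRYABLE_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "ECONNREFUSED", "EAI_AGAIN"]);
// Illustrative timeout/network message patterns.
const RETRYABLE_MESSAGE = /timeout|timed out|network|socket hang up/i;

function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as ErrorLike;

  // An explicit HTTP status takes precedence over any other signal.
  if (typeof e.status === "number") {
    if (NON_RETRYABLE_STATUS.has(e.status)) return false;
    return e.status === 429 || (e.status >= 500 && e.status <= 599);
  }
  // Next, known transient network error codes.
  if (typeof e.code === "string" && RETRYABLE_CODES.has(e.code)) return true;
  // Finally, fall back to matching common timeout/network messages.
  return typeof e.message === "string" && RETRYABLE_MESSAGE.test(e.message);
}
```

Checking the status before the message means a 400 response whose body happens to mention "network" is still treated as non-retryable.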
<h3>Confidence Score: 4/5</h3>
- This PR is close to mergeable but has one correctness issue in backoff delay clamping that should be fixed first.
- Core retry loop and retryable-error classification are straightforward and covered by tests, but `calculateBackoff()` can return delays above `maxDelayMs` due to jitter being applied after capping. Fixing that removes the main behavioral footgun.
- File needing attention: `src/infra/retry-with-backoff.ts`
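
One way to address the clamping issue noted above is to apply jitter first and clamp last, so the returned delay can never exceed `maxDelayMs`. The sketch below only illustrates that ordering. `calculateBackoff` and `maxDelayMs` are names from the review, but the signature, the other option names, and the symmetric jitter formula are assumptions, not the PR's implementation.

```typescript
// Hypothetical sketch: option names other than maxDelayMs and the jitter formula are assumptions.
interface BackoffOptions {
  initialDelayMs: number;
  maxDelayMs: number;
  factor: number;
  jitterRatio: number; // e.g. 0.25 means up to ±25% of the exponential delay
}

function calculateBackoff(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const exponential = opts.initialDelayMs * Math.pow(opts.factor, attempt);
  // Symmetric jitter in [-jitterRatio, +jitterRatio] of the exponential delay.
  const jitter = exponential * opts.jitterRatio * (random() * 2 - 1);
  // Clamp AFTER adding jitter so the result stays within [0, maxDelayMs].
  return Math.min(opts.maxDelayMs, Math.max(0, Math.round(exponential + jitter)));
}
```

With `maxDelayMs: 8000` and `jitterRatio: 0.25`, an unclamped value of up to 10000 ms at the fourth retry is reduced to 8000 ms, which is the property the review asks for.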
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #16195: feat(infra): add unified retry utility with exponential backoff (by bianbiandashen · 2026-02-14 · 81.7% similar)
- #23152: feat(plugin): add retry-backoff extension (by cintia09 · 2026-02-22 · 80.3% similar)
- #10551: feat(infra): add error classification for smarter retry decisions (by DukeDeSouth · 2026-02-06 · 79.0% similar)
- #10276: fix(infra): use bidirectional jitter in computeBackoff (by programming-pupil · 2026-02-06 · 78.9% similar)
- #4086: Test/add backoff tests (by TechWizard9999 · 2026-01-29 · 78.7% similar)
- #19540: feat: add timeout and exponential backoff retry for frontend API calls (by Mozzzaic · 2026-02-17 · 78.2% similar)
- #16239: fix: retry on transient API errors (overloaded, rate-limit, timeout) (by zerone0x · 2026-02-14 · 77.3% similar)
- #15585: fix: add retry/backoff for Gemini embedding API calls (by WalterSumbon · 2026-02-13 · 76.4% similar)
- #23497: feat(retry): add retryHttpAsync utility with comprehensive coverage (by thinstripe · 2026-02-22 · 76.0% similar)
- #16913: fix(agent): increase transient HTTP retry from 1 to 3 with escalati... (by hou-rong · 2026-02-15 · 75.7% similar)