
#12995: feat(infra): Add retry with exponential backoff for transient failures

by trevorgordon981 · open · 2026-02-10 01:24
channel: slack app: web-ui gateway agents stale
## Summary

Adds a configurable retry mechanism with exponential backoff for handling transient API failures.

## Features

- Exponential backoff (1s → 2s → 4s → 8s default)
- Handles 429, 5xx, timeouts, ECONNRESET
- Excludes non-retryable errors (400, 401, 403, 404)
- Attaches retryMetadata to errors for debugging

## Files

- src/infra/retry-with-backoff.ts
- src/infra/retry-with-backoff.test.ts (16 tests)

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

Adds a new `src/infra/retry-with-backoff.ts` helper providing retry semantics with exponential backoff and jitter for transient failures, along with a dedicated test suite (`src/infra/retry-with-backoff.test.ts`). The helper classifies retryable errors using HTTP status, network error codes, and common timeout/network message patterns, and attaches `retryMetadata` to the final thrown error for debugging.

<h3>Confidence Score: 4/5</h3>

- This PR is close to mergeable but has one correctness issue in backoff delay clamping that should be fixed first.
- Core retry loop and retryable-error classification are straightforward and covered by tests, but `calculateBackoff()` can return delays above `maxDelayMs` due to jitter being applied after capping. Fixing that removes the main behavioral footgun.
- src/infra/retry-with-backoff.ts
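The clamping issue raised in the review can be illustrated with a minimal sketch. This is not the PR's code: the option names (`baseDelayMs`, `jitterRatio`), the injectable `random` parameter, and the `isRetryableStatus` helper are assumptions made for illustration; only `calculateBackoff`, `maxDelayMs`, and the status lists come from the text above. The point is the ordering: jitter is added first and the cap is applied last, so the returned delay cannot exceed `maxDelayMs`.

```typescript
// Hypothetical option shape; the PR's real options are not shown in the summary.
interface BackoffOptions {
  baseDelayMs: number; // delay before the first retry (1000 matches the 1s default)
  maxDelayMs: number;  // hard upper bound on any single delay
  jitterRatio: number; // 0.1 means up to +/-10% random spread
}

// Compute the delay for a 0-based retry attempt.
// Jitter is applied before the cap, so the result stays within [0, maxDelayMs].
function calculateBackoff(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const exponential = opts.baseDelayMs * Math.pow(2, attempt);
  // random() is in [0, 1), so (random() * 2 - 1) is in [-1, 1).
  const jitter = exponential * opts.jitterRatio * (random() * 2 - 1);
  const jittered = Math.round(exponential + jitter);
  return Math.max(0, Math.min(opts.maxDelayMs, jittered));
}

// Status classification as described in the PR summary:
// 429 and 5xx are retried; 400, 401, 403, 404 (and other 4xx) are not.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}
```

With `baseDelayMs: 1000` and jitter disabled, attempts 0 through 3 give the 1s → 2s → 4s → 8s sequence listed under Features. Capping before adding jitter, by contrast, would let a delay at the cap drift above `maxDelayMs` by up to the jitter amount.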
