
#10551: feat(infra): add error classification for smarter retry decisions

by DukeDeSouth, open, 2026-02-06 17:13
stale size: M
## Human View

### Summary

Currently `retryAsync` relies on per-channel `shouldRetry` callbacks with ad-hoc regex matching (e.g. `TELEGRAM_RETRY_RE`). This works, but means:

- Every new channel re-invents the same classification logic
- Auth errors (401/403) and billing errors (402) still get retried, wasting time and API credits
- No centralized place to understand *why* a retry was skipped

This PR adds `src/infra/error-classifier.ts`, a single `classifyError()` function that inspects:

1. **HTTP status codes**: 429 → rate_limit, 401/403 → auth, 402 → billing, 5xx → retryable
2. **Node.js network codes**: ECONNRESET, ETIMEDOUT → retryable; ENOTFOUND → fatal
3. **Provider message patterns**: OpenAI quota, Anthropic overloaded, generic timeouts

Six categories: `retryable`, `rate_limit`, `auth`, `billing`, `fatal`, `unknown`.

#### Integration

Two drop-in helpers for existing `retryAsync`:

```ts
import { isRetryableError, retryAfterMs } from "./error-classifier.js";

retryAsync(fn, {
  shouldRetry: isRetryableError,
  retryAfterMs,
});
```

#### What this does NOT change

- No modifications to existing `retry.ts` or `retry-policy.ts`
- No breaking changes
- Purely additive: new file + tests

### Test plan

- [x] 30 vitest tests in `error-classifier.test.ts`
- [x] HTTP status codes (429, 401, 402, 403, 400, 404, 500, 502, 503, 501)
- [x] Network error codes (ECONNRESET, ETIMEDOUT, ECONNREFUSED, ENOTFOUND, CERT_HAS_EXPIRED)
- [x] Provider message patterns (OpenAI quota, rate limit, Anthropic overloaded, timeout, socket hang up)
- [x] Edge cases (null, undefined, string errors, Error instances, nested response.status)
- [x] Priority: HTTP status > error code > message pattern

---

## AI View (DCCE Protocol v1.0)

### Metadata

- **Generator**: Claude (Anthropic) via Cursor IDE
- **Methodology**: AI-assisted development with human oversight and review

### AI Contribution Summary

- Solution design and implementation
- Test development (30 test cases)

### Verification Steps Performed

1. Analyzed existing codebase patterns
2. Implemented feature with comprehensive tests
3. Ran test suite (30 tests passing)

### Human Review Guidance

- Core changes are in `src/infra/error-classifier.ts`; per the summary above, `retry.ts` and `retry-policy.ts` are not modified
- Verify test coverage matches the described scenarios

Made with [Cursor](https://cursor.com)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

- Adds a new `src/infra/error-classifier.ts` module that classifies arbitrary thrown values into retry categories (HTTP status, network code, message patterns) and provides `shouldRetry`/`retryAfterMs` helpers for `retryAsync()`.
- Adds a vitest suite covering status-code classification, common Node/network error codes, provider message pattern matching, and a few precedence/edge cases.
- Intended to centralize retry decision logic so callers can avoid ad-hoc regexes and skip retries for auth/billing errors.

<h3>Confidence Score: 4/5</h3>

- Mostly safe to merge, but contains a small logical defect in the ordering of the status classification code.
- The PR is additive with good test coverage; the main issue found is an unreachable `501` special case in `classifyStatus()`, which means the intended behavior and reason text for that status never apply as written.
- File needing attention: `src/infra/error-classifier.ts`

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
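For reviewers, here is a minimal sketch of the precedence the PR describes (HTTP status > error code > message pattern) and of the ordering issue Greptile flags. It is not the PR's actual code: the function bodies, the `Classification` shape, the regexes, and the choice to treat `501` as `fatal` are all assumptions made for illustration. Only the category names and the status/code mappings listed in the summary come from the PR description.

```typescript
// Hypothetical sketch; only the category names and listed mappings come from the PR text.
type ErrorCategory = "retryable" | "rate_limit" | "auth" | "billing" | "fatal" | "unknown";

interface Classification {
  category: ErrorCategory;
  reason: string;
}

const RETRYABLE_CODES = new Set(["ECONNRESET", "ETIMEDOUT"]);
const FATAL_CODES = new Set(["ENOTFOUND"]);

function classifyStatus(status: number): Classification | undefined {
  if (status === 429) return { category: "rate_limit", reason: "HTTP 429" };
  if (status === 401 || status === 403) return { category: "auth", reason: `HTTP ${status}` };
  if (status === 402) return { category: "billing", reason: "HTTP 402" };
  // Assumption: 501 is meant to be fatal. Whatever the intended category, the
  // special case has to come before the generic 5xx branch, or it is unreachable.
  if (status === 501) return { category: "fatal", reason: "HTTP 501 not implemented" };
  if (status >= 500 && status <= 599) return { category: "retryable", reason: `HTTP ${status}` };
  // Other statuses (e.g. 400, 404) fall through; the PR text does not say how they are classified.
  return undefined;
}

function readStatus(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const e = err as { status?: unknown; response?: { status?: unknown } | null };
  const direct = e.status;
  if (typeof direct === "number") return direct;
  const nested = e.response?.status; // nested response.status, as in the test plan
  return typeof nested === "number" ? nested : undefined;
}

function readCode(err: unknown): string | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" ? code : undefined;
}

function readMessage(err: unknown): string {
  if (typeof err === "string") return err;
  if (err instanceof Error) return err.message;
  return "";
}

function classifyError(err: unknown): Classification {
  // 1. HTTP status has the highest priority.
  const status = readStatus(err);
  if (status !== undefined) {
    const byStatus = classifyStatus(status);
    if (byStatus) return byStatus;
  }
  // 2. Node.js network error codes.
  const code = readCode(err);
  if (code !== undefined) {
    if (RETRYABLE_CODES.has(code)) return { category: "retryable", reason: code };
    if (FATAL_CODES.has(code)) return { category: "fatal", reason: code };
  }
  // 3. Message patterns (illustrative regexes, not the PR's).
  const message = readMessage(err);
  if (/rate limit/i.test(message)) return { category: "rate_limit", reason: "message pattern" };
  if (/overloaded|timeout|timed out|socket hang up/i.test(message)) {
    return { category: "retryable", reason: "message pattern" };
  }
  return { category: "unknown", reason: "no rule matched" };
}

// Assumption: only `retryable` and `rate_limit` are worth retrying.
function isRetryableError(err: unknown): boolean {
  const { category } = classifyError(err);
  return category === "retryable" || category === "rate_limit";
}
```

With this ordering, `{ status: 501 }` reaches its dedicated branch while `500`, `502`, and `503` still land in the generic 5xx branch. An error carrying both `status: 402` and `code: "ECONNRESET"` is classified as `billing`, since status is checked first.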
