#8256: feat: Add rate limit strategy configuration
agents
stale
Cluster: Model Fallbacks and Rate Limiting
Add configurable rate limit handling that allows users to choose between:
- "switch": Immediately try fallback model (default, existing behavior)
- "wait": Parse Retry-After header and wait, then retry same model
- "ask": Prompt user to choose between wait or switch
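This addresses cases where an HTTP 429 (rate limit) error is not recovered from in the best way, for example when the agent switches models although a short wait would have cleared the limit. The feature parses Retry-After headers returned by AI providers (OpenAI, Anthropic) and respects a configurable maximum wait time (`maxWaitSeconds`, default: 60s).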
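As a rough illustration of the header parsing described above: `Retry-After` can carry either a number of seconds or an HTTP date. The helper below is a minimal sketch, not the PR's implementation; the name `parseRetryAfterSeconds` and the injectable `nowMs` parameter are assumptions made for the example.

```typescript
// Hypothetical helper (not taken from the PR): converts a Retry-After header
// value into a number of seconds to wait, or undefined if it cannot be parsed.
function parseRetryAfterSeconds(
  value: string | null | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (value == null) return undefined;
  const trimmed = value.trim();
  if (trimmed === "") return undefined;

  // Form 1: delay-seconds, a non-negative integer such as "30".
  if (/^\d+$/.test(trimmed)) return Number(trimmed);

  // Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;

  // A date in the past means no wait is needed.
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}
```

Passing `nowMs` explicitly keeps the date branch deterministic in unit tests.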
Configuration:
```json
{
  "agents": {
    "defaults": {
      "rateLimitStrategy": {
        "strategy": "wait",
        "maxWaitSeconds": 60,
        "backupModel": "openai/gpt-4"
      }
    }
  }
}
```
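For orientation, the shape of this config block could be modeled roughly as follows. This is a sketch under assumptions: the type and function names are hypothetical, the PR itself uses a zod schema that is not shown here, and rejecting negative `maxWaitSeconds` is a choice made for the example. The defaults (`"switch"` and 60 seconds) come from the description above.

```typescript
// Hypothetical types mirroring the documented config keys.
type RateLimitStrategyName = "switch" | "wait" | "ask";

interface RateLimitStrategyConfig {
  strategy?: RateLimitStrategyName;
  maxWaitSeconds?: number;
  backupModel?: string;
}

interface ResolvedRateLimitStrategy {
  strategy: RateLimitStrategyName;
  maxWaitSeconds: number;
  backupModel?: string;
}

const KNOWN_STRATEGIES: readonly string[] = ["switch", "wait", "ask"];

// Applies the documented defaults and rejects obviously invalid values.
function resolveRateLimitStrategy(
  raw: RateLimitStrategyConfig = {},
): ResolvedRateLimitStrategy {
  const strategy = raw.strategy ?? "switch"; // documented default
  if (KNOWN_STRATEGIES.indexOf(strategy) === -1) {
    throw new Error(`Unknown rate limit strategy: ${String(strategy)}`);
  }
  const maxWaitSeconds = raw.maxWaitSeconds ?? 60; // documented default
  if (!Number.isFinite(maxWaitSeconds) || maxWaitSeconds < 0) {
    // Assumption: negative or non-finite waits are treated as config errors.
    throw new Error("maxWaitSeconds must be a non-negative number");
  }
  return {
    strategy,
    maxWaitSeconds,
    ...(raw.backupModel !== undefined ? { backupModel: raw.backupModel } : {}),
  };
}
```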
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adds a configurable rate-limit handling layer to the model fallback path. It introduces a `rate-limit-handler` module that detects HTTP 429/402 responses (and message-based patterns), extracts retry timing from provider-specific headers (`Retry-After`, OpenAI `x-ratelimit-reset-*`, Anthropic `*-reset`), and decides whether to switch models, wait and retry once, or ask the caller for a decision. The config types, zod schema, and UI hints are extended under `agents.defaults.rateLimitStrategy`.
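To make the detection and header extraction concrete, here is a minimal sketch. It is not the `rate-limit-handler` module itself: the function names, the message regex, and the choice to wait for the longest reported reset are all assumptions. The header names are the ones the providers document publicly (OpenAI reports durations such as `6m0s`; Anthropic reports RFC 3339 timestamps), and keys are assumed to be lowercased already.

```typescript
// Hypothetical shapes for the sketch; real error objects may differ.
type HeaderMap = Record<string, string | undefined>;

// Status codes come from the summary; the message pattern is an assumption.
function isRateLimitError(err: { status?: number; message?: string }): boolean {
  if (err.status === 429 || err.status === 402) return true;
  return /rate.?limit|too many requests/i.test(err.message ?? "");
}

// Parses duration strings such as "1s", "6m0s" or "250ms" into seconds.
function parseDurationSeconds(value: string): number | undefined {
  const text = value.trim();
  if (!/^(?:\d+(?:\.\d+)?(?:ms|h|m|s))+$/.test(text)) return undefined;
  const unitSeconds: Record<string, number> = { h: 3600, m: 60, s: 1, ms: 0.001 };
  const part = /(\d+(?:\.\d+)?)(ms|h|m|s)/g;
  let total = 0;
  let match: RegExpExecArray | null;
  while ((match = part.exec(text)) !== null) {
    total += Number(match[1]) * unitSeconds[match[2]];
  }
  return total;
}

// Returns the longest reset delay reported by any known header, in seconds.
function extractResetSeconds(
  headers: HeaderMap,
  nowMs: number = Date.now(),
): number | undefined {
  const waits: number[] = [];
  // OpenAI-style headers carry a duration.
  for (const name of ["x-ratelimit-reset-requests", "x-ratelimit-reset-tokens"]) {
    const value = headers[name];
    const seconds = value === undefined ? undefined : parseDurationSeconds(value);
    if (seconds !== undefined) waits.push(seconds);
  }
  // Anthropic-style headers carry an absolute RFC 3339 timestamp.
  for (const name of [
    "anthropic-ratelimit-requests-reset",
    "anthropic-ratelimit-tokens-reset",
  ]) {
    const value = headers[name];
    const resetMs = value === undefined ? NaN : Date.parse(value);
    if (!Number.isNaN(resetMs)) {
      waits.push(Math.max(0, Math.ceil((resetMs - nowMs) / 1000)));
    }
  }
  // Assumption: be conservative and wait until every reported limit has reset.
  return waits.length > 0 ? Math.max(...waits) : undefined;
}
```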
The main integration is in `runWithModelFallback`, which now optionally sleeps and retries the same provider/model once on rate limit depending on the configured strategy, otherwise proceeding through the existing candidate fallback list.
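The control flow described here can be sketched as follows. This is an illustration only and not the code in `runWithModelFallback`: every name is hypothetical, the hooks are injected so the sketch stays self-contained, and two behaviors are assumptions (non-rate-limit errors move straight to the next candidate, and a wait longer than `maxWaitSeconds` falls back to switching).

```typescript
type FallbackCandidate = { provider: string; model: string };

interface FallbackSketchOptions {
  strategy: "switch" | "wait" | "ask";
  maxWaitSeconds: number;
  // Hypothetical hooks, injected to keep the sketch testable.
  isRateLimit: (err: unknown) => boolean;
  getRetryAfterSeconds: (err: unknown) => number | undefined;
  sleep: (ms: number) => Promise<void>;
  ask?: (c: FallbackCandidate, waitSeconds: number) => Promise<"wait" | "switch">;
}

async function runWithFallbackSketch<T>(
  candidates: FallbackCandidate[],
  run: (c: FallbackCandidate) => Promise<T>,
  opts: FallbackSketchOptions,
): Promise<T> {
  let lastError: unknown;
  for (const candidate of candidates) {
    try {
      return await run(candidate);
    } catch (err) {
      lastError = err;
      // Assumption: other errors follow the existing fallback path unchanged.
      if (!opts.isRateLimit(err)) continue;

      const waitSeconds = opts.getRetryAfterSeconds(err);
      // Assumption: unknown or too-long waits are treated like "switch".
      if (waitSeconds === undefined || waitSeconds > opts.maxWaitSeconds) continue;

      const action =
        opts.strategy === "ask" && opts.ask
          ? await opts.ask(candidate, waitSeconds)
          : opts.strategy;
      if (action !== "wait") continue;

      // Wait, then retry the same provider/model exactly once.
      await opts.sleep(waitSeconds * 1000);
      try {
        return await run(candidate);
      } catch (retryErr) {
        lastError = retryErr; // fall through to the next candidate
      }
    }
  }
  throw lastError ?? new Error("No fallback candidates were provided");
}
```

Retrying only once per candidate keeps the worst-case delay bounded by `maxWaitSeconds` times the number of candidates.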
<h3>Confidence Score: 3/5</h3>
- Mostly safe to merge, but one key config knob appears non-functional and a couple of edge cases can cause surprising behavior.
- Core logic is straightforward and covered by unit tests for parsing and decision-making, but the `backupModel` path isn’t wired into candidate selection (so the feature as documented won’t work), and rate-limit detection and header extraction may misbehave for some real error shapes.
- src/agents/model-fallback.ts, src/agents/rate-limit-handler.ts
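One way the `backupModel` gap noted above might be closed is to place the configured model directly after the primary candidate. The sketch below is only a suggestion under assumptions: the candidate shape, both function names, and the `provider/model` string format (inferred from the `"openai/gpt-4"` example in the PR description) are not confirmed by the PR's code.

```typescript
type ModelCandidate = { provider: string; model: string };

// Splits "provider/model" at the first slash; returns undefined if malformed.
function parseModelRef(ref: string): ModelCandidate | undefined {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) return undefined;
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// Hypothetical wiring: try the backup model right after the primary one.
function withBackupModel(
  candidates: ModelCandidate[],
  backupModel?: string,
): ModelCandidate[] {
  const backup = backupModel ? parseModelRef(backupModel) : undefined;
  if (!backup) return candidates;

  const same = (c: ModelCandidate) =>
    c.provider === backup.provider && c.model === backup.model;

  if (candidates.length === 0) return [backup];
  const primary = candidates[0];
  if (same(primary)) return candidates; // backup is already first

  // Remove any later duplicate so the backup is attempted only once.
  const rest = candidates.slice(1).filter((c) => !same(c));
  return [primary, backup, ...rest];
}
```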
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #13658: fix: silent model failover with fallback notification (by taw0002 · 2026-02-10 · 81.3%)
- #8258: feat: Add smart model tiering for cost optimization (by revenuestack · 2026-02-03 · 78.3%)
- #9025: Fix/automatic exponential backoff for LLM rate limits (by fotorpics · 2026-02-04 · 78.1%)
- #13686: Add opt-in rate limiting and token-based budgets for external API c... (by ShresthSamyak · 2026-02-10 · 78.1%)
- #14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on... (by JamesEBall · 2026-02-12 · 77.9%)
- #4462: fix: prevent gateway crash when all auth profiles are in cooldown (by garnetlyx · 2026-01-30 · 77.5%)
- #21049: fix(failover): treat HTTP 5xx as rate-limit for model fallback (by maximalmargin · 2026-02-19 · 77.3%)
- #9427: fix: trigger model fallback on all 4xx HTTP errors (by dbottme · 2026-02-05 · 76.9%)
- #11349: fix(agents): do not filter fallback models by models allowlist (by liuxiaopai-ai · 2026-02-07 · 76.9%)
- #8390: feat: notify user when fallback model is used (#8182) (by Glucksberg · 2026-02-04 · 76.7%)