#13686: Add opt-in rate limiting and token-based budgets for external API calls (#13615)
### Summary
Adds opt-in rate limiting, retry backoff, and token-based budget enforcement
for external API calls (LLM providers, web search, etc.).
Fixes #13615.
### What’s included
- Rate limiting scoped per provider and model
- Automatic retries with exponential backoff and jitter on 429 responses
- Token-based daily/monthly budgets with optional hard blocking
- Rate limit slot rollback on failed calls
- CLI support to inspect current limits and usage
- Structured logs for rate limit and budget events
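The retry behavior in the list above can be sketched as a small helper. This is an illustrative sketch only, not the PR's actual implementation; the names `callWithBackoff`, `RetryOptions`, and `RateLimitError` are assumptions, and "full jitter" (a uniformly random delay up to the exponential cap) is one common choice:

```typescript
// Hypothetical sketch of retry-with-backoff on 429 responses.
// Names here are illustrative, not taken from the PR's code.
interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

class RateLimitError extends Error {
  readonly status = 429;
}

async function callWithBackoff<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = { maxRetries: 5, baseDelayMs: 500, maxDelayMs: 30_000 },
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only retry rate-limit errors, and only up to maxRetries attempts.
      if (!(err instanceof RateLimitError) || attempt >= opts.maxRetries) {
        throw err;
      }
      // Exponential backoff with full jitter: delay in [0, base * 2^attempt],
      // capped at maxDelayMs so waits stay bounded.
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      const delay = Math.random() * cap;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Full jitter spreads retries out so many clients hitting the same 429 don't retry in lockstep.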
### Verification
- Provider `usage` data is preserved end-to-end
- Token budgets are enforced before execution on subsequent calls
- Token usage is accumulated across retry attempts
- Failed calls correctly release rate limit slots
- Different models do not share limiter buckets
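The budget-related checks above (enforcement before execution, accumulation across retries) can be illustrated with a minimal counter. This is a sketch under assumed names; the PR's real tracker (called `BudgetTracker` in the review comment below) persists state to disk and supports daily/monthly windows and threshold warnings:

```typescript
// Illustrative token-budget sketch; class and method names are assumptions.
class TokenBudget {
  private used = 0;

  constructor(
    private readonly dailyLimit: number,
    private readonly hardBlock: boolean,
  ) {}

  // Checked before execution: once the budget is exhausted, subsequent
  // calls are rejected (when hard blocking is enabled).
  canProceed(): boolean {
    return !this.hardBlock || this.used < this.dailyLimit;
  }

  // Usage accumulates across retry attempts: every attempt's tokens count,
  // not just the final successful one.
  record(tokens: number): void {
    this.used += tokens;
  }

  get remaining(): number {
    return Math.max(0, this.dailyLimit - this.used);
  }
}
```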
### Testing
- `npm test` / `npx vitest run` — no test failures
- `npm run lint`
- `npm run build`
### Scope notes
- Rate limiting is disabled by default and must be explicitly enabled
- Budgets are token-based to avoid pricing assumptions
- Per-endpoint limits, Redis-backed persistence, and OTel metrics were intentionally deferred to keep this PR focused and reviewable
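The opt-in contract in the scope notes can be sketched as a config resolver. The name `resolveRateLimitsConfig` appears in the review comment below, but this shape and its fields are illustrative assumptions; note the sketch encodes the *intended* default (`enabled: false`), which the review reports the actual code gets wrong:

```typescript
// Illustrative config shape; field names other than `enabled` are assumptions.
interface RateLimitsConfig {
  enabled: boolean;
  rpm?: number;              // requests per minute
  tpm?: number;              // tokens per minute
  dailyTokenBudget?: number; // hard cap on daily token usage
}

function resolveRateLimitsConfig(
  raw?: Partial<RateLimitsConfig>,
): RateLimitsConfig {
  return {
    // Opt-in: disabled unless the user explicitly enables it.
    enabled: raw?.enabled ?? false,
    rpm: raw?.rpm,
    tpm: raw?.tpm,
    dailyTokenBudget: raw?.dailyTokenBudget,
  };
}
```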
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
Comprehensive rate limiting implementation with sliding-window limiters, token budgets, and exponential backoff retries. The architecture cleanly separates concerns: `SlidingWindowLimiter` handles RPM/TPM/RPD windows, `BudgetTracker` manages daily/monthly token budgets with threshold warnings, `RateLimitQueue` provides FIFO queueing for rate-limited requests, and `RateLimitedRunner` orchestrates the complete flow. The implementation integrates into the model-fallback system via a singleton wrapper pattern.
**Key features:**
- Per-provider and per-model rate limiting scopes
- Automatic 429 retry with exponential backoff and jitter
- Token usage accumulation across retry attempts
- Slot rollback on failures to prevent double-counting
- File-based budget persistence for state across restarts
- CLI commands for status inspection and configuration
- Structured logging for rate limit events
**Issues found:**
- Critical logic error: `resolveRateLimitsConfig` defaults `enabled` to `true`, contradicting the PR description stating "Rate limiting is disabled by default and must be explicitly enabled"
- Minor style issue: redundant nested `if (usage)` check in `provider-wrapper.ts:185-189`
The test coverage is thorough with unit tests for all core components (limiter, budget, queue, config, and retry integration). The sliding window algorithm is well-implemented with proper handling of window rotation and sliding estimates.
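The sliding-window mechanics described above, including slot rollback, can be sketched minimally. This is not the PR's `SlidingWindowLimiter` API (which also handles TPM/RPD windows and sliding estimates); the class and method names here are illustrative:

```typescript
// Minimal sliding-window limiter sketch; names are illustrative.
class SimpleSlidingWindow {
  private timestamps: number[] = [];

  constructor(
    private readonly limit: number,    // max requests per window
    private readonly windowMs: number, // window length in ms
  ) {}

  // Returns true and records a slot if under the limit, false otherwise.
  tryAcquire(now: number = Date.now()): boolean {
    // Evict entries that have slid out of the window.
    const cutoff = now - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }

  // Slot rollback: release the most recent slot when the call fails,
  // so failed requests don't count against the limit.
  release(): void {
    this.timestamps.pop();
  }
}
```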
<h3>Confidence Score: 3/5</h3>
- This PR introduces significant new functionality but contains a critical logic error that contradicts stated requirements
- The implementation is well-architected with thorough test coverage and clean separation of concerns. However, the critical default value bug in `config.ts` means rate limiting would be enabled by default instead of opt-in as specified in the PR description and scope notes. This needs to be fixed before merging. The redundant if-check is minor. Once the default is corrected, this would be a solid 4.
- Pay close attention to `src/rate-limits/config.ts` (default enabled value)
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
### Most Similar PRs
- #8256: feat: Add rate limit strategy configuration (revenuestack, 2026-02-03, 78.1%)
- #14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on... (JamesEBall, 2026-02-12, 78.1%)
- #9025: Fix/automatic exponential backoff for LLM rate limits (fotorpics, 2026-02-04, 77.0%)
- #16963: fix: enable auth rate limiting by default (StressTestor, 2026-02-15, 74.7%)
- #16797: fix(auth-profiles): implement per-model rate limit cooldown tracking (mulhamna, 2026-02-15, 74.5%)
- #11874: fix: handle fetch rejections in provider usage withTimeout (Zjianru, 2026-02-08, 73.9%)
- #14824: fix: do not trigger provider cooldown on LLM request timeouts (CyberSinister, 2026-02-12, 73.7%)
- #9173: Fix: Improve error messaging for API rate limits and billing errors (vishaltandale00, 2026-02-04, 73.7%)
- #19515: security: add per-connection WebSocket rate limiting (Mozzzaic, 2026-02-17, 73.1%)
- #14054: #13923 ----[Feature] -- Provider-rate-limit-/-quota-query-tool (harshmohite04, 2026-02-11, 72.9%)