#13686: Add opt-in rate limiting and token-based budgets for external API calls (#13615)
### Summary
Adds opt-in rate limiting, retry backoff, and token-based budget enforcement
for external API calls (LLM providers, web search, etc.).
Fixes #13615.
### What’s included
- Rate limiting scoped per provider and model
- Automatic retries with exponential backoff and jitter on 429 responses
- Token-based daily/monthly budgets with optional hard blocking
- Rate limit slot rollback on failed calls
- CLI support to inspect current limits and usage
- Structured logs for rate limit and budget events
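The retry behavior in the list above can be sketched as a small helper. This is an illustrative sketch only, not the PR's actual implementation; the names `callWithBackoff`, `RetryOptions`, and `RateLimitError` are assumptions, and "full jitter" (a uniformly random delay up to the exponential cap) is one common choice:

```typescript
// Hypothetical sketch of retry-with-backoff on 429 responses.
// Names here are illustrative, not taken from the PR's code.
interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

class RateLimitError extends Error {
  readonly status = 429;
}

async function callWithBackoff<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = { maxRetries: 5, baseDelayMs: 500, maxDelayMs: 30_000 },
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only retry rate-limit errors, and only up to maxRetries attempts.
      if (!(err instanceof RateLimitError) || attempt >= opts.maxRetries) {
        throw err;
      }
      // Exponential backoff with full jitter: delay in [0, base * 2^attempt],
      // capped at maxDelayMs so waits stay bounded.
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      const delay = Math.random() * cap;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Full jitter spreads retries out so many clients hitting the same 429 don't retry in lockstep.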
### Verification
- Provider `usage` data is preserved end-to-end
- Token budgets are enforced before execution on subsequent calls
- Token usage is accumulated across retry attempts
- Failed calls correctly release rate limit slots
- Different models do not share limiter buckets
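The budget-related checks above (enforcement before execution, accumulation across retries) can be illustrated with a minimal counter. This is a sketch under assumed names; the PR's real tracker (called `BudgetTracker` in the review comment below) persists state to disk and supports daily/monthly windows and threshold warnings:

```typescript
// Illustrative token-budget sketch; class and method names are assumptions.
class TokenBudget {
  private used = 0;

  constructor(
    private readonly dailyLimit: number,
    private readonly hardBlock: boolean,
  ) {}

  // Checked before execution: once the budget is exhausted, subsequent
  // calls are rejected (when hard blocking is enabled).
  canProceed(): boolean {
    return !this.hardBlock || this.used < this.dailyLimit;
  }

  // Usage accumulates across retry attempts: every attempt's tokens count,
  // not just the final successful one.
  record(tokens: number): void {
    this.used += tokens;
  }

  get remaining(): number {
    return Math.max(0, this.dailyLimit - this.used);
  }
}
```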
### Testing
- `npm test` / `npx vitest run` — no test failures
- `npm run lint`
- `npm run build`
### Scope notes
- Rate limiting is disabled by default and must be explicitly enabled
- Budgets are token-based to avoid pricing assumptions
- Per-endpoint limits, Redis-backed persistence, and OTel metrics were intentionally deferred to keep this PR focused and reviewable
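The opt-in contract in the scope notes can be sketched as a config resolver. The name `resolveRateLimitsConfig` appears in the review comment below, but this shape and its fields are illustrative assumptions; note the sketch encodes the *intended* default (`enabled: false`), which the review reports the actual code gets wrong:

```typescript
// Illustrative config shape; field names other than `enabled` are assumptions.
interface RateLimitsConfig {
  enabled: boolean;
  rpm?: number;              // requests per minute
  tpm?: number;              // tokens per minute
  dailyTokenBudget?: number; // hard cap on daily token usage
}

function resolveRateLimitsConfig(
  raw?: Partial<RateLimitsConfig>,
): RateLimitsConfig {
  return {
    // Opt-in: disabled unless the user explicitly enables it.
    enabled: raw?.enabled ?? false,
    rpm: raw?.rpm,
    tpm: raw?.tpm,
    dailyTokenBudget: raw?.dailyTokenBudget,
  };
}
```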
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
Comprehensive rate limiting implementation with sliding-window limiters, token budgets, and exponential backoff retries. The architecture cleanly separates concerns: `SlidingWindowLimiter` handles RPM/TPM/RPD windows, `BudgetTracker` manages daily/monthly token budgets with threshold warnings, `RateLimitQueue` provides FIFO queueing for rate-limited requests, and `RateLimitedRunner` orchestrates the complete flow. The implementation integrates into the model-fallback system via a singleton wrapper pattern.
**Key features:**
- Per-provider and per-model rate limiting scopes
- Automatic 429 retry with exponential backoff and jitter
- Token usage accumulation across retry attempts
- Slot rollback on failures to prevent double-counting
- File-based budget persistence for state across restarts
- CLI commands for status inspection and configuration
- Structured logging for rate limit events
**Issues found:**
- Critical logic error: `resolveRateLimitsConfig` defaults `enabled` to `true`, contradicting the PR description stating "Rate limiting is disabled by default and must be explicitly enabled"
- Minor style issue: redundant nested `if (usage)` check in `provider-wrapper.ts:185-189`
The test coverage is thorough with unit tests for all core components (limiter, budget, queue, config, and retry integration). The sliding window algorithm is well-implemented with proper handling of window rotation and sliding estimates.
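The sliding-window mechanics described above, including slot rollback, can be sketched minimally. This is not the PR's `SlidingWindowLimiter` API (which also handles TPM/RPD windows and sliding estimates); the class and method names here are illustrative:

```typescript
// Minimal sliding-window limiter sketch; names are illustrative.
class SimpleSlidingWindow {
  private timestamps: number[] = [];

  constructor(
    private readonly limit: number,    // max requests per window
    private readonly windowMs: number, // window length in ms
  ) {}

  // Returns true and records a slot if under the limit, false otherwise.
  tryAcquire(now: number = Date.now()): boolean {
    // Evict entries that have slid out of the window.
    const cutoff = now - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }

  // Slot rollback: release the most recent slot when the call fails,
  // so failed requests don't count against the limit.
  release(): void {
    this.timestamps.pop();
  }
}
```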
<h3>Confidence Score: 3/5</h3>
- This PR introduces significant new functionality but contains a critical logic error that contradicts stated requirements
- The implementation is well-architected with thorough test coverage and clean separation of concerns. However, the critical default value bug in `config.ts` means rate limiting would be enabled by default instead of opt-in as specified in the PR description and scope notes. This needs to be fixed before merging. The redundant if-check is minor. Once the default is corrected, this would be a solid 4.
- Pay close attention to `src/rate-limits/config.ts` (default enabled value)
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
### Most Similar PRs
- #8256: feat: Add rate limit strategy configuration (revenuestack, 2026-02-03, 78.1%)
- #14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on... (JamesEBall, 2026-02-12, 78.1%)
- #9025: Fix/automatic exponential backoff for LLM rate limits (fotorpics, 2026-02-04, 77.0%)
- #16963: fix: enable auth rate limiting by default (StressTestor, 2026-02-15, 74.7%)
- #16797: fix(auth-profiles): implement per-model rate limit cooldown tracking (mulhamna, 2026-02-15, 74.5%)
- #11874: fix: handle fetch rejections in provider usage withTimeout (Zjianru, 2026-02-08, 73.9%)
- #14824: fix: do not trigger provider cooldown on LLM request timeouts (CyberSinister, 2026-02-12, 73.7%)
- #9173: Fix: Improve error messaging for API rate limits and billing errors (vishaltandale00, 2026-02-04, 73.7%)
- #19515: security: add per-connection WebSocket rate limiting (Mozzzaic, 2026-02-17, 73.1%)
- #14054: #13923 ----[Feature] -- Provider-rate-limit-/-quota-query-tool (harshmohite04, 2026-02-11, 72.9%)