#6781: feat(antigravity): proactive quota exhaustion detection
agents
Cluster: Rate Limit Management Enhancements
## Summary
Adds proactive quota checking for Antigravity models before requests are made. This is critical because **Antigravity does not return HTTP 429 when rate limited**. Instead, the connection hangs until the client times out, which makes a rate limit indistinguishable from a slow model.
## Problem
1. A request is made to a rate-limited Antigravity model
2. The connection hangs (no response, no error)
3. After the timeout, the request fails
4. The system cannot distinguish a rate limit from a slow model
5. After multiple timeouts, all profiles enter cooldown → total failure
## Solution
Proactively check the Antigravity quota API (`/v1internal:fetchAvailableModels`) before making requests:
1. Before attempting an Antigravity request, check the model's quota via the internal API
2. If the model's quota is exhausted (>= 99%) across all profiles, skip the model immediately
3. Record the skip as a `rate_limit` error so that fallback handling treats it as a rate limit
4. Cache quota results for 30 seconds to minimize API calls
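The 30-second caching step can be illustrated with a minimal sketch. This is not the code in `antigravity-quota-cache.ts`. The `QuotaSnapshot` shape, the `QuotaCache` class, and the injected `now` clock are all hypothetical names chosen for illustration, and the loader is synchronous for brevity (a real fetch of the quota API would be asynchronous).

```typescript
// Hypothetical shape: the actual response of the quota API is not shown in this PR.
// Maps a model id to the percentage of its quota already used (0-100).
type QuotaSnapshot = Record<string, number>;

const QUOTA_CACHE_TTL_MS = 30_000; // 30s TTL, as stated in the PR

class QuotaCache {
  private readonly entries = new Map<string, { value: QuotaSnapshot; fetchedAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so the expiry logic can be tested without waiting.
  constructor(ttlMs: number = QUOTA_CACHE_TTL_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached snapshot for a profile, calling `load` only when
  // there is no entry or the entry is at least `ttlMs` old.
  get(profileId: string, load: () => QuotaSnapshot): QuotaSnapshot {
    const current = this.now();
    const hit = this.entries.get(profileId);
    if (hit !== undefined && current - hit.fetchedAt < this.ttlMs) {
      return hit.value;
    }
    const value = load();
    this.entries.set(profileId, { value, fetchedAt: current });
    return value;
  }
}
```

Keying the cache by profile is an assumption here. With this layout, repeated fallback attempts within the TTL reuse one quota lookup per profile instead of calling the API before every request.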
## Changes
- Add `antigravity-quota-cache.ts` with cached quota fetching (30s TTL)
- Update `model-fallback.ts` to check quota before Antigravity requests
- Skip models immediately when quota >= 99% instead of waiting for timeout
- Add `quotaCheckAttempted` flag to prevent false "quota exhausted" errors when no profiles are available
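The role of the `quotaCheckAttempted` flag can be sketched as follows. This is an illustration under stated assumptions, not the logic in `model-fallback.ts`: the function name `evaluateQuota`, the input shape (one used-percentage per profile, `null` when the lookup failed), and the choice to treat a failed lookup as "not exhausted" are all hypothetical.

```typescript
const QUOTA_EXHAUSTED_THRESHOLD = 99; // percent used, per the ">= 99%" rule in this PR

interface QuotaDecision {
  quotaCheckAttempted: boolean; // true once at least one profile's quota was actually read
  exhausted: boolean;           // true only if every profile is at or above the threshold
}

// `usedPercentByProfile` holds one entry per available profile for a single model.
// A `null` entry means the quota lookup for that profile failed (assumption).
function evaluateQuota(usedPercentByProfile: Array<number | null>): QuotaDecision {
  let quotaCheckAttempted = false;
  let allExhausted = true;
  for (const used of usedPercentByProfile) {
    if (used === null) {
      // Unknown quota: fail open and let the request proceed (assumption).
      allExhausted = false;
      continue;
    }
    quotaCheckAttempted = true;
    if (used < QUOTA_EXHAUSTED_THRESHOLD) {
      allExhausted = false;
    }
  }
  // Without the flag, an empty profile list would leave `allExhausted` true
  // and be misreported as "quota exhausted".
  return { quotaCheckAttempted, exhausted: quotaCheckAttempted && allExhausted };
}
```

In this sketch an empty profile list yields `exhausted: false`, so the caller can report that no profiles were available rather than recording a `rate_limit` error.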
## Dependencies
- Requires #6780 (model-level cooldown tracking): this PR uses the `modelId` parameter of `isProfileInCooldown()`
## Test plan
- [x] Verify quota is checked before Antigravity requests (code review)
- [ ] Verify models are skipped when quota >= 99% (needs 99%+ quota to test)
- [x] Verify quota cache prevents excessive API calls (30s TTL in code)
- [x] Verify fallback to other models works when quota exhausted
- [x] Verify no false "quota exhausted" errors when no profiles checked
🤖 Generated with [Claude Code](https://claude.ai/code)
Most Similar PRs
- #16797: fix(auth-profiles): implement per-model rate limit cooldown tracking (by mulhamna · 2026-02-15 · 68.0%)
- #7941: fix: scope rate-limit cooldowns per-model instead of per-provider (by adrrr · 2026-02-03 · 66.8%)
- #13077: fix: prevent cooldown pollution across different models on the same... (by magendary · 2026-02-10 · 65.7%)
- #20388: fix(failover): don't skip same-provider fallback models when cooldo... (by Limitless2023 · 2026-02-18 · 64.1%)
- #14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on... (by JamesEBall · 2026-02-12 · 63.8%)
- #14824: fix: do not trigger provider cooldown on LLM request timeouts (by CyberSinister · 2026-02-12 · 63.4%)
- #18902: fix: exempt format errors from auth profile cooldown (by tag-assistant · 2026-02-17 · 63.2%)
- #23816: fix(agents): model fallback skipped during session overrides and pr... (by ramezgaberiel · 2026-02-22 · 62.6%)
- #20428: feat: capture Anthropic rate-limit response headers to disk (by AndrewArto · 2026-02-18 · 62.1%)
- #20911: fix: auto-reset session when all models time out (by tag-assistant · 2026-02-19 · 61.6%)