#6781: feat(antigravity): proactive quota exhaustion detection

by mealai open 2026-02-02 01:59

agents

Cluster: Rate Limit Management Enhancements

## Summary

Adds proactive quota checking for Antigravity models before making requests. This is critical because **Antigravity does not return HTTP 429 when rate limited**. Instead, the connection hangs indefinitely until timeout, making rate limits indistinguishable from slow models.

## Problem

1. Request made to rate-limited Antigravity model
2. Connection hangs (no response, no error)
3. After timeout, request fails
4. System cannot distinguish rate limit from slow model
5. After multiple timeouts, all profiles enter cooldown → total failure

## Solution

Proactively check Antigravity quota API (`/v1internal:fetchAvailableModels`) before making requests:

1. Before attempting Antigravity request, check quota via internal API
2. If model quota exhausted (>=99%) across all profiles, skip immediately
3. Record as `rate_limit` error for proper fallback handling
4. Cache quota results for 30 seconds to minimize API calls

## Changes

- Add `antigravity-quota-cache.ts` with cached quota fetching (30s TTL)
- Update `model-fallback.ts` to check quota before Antigravity requests
- Skip models immediately when quota >= 99% instead of waiting for timeout
- Add `quotaCheckAttempted` flag to prevent false "quota exhausted" errors when no profiles are available

## Dependencies

- Requires #6780 (model-level cooldown tracking): uses `modelId` parameter in `isProfileInCooldown()`

## Test plan

- [x] Verify quota is checked before Antigravity requests (code review)
- [ ] Verify models are skipped when quota >= 99% (needs 99%+ quota to test)
- [x] Verify quota cache prevents excessive API calls (30s TTL in code)
- [x] Verify fallback to other models works when quota exhausted
- [x] Verify no false "quota exhausted" errors when no profiles checked

🤖 Generated with [Claude Code](https://claude.ai/code)
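
For reviewers, the cache-then-skip flow described under Solution can be sketched roughly as follows. This is a minimal illustration, not the code in `antigravity-quota-cache.ts` or `model-fallback.ts`. The names `QuotaCache`, `QuotaSnapshot`, `QuotaFetcher`, and `shouldSkipModel` are hypothetical, and the snapshot shape (model ID mapped to percent of quota used) is an assumption, since the PR does not show the response format of `/v1internal:fetchAvailableModels`. The fetcher is injected so the sketch makes no claim about how the real request is issued.

```typescript
// Assumed shape: modelId -> percent of quota used (0-100). Hypothetical.
type QuotaSnapshot = Record<string, number>;
// Hypothetical stand-in for the call to the internal quota API.
type QuotaFetcher = (profileId: string) => Promise<QuotaSnapshot>;

const QUOTA_TTL_MS = 30_000;    // 30s TTL, as stated in the PR
const EXHAUSTED_THRESHOLD = 99; // skip when usage is >= 99%

class QuotaCache {
  private entries = new Map<string, { fetchedAt: number; snapshot: QuotaSnapshot }>();
  private fetcher: QuotaFetcher;
  private now: () => number;

  constructor(fetcher: QuotaFetcher, now: () => number = Date.now) {
    this.fetcher = fetcher;
    this.now = now;
  }

  // Returns the cached snapshot while it is fresh, otherwise refetches.
  async get(profileId: string): Promise<QuotaSnapshot> {
    const hit = this.entries.get(profileId);
    if (hit && this.now() - hit.fetchedAt < QUOTA_TTL_MS) {
      return hit.snapshot;
    }
    const snapshot = await this.fetcher(profileId);
    this.entries.set(profileId, { fetchedAt: this.now(), snapshot });
    return snapshot;
  }
}

type QuotaDecision = { skip: boolean; quotaCheckAttempted: boolean };

// Skips only when every profile was checked and every one is exhausted.
// A failed or missing lookup counts as "unknown" and never causes a skip.
async function shouldSkipModel(
  cache: QuotaCache,
  profileIds: string[],
  modelId: string,
): Promise<QuotaDecision> {
  let checked = 0;
  let exhausted = 0;
  for (const profileId of profileIds) {
    let used: number | undefined;
    try {
      used = (await cache.get(profileId))[modelId];
    } catch {
      continue; // quota lookup failed: leave this profile as unknown
    }
    if (used === undefined) continue; // model absent from the snapshot
    checked++;
    if (used >= EXHAUSTED_THRESHOLD) exhausted++;
  }
  const quotaCheckAttempted = checked > 0;
  const skip = quotaCheckAttempted && exhausted === profileIds.length;
  return { skip, quotaCheckAttempted };
}
```

In this sketch a caller would record a `rate_limit` error and move to the next fallback model when `skip` is true. With an empty profile list `quotaCheckAttempted` stays false, so the model is not reported as quota exhausted, which mirrors the purpose of the flag described under Changes.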