#19926: fix: Extract status code from error messages for empty 429 responses
agents
size: XS
Cluster:
Rate Limit Management Enhancements
### Summary
- **Problem**: Some Google Antigravity 429 responses surface as errors with messages like `429 status code (no body)`, without a proper `status/statusCode` field or reset-hint body, which can cause inconsistent rate-limit classification and cooldown tracking.
- **Why it matters**: Without reliably recognizing these as 429 rate-limit errors, profiles may not enter the standard exponential cooldown window, leading to noisy retries and a worse experience when quota is temporarily exhausted.
- **What changed**: `getStatusCode()` in `failover-error.ts` now has a regex fallback that extracts HTTP status codes from error messages (e.g., `"429 status code (no body)"`), and `calculateAuthProfileCooldownMs()` is explicitly documented as the exponential backoff path used when no reset time can be parsed.
- **What did NOT change (scope boundary)**: No changes to the actual cooldown math, backoff schedule, or auth profile store schema; no provider-specific logic or gateway APIs were touched—this is strictly better parsing + documentation.
---
## Change Type (select all)
- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [x] Auth / tokens (auth profiles + cooldowns)
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- **Closes** #19822
- **Related** None
## User-visible / Behavior Changes
- Antigravity 429 responses that are surfaced as `"429 status code (no body)"` are now consistently classified as `rate_limit` and routed into the existing exponential cooldown path.
- Cooldown behavior for rate limits remains: ~1 min → 5 min → 25 min → max 1 hour, but this is now reliably applied even when there is no reset hint in the response body.
- No config, CLI, or API surface changes.
## Security Impact (required)
- New permissions/capabilities? **No**
- Secrets/tokens handling changed? **No**
- New/changed network calls? **No**
- Command/tool execution surface changed? **No**
- Data access scope changed? **No**
If any `Yes`, explain risk + mitigation: _N/A_
## Repro + Verification
### Environment
- OS: Any (repro is provider/API-behavior driven)
- Runtime/container: Node 22+, standard OpenClaw dev environment
- Model/provider: `google-antigravity/*` models
- Integration/channel (if any): Any channel that can trigger agent runs via Antigravity
- Relevant config (redacted): Antigravity provider configured with a quota that can hit 429
### Steps
1. Configure an Antigravity profile and drive enough traffic to cause a 429 with no response body (observed as `429 status code (no body)` in logs).
2. Capture the error surfaced into the agent pipeline (e.g., via debug logs or temporary instrumentation).
3. Observe how the auth profile is marked in `auth-profiles.json` and whether the profile is treated as in cooldown.
### Expected
- The error is classified with `reason=rate_limit`, status `429`.
- `markAuthProfileFailure()` records a new failure for that profile and updates `cooldownUntil` using `calculateAuthProfileCooldownMs()` (1m/5m/25m/1h).
- Subsequent runs rotate away from this profile until the cooldown expires.
### Actual
- **Before**: When the 429 only appeared inside the message text, status extraction could miss it, and behavior depended on message parsing alone; some shapes risked being treated as generic failures instead of explicit `rate_limit`.
- **After**: The regex fallback in `getStatusCode()` reliably extracts `429` from messages like `"429 status code (no body)"`, so the profile always goes through the rate-limit cooldown path.
## Evidence
- [x] Trace/log snippets
- Example message shape targeted: `FailoverError: 429 status code (no body)`; after the change, `describeFailoverError()` reports `status: 429, reason: "rate_limit"`, and `usageStats[profileId].cooldownUntil` advances according to `calculateAuthProfileCooldownMs`.
- [ ] Failing test/log before + passing after
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)
## Human Verification (required)
- **Verified scenarios**:
- Reasoned through `FailoverError` creation + `describeFailoverError()` + `markAuthProfileFailure()` flow with a message-only 429.
- Verified that the new regex in `getStatusCode()` correctly picks up codes from strings like `"429 status code (no body)"` and ignores non-status text.
- Confirmed that the cooldown math remains unchanged and is still driven solely by `calculateAuthProfileCooldownMs()`.
- **Edge cases checked**:
- Error objects that already have numeric/string `status/statusCode` (no behavior change).
- Non-HTTP error messages starting with non-status numbers (guarded by the `100–599` range check).
- **What you did NOT verify**:
- Live calls against production Antigravity endpoints.
- Long-running multi-profile rotation under sustained rate limits (covered by existing cooldown tests).
## Compatibility / Migration
- Backward compatible? **Yes**
- Config/env changes? **No**
- Migration needed? **No**
If yes, exact upgrade steps: _N/A_
## Failure Recovery (if this breaks)
- **How to disable/revert this change quickly**:
- Revert `getStatusCode()` and the comment-only change in `calculateAuthProfileCooldownMs()` (`src/agents/failover-error.ts`, `src/agents/auth-profiles/usage.ts`).
- **Files/config to restore**:
- `src/agents/failover-error.ts`
- `src/agents/auth-profiles/usage.ts`
- **Known bad symptoms reviewers should watch for**:
- Non-HTTP errors being misclassified as HTTP status failures due to an overly-greedy regex (e.g., a message that happens to start with a 3-digit number and the word `error`).
## Risks and Mitigations
- **Risk**: The fallback regex in `getStatusCode()` could misinterpret some non-HTTP error messages that happen to start with a 3-digit number and a generic word like `error`.
- **Mitigation**: Regex is constrained to `100–599` range and only kicks in when neither `status` nor `statusCode` are present; if this ever misclassifies a non-HTTP error, reverting is localized and low-risk.
- **Risk**: Over-reliance on exponential backoff for all body-less 429s could temporarily suppress a profile even if a provider’s quota resets sooner.
- **Mitigation**: Backoff caps at 1 hour and only applies when there is no reset hint to parse; this is strictly better than hammering the provider with immediate retries.
- **Additional risks**: None beyond the above.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR adds a regex fallback to `getStatusCode()` in `failover-error.ts` so that HTTP status codes embedded in error messages (like `"429 status code (no body)"` from Google Antigravity) are properly extracted when `status`/`statusCode` properties are absent on the error object. It also adds JSDoc documentation to `calculateAuthProfileCooldownMs()` in `usage.ts`.
- The core logic change is small and well-scoped: a message-based regex fallback in `getStatusCode()` with a 100–599 range guard
- The `usage.ts` change is documentation-only (JSDoc comment on `calculateAuthProfileCooldownMs`)
- The `error` alternation in the regex may be overly broad — it could match non-HTTP patterns like `"300 error retries"` and should be tightened to `error code` or `http error`
- Minor indentation inconsistency on the new comment (4 spaces instead of 2)
- No tests were added for the new regex fallback behavior
<h3>Confidence Score: 3/5</h3>
- Low-risk change overall, but the regex `error` alternation could cause false positive status code extraction in edge cases
- The change is small and well-intentioned, but the `error` alternation in the regex is broad enough to match non-HTTP error messages where a 3-digit number precedes the word "error". Additionally, no tests were added for the new fallback behavior, making it harder to validate edge cases. The `usage.ts` change is purely documentation and safe.
- `src/agents/failover-error.ts` — the regex fallback in `getStatusCode()` needs the `error` alternation tightened to avoid false positives
<sub>Last reviewed commit: 167f47c</sub>
<!-- greptile_other_comments_section -->
<sub>(3/5) Reply to the agent's comments like "Can you suggest a fix for this @greptileai?" or ask follow-up questions!</sub>
**Context used:**
- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
<!-- /greptile_comment -->
Most Similar PRs
#14574: fix: gentler rate-limit cooldown backoff + clear stale cooldowns on...
by JamesEBall · 2026-02-12
76.1%
#19267: fix: derive failover reason from timedOut flag to prevent unknown c...
by austenstone · 2026-02-17
75.8%
#14824: fix: do not trigger provider cooldown on LLM request timeouts
by CyberSinister · 2026-02-12
75.8%
#22792: fix(failover): add word boundary to 429 pattern in ERROR_PATTERNS
by miloudbelarebia · 2026-02-21
75.5%
#11821: fix(auth): trigger failover on 401 status code from expired OAuth t...
by AnonO6 · 2026-02-08
75.2%
#14368: fix: skip auth profile cooldown on format errors to prevent provide...
by koatora20 · 2026-02-12
74.0%
#16684: fix:(antigravity): align Antigravity OAuth project discovery header...
by vincentkoc · 2026-02-15
73.9%
#21017: fix: treat HTTP 502/503/504 as failover-eligible (timeout reason)
by taw0002 · 2026-02-19
73.8%
#21491: fix: classify Google 503 UNAVAILABLE as transient failover [AI-assi...
by ZPTDclaw · 2026-02-20
73.3%
#16797: fix(auth-profiles): implement per-model rate limit cooldown tracking
by mulhamna · 2026-02-15
73.1%