#19926: fix: Extract status code from error messages for empty 429 responses

by gaurav10gg open 2026-02-18 09:30

agents size: XS

Cluster: Rate Limit Management Enhancements

### Summary

- **Problem**: Some Google Antigravity 429 responses surface as errors with messages like `429 status code (no body)`, without a proper `status/statusCode` field or reset-hint body, which can cause inconsistent rate-limit classification and cooldown tracking.
- **Why it matters**: Without reliably recognizing these as 429 rate-limit errors, profiles may not enter the standard exponential cooldown window, leading to noisy retries and a worse experience when quota is temporarily exhausted.
- **What changed**: `getStatusCode()` in `failover-error.ts` now has a regex fallback that extracts HTTP status codes from error messages (e.g., `"429 status code (no body)"`), and `calculateAuthProfileCooldownMs()` is explicitly documented as the exponential backoff path used when no reset time can be parsed.
- **What did NOT change (scope boundary)**: No changes to the actual cooldown math, backoff schedule, or auth profile store schema; no provider-specific logic or gateway APIs were touched. This is strictly better parsing + documentation.

---

## Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [x] Auth / tokens (auth profiles + cooldowns)
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- **Closes** #19822
- **Related** None

## User-visible / Behavior Changes

- Antigravity 429 responses that are surfaced as `"429 status code (no body)"` are now consistently classified as `rate_limit` and routed into the existing exponential cooldown path.
- Cooldown behavior for rate limits remains: ~1 min → 5 min → 25 min → max 1 hour, but this is now reliably applied even when there is no reset hint in the response body.
- No config, CLI, or API surface changes.

## Security Impact (required)

- New permissions/capabilities? **No**
- Secrets/tokens handling changed? **No**
- New/changed network calls? **No**
- Command/tool execution surface changed? **No**
- Data access scope changed? **No**

If any `Yes`, explain risk + mitigation: _N/A_

## Repro + Verification

### Environment

- OS: Any (repro is provider/API-behavior driven)
- Runtime/container: Node 22+, standard OpenClaw dev environment
- Model/provider: `google-antigravity/*` models
- Integration/channel (if any): Any channel that can trigger agent runs via Antigravity
- Relevant config (redacted): Antigravity provider configured with a quota that can hit 429

### Steps

1. Configure an Antigravity profile and drive enough traffic to cause a 429 with no response body (observed as `429 status code (no body)` in logs).
2. Capture the error surfaced into the agent pipeline (e.g., via debug logs or temporary instrumentation).
3. Observe how the auth profile is marked in `auth-profiles.json` and whether the profile is treated as in cooldown.

### Expected

- The error is classified with `reason=rate_limit`, status `429`.
- `markAuthProfileFailure()` records a new failure for that profile and updates `cooldownUntil` using `calculateAuthProfileCooldownMs()` (1m/5m/25m/1h).
- Subsequent runs rotate away from this profile until the cooldown expires.

### Actual

- **Before**: When the 429 only appeared inside the message text, status extraction could miss it, and behavior depended on message parsing alone; some shapes risked being treated as generic failures instead of explicit `rate_limit`.
- **After**: The regex fallback in `getStatusCode()` reliably extracts `429` from messages like `"429 status code (no body)"`, so the profile always goes through the rate-limit cooldown path.

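For reviewers who want a concrete picture of the expected flow, here is a minimal sketch of a message-based status fallback and a backoff that reproduces the stated 1m/5m/25m/1h schedule. It is not the code from this PR: the function names `sketchGetStatusCode` and `sketchCooldownMs`, the exact regex, and the `5^n` formula are all assumptions chosen only to match the behavior described above.

```typescript
// Hypothetical sketch, not the implementation in failover-error.ts or usage.ts.
type ErrorLike = { status?: unknown; statusCode?: unknown; message?: unknown };

// Assumed pattern: message starts with a 3-digit number followed by "status code".
const STATUS_IN_MESSAGE = /^\s*(\d{3})\s+status code\b/i;

function sketchGetStatusCode(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const e = err as ErrorLike;

  // Explicit fields win; the message fallback runs only when both are absent.
  const explicit = e.status ?? e.statusCode;
  if (explicit !== undefined && explicit !== null) {
    if (typeof explicit === "number") return explicit;
    if (typeof explicit === "string" && /^\d+$/.test(explicit)) return Number(explicit);
    return undefined;
  }

  if (typeof e.message === "string") {
    const match = STATUS_IN_MESSAGE.exec(e.message);
    if (match) {
      const code = Number(match[1]);
      // Range guard: only accept values that can be HTTP status codes.
      if (code >= 100 && code <= 599) return code;
    }
  }
  return undefined;
}

// Assumed formula: 1 minute * 5^(failures - 1), capped at 1 hour.
function sketchCooldownMs(failureCount: number): number {
  const minuteMs = 60 * 1000;
  const step = Math.max(1, Math.floor(failureCount)) - 1;
  return Math.min(60 * minuteMs, minuteMs * Math.pow(5, Math.min(step, 3)));
}

// Usage: a body-less 429 surfaced only through the message text.
const status = sketchGetStatusCode(new Error("429 status code (no body)")); // 429
const reason = status === 429 ? "rate_limit" : "unknown";
console.log(status, reason, [1, 2, 3, 4].map(sketchCooldownMs));
```

With these assumptions, the first four failures yield cooldowns of 60000, 300000, 1500000, and 3600000 ms, which lines up with the 1m/5m/25m/1h schedule listed under Expected.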
## Evidence

- [x] Trace/log snippets
  - Example message shape targeted: `FailoverError: 429 status code (no body)`; after the change, `describeFailoverError()` reports `status: 429, reason: "rate_limit"`, and `usageStats[profileId].cooldownUntil` advances according to `calculateAuthProfileCooldownMs`.
- [ ] Failing test/log before + passing after
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

## Human Verification (required)

- **Verified scenarios**:
  - Reasoned through `FailoverError` creation + `describeFailoverError()` + `markAuthProfileFailure()` flow with a message-only 429.
  - Verified that the new regex in `getStatusCode()` correctly picks up codes from strings like `"429 status code (no body)"` and ignores non-status text.
  - Confirmed that the cooldown math remains unchanged and is still driven solely by `calculateAuthProfileCooldownMs()`.
- **Edge cases checked**:
  - Error objects that already have numeric/string `status/statusCode` (no behavior change).
  - Non-HTTP error messages starting with non-status numbers (guarded by the `100–599` range check).
- **What you did NOT verify**:
  - Live calls against production Antigravity endpoints.
  - Long-running multi-profile rotation under sustained rate limits (covered by existing cooldown tests).

## Compatibility / Migration

- Backward compatible? **Yes**
- Config/env changes? **No**
- Migration needed? **No**

If yes, exact upgrade steps: _N/A_

## Failure Recovery (if this breaks)

- **How to disable/revert this change quickly**:
  - Revert `getStatusCode()` and the comment-only change in `calculateAuthProfileCooldownMs()` (`src/agents/failover-error.ts`, `src/agents/auth-profiles/usage.ts`).
- **Files/config to restore**:
  - `src/agents/failover-error.ts`
  - `src/agents/auth-profiles/usage.ts`
- **Known bad symptoms reviewers should watch for**:
  - Non-HTTP errors being misclassified as HTTP status failures due to an overly-greedy regex (e.g., a message that happens to start with a 3-digit number and the word `error`).

## Risks and Mitigations

- **Risk**: The fallback regex in `getStatusCode()` could misinterpret some non-HTTP error messages that happen to start with a 3-digit number and a generic word like `error`.
  - **Mitigation**: Regex is constrained to `100–599` range and only kicks in when neither `status` nor `statusCode` are present; if this ever misclassifies a non-HTTP error, reverting is localized and low-risk.
- **Risk**: Over-reliance on exponential backoff for all body-less 429s could temporarily suppress a profile even if a provider's quota resets sooner.
  - **Mitigation**: Backoff caps at 1 hour and only applies when there is no reset hint to parse; this is strictly better than hammering the provider with immediate retries.
- **Additional risks**: None beyond the above.

<h3>Greptile Summary</h3>

This PR adds a regex fallback to `getStatusCode()` in `failover-error.ts` so that HTTP status codes embedded in error messages (like `"429 status code (no body)"` from Google Antigravity) are properly extracted when `status`/`statusCode` properties are absent on the error object. It also adds JSDoc documentation to `calculateAuthProfileCooldownMs()` in `usage.ts`.

- The core logic change is small and well-scoped: a message-based regex fallback in `getStatusCode()` with a 100–599 range guard
- The `usage.ts` change is documentation-only (JSDoc comment on `calculateAuthProfileCooldownMs`)
- The `error` alternation in the regex may be overly broad; it could match non-HTTP patterns like `"300 error retries"` and should be tightened to `error code` or `http error`
- Minor indentation inconsistency on the new comment (4 spaces instead of 2)
- No tests were added for the new regex fallback behavior

<h3>Confidence Score: 3/5</h3>

- Low-risk change overall, but the regex `error` alternation could cause false positive status code extraction in edge cases
- The change is small and well-intentioned, but the `error` alternation in the regex is broad enough to match non-HTTP error messages where a 3-digit number precedes the word "error". Additionally, no tests were added for the new fallback behavior, making it harder to validate edge cases. The `usage.ts` change is purely documentation and safe.
- `src/agents/failover-error.ts`: the regex fallback in `getStatusCode()` needs the `error` alternation tightened to avoid false positives

<sub>Last reviewed commit: 167f47c</sub>

**Context used:**

- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
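
The concern about the `error` alternation can be illustrated with a small comparison. The two patterns below are hypothetical: the PR's actual regex is not quoted in this description, so `LOOSE` only approximates "a 3-digit number followed by `status code` or `error`", and `TIGHT` is one possible reading of the suggested tightening to `error code`. The helper `extractStatus` is likewise an illustrative name.

```typescript
// Hypothetical patterns; neither is copied from failover-error.ts.
const LOOSE = /^\s*(\d{3})\s+(?:status code|error)\b/i;
const TIGHT = /^\s*(\d{3})\s+(?:status code|error code)\b/i;

// Applies a pattern plus the 100-599 range guard described in the PR.
function extractStatus(pattern: RegExp, message: string): number | undefined {
  const match = pattern.exec(message);
  if (!match) return undefined;
  const code = Number(match[1]);
  return code >= 100 && code <= 599 ? code : undefined;
}

const samples = [
  "429 status code (no body)", // the targeted Antigravity shape
  "300 error retries",         // non-HTTP text flagged in the review
  "503 error code returned",   // still accepted by the tighter pattern
];

for (const message of samples) {
  console.log(message, extractStatus(LOOSE, message), extractStatus(TIGHT, message));
}
```

Under these assumed patterns, `"300 error retries"` yields `300` with `LOOSE` and `undefined` with `TIGHT`, while the targeted `"429 status code (no body)"` message yields `429` with both. A small table-driven test along these lines would also address the missing test coverage noted in the review.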