#19252: fix(agents): continue model fallback on failover text payloads

by mahsumaktas open 2026-02-17 15:33
commands agents size: M
## Summary

- detect failover-shaped error payloads returned as successful run results in `runWithModelFallback`
- convert those payload-only failures into fallback retries so the chain advances instead of stopping on OpenRouter-style `402` text
- keep guardrails to avoid false positives for normal instructional text mentioning rate limits

## Testing

- `pnpm vitest run --config vitest.e2e.config.ts src/agents/model-fallback.e2e.test.ts`
- `pnpm oxlint --type-aware src/agents/model-fallback.ts src/agents/model-fallback.e2e.test.ts`

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

Extends model fallback system to detect and retry when providers return failover-shaped error payloads as "successful" run results. The PR adds:

- **Payload-level failover detection** (`resolveFailoverPayloadMessage`) in `model-fallback.ts:89-134` that inspects successful run results for error text payloads and converts them to fallback retries
- **New billing error patterns** (`requires more credits`, `can only afford`) to catch OpenRouter-style 402 messages
- **Context parameter** (`ModelFallbackRunContext`) passed to all run callbacks, enabling callers to know when fallback chains are active
- **`probePrimaryDuringCooldown` configuration** set to `"always"` across auto-reply, followup, memory, and CLI flows so primary models are always attempted first (then fallback if rate-limited)
- **Cron agent model merge fix** preserving default `fallbacks` when agent configs only override `primary`
- **User-facing fallback notices** shown when billing/rate-limit causes model switching

The detection logic guards against false positives by requiring error-like signals (payload marked `isError`, stopReason `"error"`, or regex match for HTTP codes/error keywords) before treating instructional text about rate limits as actual failures.
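The guard described above can be illustrated with a minimal sketch. It is not the PR's `resolveFailoverPayloadMessage`: the `RunPayload` and `RunResult` shapes, the `detectFailoverPayload` name, and both regular expressions are assumptions made for illustration, and the real patterns in `model-fallback.ts` may differ. The idea is that text mentioning a billing or rate-limit phrase only counts as a failure when an error-like signal accompanies it.

```typescript
// Hypothetical shapes for illustration; the real types in model-fallback.ts may differ.
interface RunPayload {
  text?: string;
  isError?: boolean;
}

interface RunResult {
  payloads: RunPayload[];
  stopReason?: string;
}

// Phrases suggesting a provider-side billing or rate-limit failure (assumed list).
const FAILOVER_PHRASES = /rate limit|requires more credits|can only afford/i;

// Signals that the text reports an actual error rather than describing one (assumed list).
const ERROR_SIGNALS = /\b(?:402|429|5\d\d)\b|\berror\b/i;

// Returns the failing payload text if the result looks like a disguised failure,
// or undefined if the result should be treated as a normal success.
function detectFailoverPayload(result: RunResult): string | undefined {
  for (const payload of result.payloads) {
    const text = payload.text?.trim();
    if (!text || !FAILOVER_PHRASES.test(text)) continue;

    // Require at least one error-like signal before treating the text as a failure.
    const errorLike =
      payload.isError === true ||
      result.stopReason === "error" ||
      ERROR_SIGNALS.test(text);

    if (errorLike) return text;
  }
  return undefined;
}
```

Under these assumptions, a payload such as `402 This request requires more credits` would trigger a fallback retry, while a normal answer explaining how to avoid a rate limit would pass through unchanged.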
<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk
- The implementation is well-tested with comprehensive e2e tests covering both positive cases (detecting real failover payloads) and negative cases (not treating instructional text as errors). The detection logic includes multiple safeguards against false positives, all integration points are updated consistently, and the cron model merge fix has dedicated unit tests. The changes follow established patterns in the codebase.
- No files require special attention

<sub>Last reviewed commit: 29d6606</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
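The cron model merge behavior mentioned in the review (keeping default `fallbacks` when an agent config overrides only `primary`) might look roughly like the sketch below. The `ModelConfig` shape, the `mergeModelConfig` name, and the model identifiers are hypothetical and not taken from the PR. The sketch also assumes that an explicitly empty `fallbacks` array on the agent is respected as "no fallbacks".

```typescript
// Hypothetical config shape for illustration; the project's real type may differ.
interface ModelConfig {
  primary?: string;
  fallbacks?: string[];
}

// Merge field by field so that overriding `primary` alone does not drop
// the default `fallbacks`. An explicit `fallbacks` on the override wins,
// including an empty array.
function mergeModelConfig(defaults: ModelConfig, override?: ModelConfig): ModelConfig {
  return {
    primary: override?.primary ?? defaults.primary,
    fallbacks: override?.fallbacks ?? defaults.fallbacks,
  };
}

// Usage with placeholder model names.
const defaults: ModelConfig = {
  primary: "provider-a/model-1",
  fallbacks: ["provider-b/model-2", "provider-c/model-3"],
};
const merged = mergeModelConfig(defaults, { primary: "provider-x/model-9" });
// merged.primary comes from the override; merged.fallbacks comes from the defaults.
```

Merging per field instead of replacing the whole object is the design point: a whole-object replacement would silently remove the fallback chain for any agent that sets only `primary`.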

Most Similar PRs