#13820: feat(agents): retry empty-stream once before fallback
agents
size: M
This PR improves resilience for transient empty-stream failures (`request ended without sending any chunks`) by retrying once on the same model before proceeding through fallback models.
Changes:
- classify empty-stream patterns as timeout failover reasons
- add one-time in-model retry (300-800ms jitter, feature-flag controlled)
- keep robust fallback behavior for remaining retryable errors
- add unit tests for classification, retry success, retry->fallback, and feature-flag off path
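For illustration, the classification change might look roughly like the sketch below. The two message patterns are the ones quoted in this PR; `FailoverReason`, `TIMEOUT_PATTERNS`, `classifyFailoverReason`, and the generic `timed out` pattern are hypothetical names chosen for the example, not the identifiers used in the codebase.

```typescript
// Hypothetical sketch: map error messages to a failover reason.
// Only the two empty-stream strings come from the PR description.
type FailoverReason = "timeout";

const TIMEOUT_PATTERNS: readonly RegExp[] = [
  /timed out/i, // assumed pre-existing generic timeout wording
  /request ended without sending any chunks/i,
  /stream ended before first chunk/i,
];

// Returns "timeout" when the message matches a known pattern, otherwise null
// (meaning: not classified as a failover-worthy timeout by this sketch).
function classifyFailoverReason(message: string): FailoverReason | null {
  return TIMEOUT_PATTERNS.some((pattern) => pattern.test(message)) ? "timeout" : null;
}
```

Treating an empty stream as a timeout lets it reuse the existing failover path instead of introducing a new reason.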
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR enhances resilience for transient empty-stream failures by implementing a one-time in-model retry before proceeding to fallback models. The implementation classifies empty-stream errors (`request ended without sending any chunks`, `stream ended before first chunk`) as timeout failover reasons and adds retry logic with configurable jitter (300-800ms default, feature-flag controlled via `OPENCLAW_EMPTY_STREAM_RETRY`).
Key changes:
- Added empty-stream error patterns to timeout classification in error pattern matching
- Implemented one-time in-model retry with randomized jitter (300-800ms) before fallback
- Added comprehensive test coverage for retry success, retry-then-fallback, and feature-flag disable scenarios
- Maintained backward compatibility through feature flags and existing fallback behavior
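The retry-then-fallback flow summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `runWithEmptyStreamRetry`, `RetryOptions`, `jitterDelayMs`, and `isEmptyStreamError` are hypothetical names, the `enabled` option stands in for whatever `OPENCLAW_EMPTY_STREAM_RETRY` resolves to, the retry is assumed to be allowed once per run, and every other error is simplified to "move on to the next model".

```typescript
// Hypothetical sketch of "retry once on the same model, then fall back".
type Attempt<T> = (model: string) => Promise<T>;

interface RetryOptions {
  enabled: boolean; // assumed to be derived from the feature flag
  minDelayMs?: number; // 300ms default, per the PR description
  maxDelayMs?: number; // 800ms default, per the PR description
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number; // injectable for tests
}

const EMPTY_STREAM_RE =
  /request ended without sending any chunks|stream ended before first chunk/i;

function isEmptyStreamError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return EMPTY_STREAM_RE.test(message);
}

// Uniform random delay in [min, max); spreads retries from concurrent callers.
function jitterDelayMs(min: number, max: number, random: () => number = Math.random): number {
  return Math.floor(min + random() * (max - min));
}

async function runWithEmptyStreamRetry<T>(
  models: string[],
  attempt: Attempt<T>,
  opts: RetryOptions,
): Promise<T> {
  const { enabled, minDelayMs = 300, maxDelayMs = 800 } = opts;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown = new Error("no models configured");
  let retried = false; // guard flag: at most one in-model retry per run (assumption)

  for (const model of models) {
    for (;;) {
      try {
        return await attempt(model);
      } catch (err) {
        lastError = err;
        if (enabled && !retried && isEmptyStreamError(err)) {
          retried = true;
          await sleep(jitterDelayMs(minDelayMs, maxDelayMs, opts.random));
          continue; // retry the same model once
        }
        break; // proceed to the next fallback model
      }
    }
  }
  throw lastError;
}
```

With the flag off, the guard condition never fires and the loop reduces to plain model-by-model fallback, which is how backward compatibility is preserved in this sketch.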
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk
- The implementation is well-designed, with proper error classification, tests covering each code path (retry success, retry failure followed by fallback, feature flag disabled), and backward-compatible feature flags. The retry logic is isolated behind a guard flag that prevents infinite loops, and the delay uses jitter to avoid thundering-herd issues.
- No files require special attention
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#15815: Fallback LLM doesn't trigger if primary model is local
by shihanqu · 2026-02-13
77.2%
#13658: fix: silent model failover with fallback notification
by taw0002 · 2026-02-10
76.8%
#12687: fix: handle empty LLM stream response with failover
by janckerchen · 2026-02-09
76.4%
#12314: fix: treat HTTP 5xx server errors as failover-worthy
by hsssgdtc · 2026-02-09
76.2%
#8256: feat: Add rate limit strategy configuration
by revenuestack · 2026-02-03
76.1%
#4462: fix: prevent gateway crash when all auth profiles are in cooldown
by garnetlyx · 2026-01-30
75.7%
#11349: fix(agents): do not filter fallback models by models allowlist
by liuxiaopai-ai · 2026-02-07
75.4%
#9427: fix: trigger model fallback on all 4xx HTTP errors
by dbottme · 2026-02-05
75.2%
#10178: fix: trigger fallback when model resolution fails with unknown model
by Yida-Dev · 2026-02-06
74.7%
#8390: feat: notify user when fallback model is used (#8182)
by Glucksberg · 2026-02-04
74.7%