
#11472: fix: retry media fetch on transient network errors

by openclaw-quenio · open · 2026-02-07 21:24
stale
## Summary

Adds exponential backoff retry to `fetchRemoteMedia()` in `src/media/fetch.ts` for transient network failures.

## Problem

When fetching media from provider APIs (Telegram, Discord, Slack, etc.), a single transient `TypeError: fetch failed` causes the entire inbound message to be dropped. The agent never sees the message, and there is no re-delivery mechanism. This is especially common in VM/container environments where network connectivity to provider APIs can be intermittent.

## Fix

- Retry up to **3 times** with exponential backoff (**1s → 2s → 4s**)
- Only retries on network-level fetch failures (the `catch` block)
- **Does not retry** deterministic errors: HTTP status errors (`http_error`) or size limit violations (`max_bytes`)
- Logs each retry attempt for observability
- Properly cleans up `release` handle between retries

## Testing

Verified locally by:

1. Observing a `MediaFetchError: fetch_failed` dropping a Telegram photo message (see logs in #11471)
2. Applying the patch to `dist/deliver-BIDW_mg2.js`
3. Restarting the gateway
4. Successfully receiving the same photo on retry

Fixes #11471

## Greptile Overview

### Greptile Summary

This PR adds retry-with-exponential-backoff around `fetchWithSsrFGuard()` in `src/media/fetch.ts` to reduce dropped inbound messages when media fetch fails due to transient network issues. The retry loop logs each backoff attempt and keeps existing behavior for HTTP status errors and max-bytes enforcement in the response handling path.

Key things to double-check before merge:

- The retry loop currently performs one more attempt than `MEDIA_FETCH_MAX_RETRIES` suggests (off-by-one).
- The retry is applied to any error thrown by `fetchWithSsrFGuard`, including deterministic SSRF/URL/redirect errors, which adds delay/noise and diverges from the stated goal of retrying only transient fetch failures.

### Confidence Score: 3/5

- Mergeable after fixing retry semantics and error filtering
- Change is localized and the intent is clear, but current loop bounds add an extra attempt and the retry catches deterministic errors from `fetchWithSsrFGuard` (SSRF/URL/redirect validation), causing unnecessary delays and noisy logs in those scenarios.
- `src/media/fetch.ts`
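For illustration, the two review points (loop bounds and error filtering) could be addressed along the lines of the sketch below. This is not the code from the PR: `withRetry`, `isTransient`, `backoffDelayMs`, `MAX_RETRIES`, and `BASE_DELAY_MS` are hypothetical names, and the sketch assumes that transient network failures surface as a `TypeError` (as in `TypeError: fetch failed`), while SSRF, URL, and redirect validation errors use other error types. It also reads "retry up to 3 times" as one initial attempt plus at most three retries, which matches the three stated delays.

```typescript
// Hypothetical names throughout; a sketch, not the PR's implementation.
const MAX_RETRIES = 3; // retries AFTER the initial attempt (4 attempts total)
const BASE_DELAY_MS = 1000; // yields delays of 1s, 2s, 4s

// Assumption: only network-level failures are TypeErrors; validation
// errors (SSRF, bad URL, redirect) are some other Error subclass.
function isTransient(err: unknown): boolean {
  return err instanceof TypeError;
}

// Delay before retry number `retry` (0-based): base * 2^retry.
function backoffDelayMs(retry: number, baseMs: number = BASE_DELAY_MS): number {
  return baseMs * 2 ** retry;
}

async function withRetry<T>(
  op: () => Promise<T>,
  opts: {
    maxRetries?: number;
    baseMs?: number;
    sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
  } = {},
): Promise<T> {
  const maxRetries = opts.maxRetries ?? MAX_RETRIES;
  const baseMs = opts.baseMs ?? BASE_DELAY_MS;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));

  for (let retry = 0; ; retry++) {
    try {
      return await op();
    } catch (err) {
      // Rethrow immediately for deterministic errors, or once the retry
      // budget is spent; `retry` counts retries already performed.
      if (!isTransient(err) || retry >= maxRetries) throw err;
      const delay = backoffDelayMs(retry, baseMs);
      console.warn(`media fetch failed; retry ${retry + 1}/${maxRetries} in ${delay}ms`);
      await sleep(delay);
    }
  }
}
```

Comparing the retry counter against `maxRetries` before sleeping keeps the attempt count explicit, and filtering inside the `catch` means deterministic failures are rethrown on the first attempt without any delay or retry log line.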
