
#10034: Don't crash gateway on transient unhandled fetch failures

by gigq · open · 2026-02-06 01:45
stale
Context: In production we saw repeated unhandled fetch failures (undici) causing the gateway process to restart and miss replies.

Change: Make the global `unhandledRejection` handler *log and continue* by default for non-fatal, non-config errors. Keep the existing exit behavior for fatal and config errors. Add an opt-in env var to restore strict crash-on-unhandled behavior when debugging.

Rationale: This prevents transient I/O failures (including undici fetch failures) from taking down the whole gateway.

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR changes the global `unhandledRejection` handler (`src/infra/unhandled-rejections.ts`) to **log and continue by default** for non-fatal/non-config errors, while still exiting for fatal and configuration error codes. It also adds an opt-in env var (`OPENCLAW_EXIT_ON_UNHANDLED_REJECTION`) to restore the prior “crash on unhandled” behavior for debugging.

The change is wired through the existing entry points (CLI/gateway) via `installUnhandledRejectionHandler()`, so runtime behavior shifts from “exit on generic unhandled rejections” to “keep running unless the error is classified as fatal/config or the env var is set.”

<h3>Confidence Score: 4/5</h3>

- Mostly safe to merge, but will break tests as-is.
- The core behavior change is small and localized, and fatal/config exits are preserved; however, the existing test suite contains at least one assertion that contradicts the new default behavior and will fail unless updated.
- File needing attention: `src/infra/unhandled-rejections.fatal-detection.test.ts`
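The handler policy described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's code. Only the `unhandledRejection` event, the env var name, and the fatal/config-vs-generic split come from the description. The `ProcessLike` interface, `decideUnhandledRejection`, `installUnhandledRejectionHandlerSketch`, the placeholder error codes, and the assumption that the env var is enabled by `1` or `true` are all invented for this example.

```typescript
type Decision = "exit" | "continue";

// Minimal slice of Node's `process` the handler needs; injected so the
// exit decision can be tested without terminating the test runner.
interface ProcessLike {
  env: Record<string, string | undefined>;
  on(event: "unhandledRejection", listener: (reason: unknown) => void): unknown;
  exit(code: number): void;
}

// Hypothetical placeholder code lists; the real classification in
// src/infra/unhandled-rejections.ts may use different codes entirely.
const FATAL_CODES: ReadonlySet<string> = new Set(["ERR_WORKER_OUT_OF_MEMORY"]);
const CONFIG_CODES: ReadonlySet<string> = new Set(["INVALID_CONFIG", "MISSING_CONFIG"]);

// Reads a string `code` property from an arbitrary rejection reason, if present.
function errorCode(reason: unknown): string | undefined {
  if (typeof reason === "object" && reason !== null && "code" in reason) {
    const code = (reason as { code?: unknown }).code;
    return typeof code === "string" ? code : undefined;
  }
  return undefined;
}

// Assumption: the opt-in env var counts as enabled when set to "1" or "true".
function strictModeEnabled(env: Record<string, string | undefined>): boolean {
  const raw = env["OPENCLAW_EXIT_ON_UNHANDLED_REJECTION"]?.trim().toLowerCase();
  return raw === "1" || raw === "true";
}

// Fatal/config errors always exit; everything else exits only in strict mode.
function decideUnhandledRejection(
  reason: unknown,
  env: Record<string, string | undefined>,
): Decision {
  const code = errorCode(reason);
  if (code !== undefined && (FATAL_CODES.has(code) || CONFIG_CODES.has(code))) {
    return "exit";
  }
  return strictModeEnabled(env) ? "exit" : "continue";
}

// Hypothetical stand-in for installUnhandledRejectionHandler(): logs every
// unhandled rejection and calls exit(1) only when the decision says so.
function installUnhandledRejectionHandlerSketch(
  proc: ProcessLike,
  log: (msg: string, reason: unknown) => void = console.error,
): void {
  proc.on("unhandledRejection", (reason) => {
    const decision = decideUnhandledRejection(reason, proc.env);
    log(`[unhandledRejection] ${decision === "exit" ? "exiting" : "continuing"}:`, reason);
    if (decision === "exit") {
      proc.exit(1);
    }
  });
}
```

In a real entry point one would pass Node's global `process`. Injecting it instead keeps the decision unit-testable, which is the kind of assertion the flagged test file would need to cover once the default flips from exit to continue.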
