#10034: Don't crash gateway on transient unhandled fetch failures
stale
Cluster: Gateway Error Handling Improvements
Context: In production we saw repeated unhandled fetch failures (undici) that caused the gateway process to restart and miss replies.

Change: Make the global unhandledRejection handler *log and continue* by default for non-fatal, non-config errors. Keep the existing exit behavior for fatal/config errors. Add an opt-in env var that restores strict crash-on-unhandled behavior when debugging.

Rationale: This prevents transient I/O failures (including undici fetch failures) from taking down the whole gateway.
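A minimal TypeScript sketch of the policy described in the Change paragraph, assuming errors are classified by walking the `cause` chain for string `code` values. This matters for the undici case: Node's built-in `fetch()` rejects with `TypeError: fetch failed` and carries the underlying socket or DNS error in `cause`. The code sets are hypothetical placeholders (the actual classification in `src/infra/unhandled-rejections.ts` is not reproduced here), `decideUnhandledRejectionAction` is an illustrative name, and treating `OPENCLAW_EXIT_ON_UNHANDLED_REJECTION` as enabled for `1`/`true` is an assumption.

```typescript
type RejectionAction = "exit" | "log";

// HYPOTHETICAL placeholder sets; the PR's real fatal/config codes may differ.
const FATAL_CODES = new Set<string>(["ERR_WORKER_OUT_OF_MEMORY"]);
const CONFIG_CODES = new Set<string>(["INVALID_CONFIG"]);

// Collect string `code` values from the rejection reason and its `cause` chain.
// The `seen` set guards against cyclic cause chains.
function collectErrorCodes(reason: unknown): string[] {
  const codes: string[] = [];
  const seen = new Set<unknown>();
  let current: unknown = reason;
  while (typeof current === "object" && current !== null && !seen.has(current)) {
    seen.add(current);
    const { code, cause } = current as { code?: unknown; cause?: unknown };
    if (typeof code === "string") codes.push(code);
    current = cause;
  }
  return codes;
}

function decideUnhandledRejectionAction(
  reason: unknown,
  env: Record<string, string | undefined>,
): RejectionAction {
  const codes = collectErrorCodes(reason);
  // Fatal and config errors still exit, as before the change.
  if (codes.some((c) => FATAL_CODES.has(c) || CONFIG_CODES.has(c))) return "exit";
  // ASSUMPTION: "1" or "true" (any case) opts into strict crash-on-unhandled mode.
  const strict = env["OPENCLAW_EXIT_ON_UNHANDLED_REJECTION"]?.toLowerCase();
  if (strict === "1" || strict === "true") return "exit";
  // Default: transient failures such as undici fetch errors are logged, not fatal.
  return "log";
}
```

In the gateway this would be called as `decideUnhandledRejectionAction(reason, process.env)`. Passing `env` explicitly keeps the policy a pure function, so tests can cover the strict mode without mutating `process.env`.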
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR changes the global `unhandledRejection` handler (`src/infra/unhandled-rejections.ts`) to **log and continue by default** for non-fatal/non-config errors, while still exiting for fatal and configuration error codes. It also adds an opt-in env var (`OPENCLAW_EXIT_ON_UNHANDLED_REJECTION`) to restore the prior “crash on unhandled” behavior for debugging.
The change is wired through existing entry points (CLI/gateway) via `installUnhandledRejectionHandler()`, so runtime behavior will shift from “exit on generic unhandled rejections” to “keep running unless the error is classified fatal/config or the env var is set.”
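The entry-point wiring can be sketched as a handler factory plus a one-time `process.on("unhandledRejection", ...)` registration. `createUnhandledRejectionHandler` and `installRejectionHandlerSketch` are hypothetical names, not the PR's exported API (which is `installUnhandledRejectionHandler()`); injecting `shouldExit`, `log`, and `exit` is an assumed design that lets tests check the default path without killing the test process.

```typescript
type HandlerDeps = {
  shouldExit: (reason: unknown) => boolean; // classification policy, supplied by the caller
  log: (message: string, reason: unknown) => void;
  exit: (code: number) => void;
};

// HYPOTHETICAL factory: returns the listener that would be attached to the process.
function createUnhandledRejectionHandler(deps: HandlerDeps): (reason: unknown) => void {
  return (reason) => {
    if (deps.shouldExit(reason)) {
      deps.log("[gateway] fatal unhandled rejection, exiting", reason);
      deps.exit(1);
      return;
    }
    // New default: record the failure and keep the gateway running.
    deps.log("[gateway] unhandled rejection (continuing)", reason);
  };
}

// HYPOTHETICAL installer: register once, early in startup, from each entry point.
function installRejectionHandlerSketch(shouldExit: (reason: unknown) => boolean): void {
  const handler = createUnhandledRejectionHandler({
    shouldExit,
    log: (message, reason) => console.error(message, reason),
    exit: (code) => process.exit(code),
  });
  process.on("unhandledRejection", handler);
}
```

This relies on documented Node.js behavior: in the default `--unhandled-rejections=throw` mode, an unhandled rejection is raised as an uncaught exception only when no `unhandledRejection` listener is registered. A listener that returns without calling `process.exit` is therefore what keeps the process alive.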
<h3>Confidence Score: 4/5</h3>
- Mostly safe to merge, but will break tests as-is.
- Core behavior change is small and localized, and fatal/config exits are preserved; however, the existing test suite contains at least one assertion that contradicts the new default behavior and will fail unless updated.
- Files needing attention: `src/infra/unhandled-rejections.fatal-detection.test.ts`
Most Similar PRs
- #11101: fix: handle AbortError and WebSocket 1006 in unhandled rejection ha... (by Nipurn123 · 2026-02-07 · 86.8% similar)
- #3396: Config: gateway.unhandledRejections (warn|exit) (by diegoaledesma · 2026-01-28 · 84.6% similar)
- #4653: fix(gateway): improve crash resilience for mDNS and network errors (by AyedAlmudarra · 2026-01-30 · 82.1% similar)
- #5823: fix(config): exit cleanly on invalid config instead of high CPU loop (by gavinbmoore · 2026-02-01 · 81.8% similar)
- #12656: fix: install unhandled rejection handler before async boot operations (by kiranirabatti · 2026-02-09 · 81.0% similar)
- #7563: fix: expand transient network error detection (by kaigritun · 2026-02-03 · 80.9% similar)
- #7558: fix: Handle Grammy/Telegram network errors to prevent gateway crashes (by kaigritun · 2026-02-03 · 80.0% similar)
- #4462: fix: prevent gateway crash when all auth profiles are in cooldown (by garnetlyx · 2026-01-30 · 80.0% similar)
- #3831: fix: ignore mDNS socket errors to prevent gateway crashes (by cici1029 · 2026-01-29 · 79.1% similar)
- #12234: gateway: incident tracking, recover command, and ciao ERR_SERVER_CL... (by levineam · 2026-02-09 · 78.8% similar)