#19463: fix: suppress undici TLS setSession crash instead of exiting
cli
size: M
Cluster: Gateway Error Handling Improvements
## Summary
Suppresses the known undici TLS `setSession` null-dereference crash (`TypeError: Cannot read properties of null (reading 'setSession')`) instead of letting it kill the gateway process. The error is logged as a warning and the process keeps running.
## Problem
The gateway crashes when undici attempts TLS session resumption on a socket whose internal `_handle` has already been destroyed. This is a race condition in undici's HTTP/1.1 connection pool: when a TLS socket closes, undici immediately tries to reconnect via `_resume()`, calling `tls.connect()` with a cached session — but `this._handle` is already null.
This crash:
- Kills the gateway process (restart takes 2-5s depending on orchestrator)
- Drops all in-flight requests silently
- Disconnects IM channels (Telegram, Discord, etc.)
- Can trigger crash loops under heavy HTTPS traffic
Tracked in nodejs/undici#3869. Affects Node 22+ with undici 7.x.
## Why it's safe to suppress
- The socket was already closing — no in-flight data is lost
- undici creates a fresh connection on the next request automatically
- No application state is corrupted
- The error is purely in the connection lifecycle, not in request/response handling
## Changes
### `src/infra/unhandled-rejections.ts`
- **`isUndiciTlsSessionBug(err)`** — Narrowly detects this specific crash by checking:
  1. Error is a `TypeError`
  2. Message contains `reading 'setSession'`
  3. Stack trace contains both `TLSSocket.setSession` and `undici`

  All three conditions are required to avoid false positives.
- **`installUncaughtExceptionHandler()`** — Centralized handler that suppresses the TLS bug with `console.warn` while preserving `process.exit(1)` for all other uncaught exceptions.
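
As a rough illustration of the two pieces described above, the sketch below shows one way the three-condition check and the suppress-or-exit decision could be written. It is not the PR's actual code: `HandlerDeps` and `createUncaughtExceptionListener` are hypothetical names, the warning text is invented, and the side effects (`warn`, `error`, `exit`) are injected so the logic can run without touching the real process.

```typescript
// Sketch only. The real implementation lives in src/infra/unhandled-rejections.ts
// and may differ. HandlerDeps and createUncaughtExceptionListener are hypothetical.

/** Returns true only for the undici TLS setSession crash signature. */
export function isUndiciTlsSessionBug(err: unknown): boolean {
  // Condition 1: the error must be a TypeError.
  if (!(err instanceof TypeError)) return false;
  // Condition 2: the message must mention the null `setSession` read.
  if (!err.message.includes("reading 'setSession'")) return false;
  // Condition 3: the stack must contain both the TLS frame and an undici frame.
  const stack = err.stack ?? "";
  return stack.includes("TLSSocket.setSession") && stack.includes("undici");
}

/** Side effects are injected so the listener can be exercised without exiting. */
export interface HandlerDeps {
  warn: (message: string) => void;
  error: (message: string, err: unknown) => void;
  exit: (code: number) => void;
}

/** Builds a listener that suppresses the TLS bug and exits on anything else. */
export function createUncaughtExceptionListener(
  deps: HandlerDeps,
): (err: unknown) => void {
  return (err: unknown): void => {
    if (isUndiciTlsSessionBug(err)) {
      // Known, recoverable connection-lifecycle error: log and keep running.
      deps.warn("Suppressed undici TLS setSession crash (nodejs/undici#3869)");
      return;
    }
    // Every other uncaught exception keeps the fail-fast behavior.
    deps.error("Uncaught exception:", err);
    deps.exit(1);
  };
}
```

Under these assumptions, an entry point would register the listener with something like `process.on("uncaughtException", createUncaughtExceptionListener({ warn: console.warn, error: console.error, exit: (code) => process.exit(code) }))`. Injecting `exit` is a choice made for this sketch so that both branches can be tested without terminating the test runner.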
### Entry points (`src/index.ts`, `src/cli/run-main.ts`, `src/macos/relay.ts`)
- Replaced inline `process.on("uncaughtException", ...)` handlers with the centralized `installUncaughtExceptionHandler()`
- No behavioral change for non-TLS errors
### Tests
- **`uncaught-exception.test.ts`** — 7 unit tests for `isUndiciTlsSessionBug()` covering exact match, path variations, wrong error types, wrong messages, non-undici stacks, null inputs, and missing stacks
- **`uncaught-exception.handler.test.ts`** — 3 integration tests verifying the handler suppresses the TLS bug without exiting, still exits on unknown exceptions, and still exits on unrelated TypeErrors
All 10 new tests and the 10 existing tests pass.
## AI Disclosure
- [x] AI-assisted (built with OpenClaw/Claude)
- [x] Fully tested (unit + integration, all passing)
Fixes #16206
Fixes #19168
Ref #16335
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR fixes a real, well-documented Node 22 + undici 7.x race condition (`TypeError: Cannot read properties of null (reading 'setSession')`) that was killing the gateway process. The approach — narrowly detecting the specific bug by requiring all three of: `TypeError`, the exact `reading 'setSession'` message substring, and both `TLSSocket.setSession` and `undici` in the stack — is conservative and unlikely to suppress unrelated errors. The refactoring to a centralized `installUncaughtExceptionHandler()` is clean and consistent with the existing `installUnhandledRejectionHandler()` pattern.
Key observations:
- **Duplicate registration risk**: `installUncaughtExceptionHandler` uses `process.on(...)` with no idempotency guard. If called more than once (which can occur in tests or if entry points change), duplicate listeners accumulate, causing double-logging and multiple `process.exit(1)` calls on a single non-TLS exception. The existing `installUnhandledRejectionHandler` has the same design, but this PR compounds the risk by adding a second unguarded installer.
- **Test teardown gap**: `uncaught-exception.handler.test.ts` installs the handler in `beforeAll` but has no corresponding `afterAll` that calls `removeListener`. The existing `unhandled-rejections.fatal-detection.test.ts` does the same for the rejection handler, so both listeners stay live for the lifetime of the Vitest worker. If another test file later emits `uncaughtException`, the handler would call the real `process.exit` (restored once the mock is torn down) and kill the test worker.
- **Detection coverage is appropriate**: The three-condition check is tight enough to avoid false positives while covering the known crash signature. Tests cover the main paths (exact match, path variations, wrong type, wrong message, non-undici stacks, null inputs, missing stack).
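
One possible way to address both the duplicate-registration and teardown observations is an installer that registers at most once per source and returns a disposer. The sketch below is an assumption about how that could look, not code from this PR: `ExceptionSource` and `installOnce` are hypothetical names, and the source is passed in explicitly (in the gateway it would presumably be `process`) so the sketch stays runnable in isolation.

```typescript
// Sketch only: hypothetical names, not part of the PR.
type Listener = (err: unknown) => void;

/** Minimal slice of an event emitter that this sketch relies on. */
export interface ExceptionSource {
  on(event: "uncaughtException", listener: Listener): unknown;
  removeListener(event: "uncaughtException", listener: Listener): unknown;
}

// Remembers which sources already have a listener, keyed by the source itself.
const installed = new WeakMap<ExceptionSource, () => void>();

/**
 * Registers `listener` on `source` unless one was already registered through
 * this function. Always returns a disposer that removes the registration.
 */
export function installOnce(source: ExceptionSource, listener: Listener): () => void {
  const existing = installed.get(source);
  if (existing) return existing; // repeat call: no second listener is added

  source.on("uncaughtException", listener);
  const dispose = (): void => {
    source.removeListener("uncaughtException", listener);
    installed.delete(source); // allow a later re-install
  };
  installed.set(source, dispose);
  return dispose;
}
```

In a test file, the returned disposer could be called from `afterAll` so the listener does not outlive the suite. Note that a repeat call with a different listener is ignored in this sketch; whether that is the right policy is a design decision left open here.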
<h3>Confidence Score: 4/5</h3>
- Safe to merge; the core fix is correct and narrowly scoped, with one non-critical architectural concern around idempotency of the handler installer.
- The bug being fixed is real and well-understood, the detection heuristic is appropriately conservative, and the entry-point changes are straightforward refactors. The main concern — lack of an idempotency guard on `installUncaughtExceptionHandler` — is a pre-existing pattern in the codebase (same issue exists on `installUnhandledRejectionHandler`) and does not cause a regression in the normal single-call path. The test teardown issue is minor and consistent with existing test files.
- src/infra/unhandled-rejections.ts (idempotency guard), src/infra/uncaught-exception.handler.test.ts (handler not removed in afterAll)
<sub>Last reviewed commit: af1b9f5</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #11101: fix: handle AbortError and WebSocket 1006 in unhandled rejection ha... (by Nipurn123 · 2026-02-07, 79.9% similar)
- #10034: Don't crash gateway on transient unhandled fetch failures (by gigq · 2026-02-06, 76.3% similar)
- #12656: fix: install unhandled rejection handler before async boot operations (by kiranirabatti · 2026-02-09, 75.6% similar)
- #17758: Fix crash on transient Discord gateway zombie connection errors (by DoyoDia · 2026-02-16, 73.8% similar)
- #22424: fix: prevent crash when onUpdate is truthy but not callable (fixes ... (by mcaxtr · 2026-02-21, 72.8% similar)
- #12953: fix: defer gateway restart until all replies are sent (by zoskebutler · 2026-02-10, 72.8% similar)
- #3921: fix: sanitize fetch headers to prevent ByteString crash on Unicode ... (by nexiouscaliver · 2026-01-29, 72.7% similar)
- #12234: gateway: incident tracking, recover command, and ciao ERR_SERVER_CL... (by levineam · 2026-02-09, 72.5% similar)
- #4653: fix(gateway): improve crash resilience for mDNS and network errors (by AyedAlmudarra · 2026-01-30, 72.3% similar)
- #17879: fix: prevent Slack auth errors from crashing the entire gateway (by zuyan9 · 2026-02-16, 72.2% similar)