#21163: Prevent Slack DNS errors from crashing the gateway
size: XS
Cluster: Slack Gateway Error Handling
# Prevent Slack DNS errors from crashing the gateway
## Summary
This patch prevents Slack Socket Mode DNS lookup failures from being treated as fatal unhandled rejections. We now classify transient network codes found in error message text (for example `ENOTFOUND`) as non-fatal, matching existing code/cause-based transient handling.
## Problem
- Expected: transient Slack network/DNS failures should log and continue.
- Actual: some Slack web-api errors expose network codes only in their `message` text, so they bypass transient detection and trigger a fatal unhandled-rejection exit.
- Impact: gateway process can terminate during temporary DNS outages.
## Reproduction
1. Run the gateway with Slack Socket Mode enabled.
2. Trigger a DNS failure path in which the Slack error surfaces as message text such as `A request error occurred: getaddrinfo ENOTFOUND slack.com`, with no structured `code` on the top-level error.
3. Observe how the unhandled rejection is classified.
- Expected result: the rejection is treated as a transient network error and the process continues.
- Actual result: the rejection can be treated as generic/fatal, and the process exits.
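The gap can be illustrated with a minimal sketch. It is not the code in `src/infra/unhandled-rejections.ts`: the names `RejectionAction`, `STRUCTURED_TRANSIENT_CODES`, `readCode`, and `classifyByStructuredCode` are hypothetical, the code list is an assumed subset, and the `fetch failed` check mentioned in the issue table is omitted for brevity. It only shows why a classifier that inspects `code` and `cause` lets a message-only Slack error fall through to the fatal path.

```typescript
// Hypothetical outcome type for an unhandled-rejection handler.
type RejectionAction = "log-and-continue" | "exit";

// Assumed subset of transient codes; the repository's real list may differ.
const STRUCTURED_TRANSIENT_CODES = new Set(["ENOTFOUND", "EAI_AGAIN", "ECONNRESET", "ETIMEDOUT"]);

// Reads a string `code` property from an unknown value, if present.
function readCode(value: unknown): string | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const code = (value as { code?: unknown }).code;
  return typeof code === "string" ? code : undefined;
}

// Structured-only classification: checks `code` on the rejection reason and on its direct `cause`.
function classifyByStructuredCode(reason: unknown): RejectionAction {
  const cause =
    typeof reason === "object" && reason !== null ? (reason as { cause?: unknown }).cause : undefined;
  for (const candidate of [reason, cause]) {
    const code = readCode(candidate);
    if (code !== undefined && STRUCTURED_TRANSIENT_CODES.has(code)) return "log-and-continue";
  }
  return "exit";
}

// Node-style DNS error: the code is available as a structured property.
const nodeStyle = Object.assign(new Error("getaddrinfo ENOTFOUND slack.com"), { code: "ENOTFOUND" });

// Slack-style error from the reproduction: the code appears only inside the message.
const slackStyle = new Error("A request error occurred: getaddrinfo ENOTFOUND slack.com");

const nodeStyleAction = classifyByStructuredCode(nodeStyle); // "log-and-continue"
const slackStyleAction = classifyByStructuredCode(slackStyle); // "exit"
```

Both errors describe the same DNS failure, but only the one carrying a structured `code` is recognized as transient.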
## Issues Found
Severity: high
Confidence: high
Status: fixed
| ID | Severity | Confidence | Area | Summary | Evidence | Status |
| --- | --- | --- | --- | --- | --- | --- |
| PR-21163-BUG-01 | high | high | `src/infra/unhandled-rejections.ts` | Transient network detection misses errors that carry `ENOTFOUND` or similar tokens only in message text | Stack trace in issue #21082; the previous classifier path checked only `code`/`cause`/`fetch failed` | fixed |
## Fix Approach
- Added message-based transient network detection using known transient code tokens derived from existing `TRANSIENT_NETWORK_CODES`.
- Kept existing code/cause/aggregate detection paths unchanged.
- Added tests for Slack-style message-only ENOTFOUND errors in both unit and fatal-classification suites.
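As a rough sketch of what message-based detection could look like, assuming the tokens are matched as whole words: the contents of `TRANSIENT_NETWORK_CODES` below are an assumed subset, and `TRANSIENT_MESSAGE_PATTERN` and `hasTransientCodeInMessage` are hypothetical names, so the actual patch may differ in both matching strategy and naming.

```typescript
// Assumed contents; the real TRANSIENT_NETWORK_CODES in the repository may list other codes.
const TRANSIENT_NETWORK_CODES: ReadonlySet<string> = new Set([
  "ENOTFOUND",
  "EAI_AGAIN",
  "ECONNRESET",
  "ETIMEDOUT",
]);

// Derive the message pattern from the code set so both detection paths share one source of truth.
// The codes contain only letters and underscores, so no regex escaping is needed here.
// \b prevents a token from matching inside a longer identifier.
const TRANSIENT_MESSAGE_PATTERN = new RegExp(
  `\\b(?:${Array.from(TRANSIENT_NETWORK_CODES).join("|")})\\b`,
);

// Returns true when an Error's message text contains a known transient network code token.
function hasTransientCodeInMessage(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  return TRANSIENT_MESSAGE_PATTERN.test(err.message);
}

// Slack-style DNS failure: code token appears only in the message.
const slackDnsError = new Error("A request error occurred: getaddrinfo ENOTFOUND slack.com");

// Unrelated error text: no transient token, so it is not reclassified.
const authError = new Error("An API error occurred: invalid_auth");

// Token embedded in a longer word: the word boundary keeps it from matching.
const embeddedToken = new Error("flag MYENOTFOUNDX was set");

const slackDnsIsTransient = hasTransientCodeInMessage(slackDnsError); // true
const authIsTransient = hasTransientCodeInMessage(authError); // false
const embeddedIsTransient = hasTransientCodeInMessage(embeddedToken); // false
```

In this sketch the check would run in addition to the existing code/cause/aggregate paths, so errors that were already classified correctly keep their classification.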
## Testing
- `pnpm test -- src/infra/unhandled-rejections.test.ts src/infra/unhandled-rejections.fatal-detection.test.ts` (pass)
- `pnpm check` (pass)
- `pnpm build` (pass)
## Risk / Notes
- Low risk, narrowly scoped to error classification.
- Scope is conservative: only known transient network code tokens are matched in messages.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Adds message-based transient network error detection to prevent Slack DNS failures from crashing the gateway. The fix scans error message text for known transient network codes (like `ENOTFOUND`) when structured `code` properties are absent, matching the existing error classification approach.
<h3>Confidence Score: 5/5</h3>
- Safe to merge with no risk
- The change is narrowly scoped to error classification logic, uses conservative pattern matching against known transient network codes, has comprehensive test coverage for both the new message-based detection and integration with existing fatal-detection flow, and follows established patterns in the codebase
- No files require special attention
<sub>Last reviewed commit: b2e5857</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
| PR | Title | Author | Date | Similarity |
| --- | --- | --- | --- | --- |
| #23787 | Handle transient Slack request errors without crashing the gateway | graysurf | 2026-02-22 | 88.9% |
| #22896 | Handle Slack SDK .original in transient network detection | creditblake | 2026-02-21 | 81.7% |
| #17879 | fix: prevent Slack auth errors from crashing the entire gateway | zuyan9 | 2026-02-16 | 79.9% |
| #7563 | fix: expand transient network error detection | kaigritun | 2026-02-03 | 79.7% |
| #7558 | fix: Handle Grammy/Telegram network errors to prevent gateway crashes | kaigritun | 2026-02-03 | 78.5% |
| #17758 | Fix crash on transient Discord gateway zombie connection errors | DoyoDia | 2026-02-16 | 77.8% |
| #11101 | fix: handle AbortError and WebSocket 1006 in unhandled rejection ha... | Nipurn123 | 2026-02-07 | 77.2% |
| #12369 | fix: register unhandled rejection handler for Slack monitor | Yida-Dev | 2026-02-09 | 76.8% |
| #22096 | fix(slack): traverse .original for Slack SDK errors; pass recipient... | maiclaw | 2026-02-20 | 76.6% |
| #21967 | Harden Slack allow-from resolution against undefined catch crash | graysurf | 2026-02-20 | 76.4% |