#21163: Prevent Slack DNS errors from crashing the gateway

by graysurf open 2026-02-19 18:44
size: XS
# Prevent Slack DNS errors from crashing the gateway

## Summary

This patch prevents Slack Socket Mode DNS lookup failures from being treated as fatal unhandled rejections. We now classify transient network codes found in error message text (for example `ENOTFOUND`) as non-fatal, matching existing code/cause-based transient handling.

## Problem

- Expected: transient Slack network/DNS failures should log and continue.
- Actual: some Slack web-api errors only expose network codes in `message` text, so they bypass transient detection and trigger fatal unhandled rejection exit.
- Impact: gateway process can terminate during temporary DNS outages.

## Reproduction

1. Run gateway with Slack Socket Mode enabled.
2. Trigger DNS failure path where Slack error is surfaced as message text like: `A request error occurred: getaddrinfo ENOTFOUND slack.com` (without a structured `code` on the top-level error).
3. Observe unhandled rejection classification.

- Expected result: rejection is treated as transient network error and process continues.
- Actual result: rejection can be treated as generic/fatal and exit the process.

## Issues Found

- Severity: high
- Confidence: high
- Status: fixed

| ID | Severity | Confidence | Area | Summary | Evidence | Status |
| --- | --- | --- | --- | --- | --- | --- |
| PR-21163-BUG-01 | high | high | `src/infra/unhandled-rejections.ts` | Transient network detection misses errors that only carry `ENOTFOUND`/similar tokens in message text | Issue #21082 stack + previous classifier path only checked `code`/`cause`/`fetch failed` | fixed |

## Fix Approach

- Added message-based transient network detection using known transient code tokens derived from existing `TRANSIENT_NETWORK_CODES`.
- Kept existing code/cause/aggregate detection paths unchanged.
- Added tests for Slack-style message-only ENOTFOUND errors in both unit and fatal-classification suites.
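The message-based detection described under Fix Approach can be sketched roughly as follows. This is an illustrative sketch, not the code in `src/infra/unhandled-rejections.ts`. Only the name `TRANSIENT_NETWORK_CODES` comes from this PR; the codes listed in it here are an assumed subset, and `TRANSIENT_CODE_PATTERN`, `readStringProp`, and `isTransientNetworkError` are hypothetical names chosen for the example. The aggregate-error path is omitted.

```typescript
// ASSUMPTION: illustrative subset of Node.js network error codes; the real
// TRANSIENT_NETWORK_CODES in the repository may contain different entries.
const TRANSIENT_NETWORK_CODES: readonly string[] = [
  "ENOTFOUND",
  "EAI_AGAIN",
  "ECONNRESET",
  "ECONNREFUSED",
  "ETIMEDOUT",
];

// Hypothetical pattern derived from the code list. A code must appear as a
// whole token, so text such as "XENOTFOUNDY" does not match.
const TRANSIENT_CODE_PATTERN = new RegExp(
  "(?:^|[^A-Z0-9_])(?:" + TRANSIENT_NETWORK_CODES.join("|") + ")(?![A-Z0-9_])",
);

// Hypothetical helper: safely read a string property from an unknown value.
function readStringProp(value: unknown, key: string): string | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const prop = (value as Record<string, unknown>)[key];
  return typeof prop === "string" ? prop : undefined;
}

// Hypothetical classifier: true when the error, or an error in its `cause`
// chain, looks like a transient network failure.
function isTransientNetworkError(err: unknown, depth = 0): boolean {
  // Bound the recursion so a cyclic `cause` chain cannot loop forever.
  if (depth > 5 || typeof err !== "object" || err === null) return false;

  // Structured path: a known code on the error object itself.
  const code = readStringProp(err, "code");
  if (code !== undefined && TRANSIENT_NETWORK_CODES.indexOf(code) !== -1) {
    return true;
  }

  // Message path: a known code token embedded in the message text.
  const message = readStringProp(err, "message");
  if (message !== undefined && TRANSIENT_CODE_PATTERN.test(message)) {
    return true;
  }

  // Cause path: repeat the same checks on the wrapped error, if any.
  return isTransientNetworkError((err as { cause?: unknown }).cause, depth + 1);
}

// Slack-style error from the Reproduction section: no top-level `code`.
const slackStyleError = new Error(
  "A request error occurred: getaddrinfo ENOTFOUND slack.com",
);
console.log(isTransientNetworkError(slackStyleError)); // true
```

Matching whole tokens from a fixed list, rather than any `E`-prefixed word, keeps the message scan as conservative as the Risk / Notes section describes: an unrelated error whose message happens to contain similar text is still classified as before.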
## Testing

- `pnpm test -- src/infra/unhandled-rejections.test.ts src/infra/unhandled-rejections.fatal-detection.test.ts` (pass)
- `pnpm check` (pass)
- `pnpm build` (pass)

## Risk / Notes

- Low risk, narrowly scoped to error classification.
- Scope is conservative: only known transient network code tokens are matched in messages.

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

Adds message-based transient network error detection to prevent Slack DNS failures from crashing the gateway. The fix scans error message text for known transient network codes (like `ENOTFOUND`) when structured `code` properties are absent, matching the existing error classification approach.

<h3>Confidence Score: 5/5</h3>

- Safe to merge with no risk
- The change is narrowly scoped to error classification logic, uses conservative pattern matching against known transient network codes, has comprehensive test coverage for both the new message-based detection and integration with existing fatal-detection flow, and follows established patterns in the codebase
- No files require special attention

<sub>Last reviewed commit: b2e5857</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
