
#23787: Handle transient Slack request errors without crashing the gateway

by graysurf open 2026-02-22 17:59
size: S
## Problem summary

- Expected: temporary Slack network disconnects are treated as transient and do not terminate the gateway.
- Actual: unhandled `@slack/web-api` request errors (for example `slack_webapi_request_error`) fall through transient detection and can terminate the process.
- Impact: gateways can crash/restart-loop during brief connectivity drops (sleep/resume, Wi-Fi handoff, unstable networks).

## Reproduction

1. Run OpenClaw with Slack enabled.
2. Trigger a transient network disconnect while Slack API calls are in flight.
3. Observe an unhandled rejection like `A request error occurred: Client network socket disconnected before secure TLS connection was established`.
4. Gateway exits instead of continuing.

## Issues found

| ID | Issue | Severity | Status | Notes |
| --- | --- | --- | --- | --- |
| PR-23787-BUG-01 | #23169 Slack transient network request errors can crash gateway via unhandled rejection handling. | high | fixed | Added Slack-specific transient classification plus regression tests. |

## Fix approach

- Extended `isTransientNetworkError` to treat `slack_webapi_request_error` as transient when:
  - nested `original` error is transient, or
  - the error message contains transient network codes/signatures.
- Preserved existing fatal behavior for non-transient Slack request errors.
- Added regression coverage for:
  - transient Slack request errors (non-fatal),
  - non-transient Slack request errors (still fatal).

## Tests run

- `pnpm test -- src/infra/unhandled-rejections.fatal-detection.test.ts` ✅
- `pnpm check` ❌ (fails on existing unrelated `pnpm tsgo` baseline errors on current `main`, including `src/agents/pi-embedded-runner/extra-params.openrouter-cache-control.test.ts` and `src/discord/voice/manager.ts`).

Closes #23169

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

Extended transient network error detection to handle `slack_webapi_request_error` cases that were previously crashing the gateway during temporary network disconnects. The fix checks both nested `original` errors and message content for transient network signatures, treating genuine transient errors as non-fatal while preserving fatal behavior for actual Slack API errors.

- Added Slack-specific transient classification in `isTransientNetworkError` (src/infra/unhandled-rejections.ts:113-131)
- New helper `getErrorMessage` extracts error messages safely
- Regression tests cover both transient (non-fatal) and non-transient (fatal) Slack request errors
- Fixes issue #23169 where brief connectivity drops could restart-loop the gateway

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk
- The implementation correctly extends existing transient error detection with a well-scoped fix. The logic checks the Slack error code first, then recursively validates nested `original` errors, and finally pattern-matches message content against known transient signatures. Test coverage includes both transient and non-transient cases, verifying that fatal errors still exit while transient ones continue. The fix follows existing code patterns and doesn't introduce breaking changes.
- No files require special attention

<sub>Last reviewed commit: 4bece53</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
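
For readers without the diff at hand, the classification order described above (Slack error code first, then the nested `original` error, then the message text) might look roughly like the sketch below. This is not the code from `src/infra/unhandled-rejections.ts`: the `TRANSIENT_CODES` set, the `TRANSIENT_MESSAGE_PATTERNS` list, the `getErrorCode` helper, and the `depth` guard are all assumptions made for illustration, and only the names `isTransientNetworkError`, `getErrorMessage`, `slack_webapi_request_error`, and `original` come from the PR text.

```typescript
// Illustrative sketch only; the real lists and helpers in the repository may differ.

// ASSUMPTION: a representative set of transient network error codes.
const TRANSIENT_CODES = new Set<string>([
  "ECONNRESET",
  "ETIMEDOUT",
  "ENOTFOUND",
  "EAI_AGAIN",
  "EPIPE",
]);

// ASSUMPTION: message signatures treated as transient.
const TRANSIENT_MESSAGE_PATTERNS: RegExp[] = [
  /socket disconnected before secure TLS connection was established/i,
  /socket hang up/i,
  /\b(ECONNRESET|ETIMEDOUT|ENOTFOUND|EAI_AGAIN|EPIPE)\b/,
];

// Safely extract a message from an arbitrary thrown value.
function getErrorMessage(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (typeof err === "string") return err;
  if (typeof err === "object" && err !== null && "message" in err) {
    const message = (err as { message?: unknown }).message;
    return typeof message === "string" ? message : "";
  }
  return "";
}

// Hypothetical helper: read a string `code` property if one exists.
function getErrorCode(err: unknown): string | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" ? code : undefined;
}

function isTransientNetworkError(err: unknown, depth = 0): boolean {
  // ASSUMPTION: a small depth limit guards against cyclic `original` chains.
  if (depth > 5) return false;

  const code = getErrorCode(err);
  if (code !== undefined && TRANSIENT_CODES.has(code)) return true;

  if (code === "slack_webapi_request_error") {
    // 1. A transient nested `original` error makes the wrapper transient.
    const original = (err as { original?: unknown }).original;
    if (original !== undefined && isTransientNetworkError(original, depth + 1)) {
      return true;
    }
    // 2. Otherwise fall back to matching the wrapper's message text.
    const message = getErrorMessage(err);
    return TRANSIENT_MESSAGE_PATTERNS.some((pattern) => pattern.test(message));
  }

  // Anything else stays non-transient, so existing fatal handling still applies.
  return false;
}
```

In this sketch a Slack request error that matches neither check returns `false`, which mirrors the PR's stated goal of keeping non-transient Slack request errors fatal.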
