#7558: fix: Handle Grammy/Telegram network errors to prevent gateway crashes
Cluster: Network Error Handling Improvements
Fixes #7553
## Problem
The gateway crashes 3-6 times per day from unhandled Grammy (Telegram) network errors such as:
- `GrammyError: Call to 'sendMessage' failed! (502: Bad Gateway)`
- Connection timeouts and other transient failures
These errors weren't being caught by the existing transient network error handler.
## Solution
Extended `isTransientNetworkError()` to detect Grammy-specific errors. An error whose name is `GrammyError` or `HttpError` is treated as transient when it has:
- an HTTP 502, 503, or 504 status code, or
- a message indicating a connection failure or timeout
These are now logged as warnings instead of crashing the gateway.
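The detection rules above can be sketched as a standalone predicate. This is a hedged illustration, not the code in `src/infra/unhandled-rejections.ts`: the function name `isTransientGrammyError`, the exact connection/timeout wordings, and the assumption that the status is available either as a numeric `error_code` field (as on grammY's `GrammyError`) or embedded in the message as `(502: ...)` are all assumptions.

```typescript
// Hypothetical sketch of the Grammy branch of a transient-error check.
// The error shape (`name`, `error_code`, `message`) is an assumption, not the PR's code.

// Upstream gateway failures that are usually safe to ride out.
const TRANSIENT_HTTP_STATUSES = new Set<number>([502, 503, 504]);

// Connection/timeout wordings, matched case-insensitively on word boundaries.
const CONNECTION_OR_TIMEOUT =
  /\b(?:ECONNRESET|ECONNREFUSED|ETIMEDOUT|socket hang up|connection (?:reset|refused|closed)|timed? ?out)\b/i;

function isTransientGrammyError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { name?: unknown; error_code?: unknown; message?: unknown };

  // Only Grammy's API errors and transport errors are considered here.
  if (e.name !== "GrammyError" && e.name !== "HttpError") return false;

  const message = typeof e.message === "string" ? e.message : "";

  // Prefer a structured status code; fall back to the "(502: Bad Gateway)" text.
  let status = typeof e.error_code === "number" ? e.error_code : undefined;
  if (status === undefined) {
    const match = /\((\d{3}):/.exec(message);
    if (match) status = Number(match[1]);
  }
  if (status !== undefined && TRANSIENT_HTTP_STATUSES.has(status)) return true;

  // No transient status: accept only connection/timeout-looking messages.
  return CONNECTION_OR_TIMEOUT.test(message);
}
```

Checking the status before the message text means a `400: Bad Request` stays fatal, and the name guard keeps unrelated errors with similar wording out of the Grammy branch.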
## Changes
- Added Grammy error detection in `src/infra/unhandled-rejections.ts`
- Added unit tests for each new Grammy error case
- Follows existing pattern for network error handling
## Testing
Added tests for:
- GrammyError with 502, 503, 504
- Connection errors
- Timeout errors
- Non-network GrammyErrors (still crash as expected)
## Impact
Prevents 3-6 crashes per day for users running Telegram bot integrations.
## Greptile Overview
### Greptile Summary
This PR extends the unhandled rejection transient network classifier to treat certain Telegram/Grammy failures (e.g., 502/503/504 responses and connection/timeout-like messages for `GrammyError`/`HttpError`) as non-fatal, so the gateway logs a warning instead of exiting. It adds unit tests covering those new cases alongside existing transient-network behaviors, fitting into the existing `installUnhandledRejectionHandler()` flow that suppresses/terminates based on error classification.
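The classify-then-warn-or-exit flow described above can be sketched with Node's `unhandledRejection` event. `createRejectionListener`, `installRejectionHandler`, the logger shape, and exit code `1` are hypothetical names and choices; the project's real `installUnhandledRejectionHandler()` may differ. The classifier is passed in, so any predicate (for example one that adds the Grammy rules) can be plugged in.

```typescript
// Hypothetical sketch: names and exit code are illustrative, not the project's API.

type RejectionLogger = {
  warn: (msg: string) => void;
  error: (msg: string) => void;
};

function formatReason(reason: unknown): string {
  return reason instanceof Error ? `${reason.name}: ${reason.message}` : String(reason);
}

// Builds the listener; `exit` is injectable so the decision logic can be tested.
function createRejectionListener(
  isTransient: (reason: unknown) => boolean,
  log: RejectionLogger = console,
  exit: (code: number) => void = (code) => process.exit(code),
): (reason: unknown) => void {
  return (reason: unknown): void => {
    if (isTransient(reason)) {
      // Transient network failure: log a warning and keep the gateway running.
      log.warn(`Suppressed transient network error: ${formatReason(reason)}`);
      return;
    }
    // Everything else is still treated as fatal.
    log.error(`Unhandled rejection: ${formatReason(reason)}`);
    exit(1);
  };
}

// Registers the listener on the process-wide rejection event and returns an uninstaller.
function installRejectionHandler(isTransient: (reason: unknown) => boolean): () => void {
  const listener = createRejectionListener(isTransient);
  process.on("unhandledRejection", listener);
  return () => {
    process.off("unhandledRejection", listener);
  };
}
```

Keeping the decision in a plain function makes the suppress/terminate split testable without actually rejecting promises or exiting the process.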
### Confidence Score: 4/5
- This PR looks safe to merge and should reduce crashes, with only minor risk of misclassification due to message heuristics.
- Changes are localized to `isTransientNetworkError()` and backed by targeted unit tests. The main remaining risk is reliance on case-sensitive / loose substring and regex matching against error messages, which could miss some transient errors or (less likely) classify unrelated errors as transient.
- Files needing closer review: `src/infra/unhandled-rejections.ts` (message-matching heuristics)
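To make the message-heuristics risk concrete, the sketch below compares a case-sensitive substring check with a case-insensitive, whole-word regex. Both matchers and the sample messages are illustrative assumptions, not code from this PR.

```typescript
// Illustrative only: neither matcher is taken from the PR.

// Naive: case-sensitive substring check.
function naiveIsTimeout(message: string): boolean {
  return message.includes("timeout");
}

// Stricter: case-insensitive match on whole words only.
const TIMEOUT_WORDS = /\b(?:timeout|timed out|ETIMEDOUT)\b/i;

function stricterIsTimeout(message: string): boolean {
  return TIMEOUT_WORDS.test(message);
}
```

`naiveIsTimeout` misses `Request Timeout` and `connect ETIMEDOUT ...` but flags a config message containing `request_timeout_ms`, while `stricterIsTimeout` handles all three. No pattern list is exhaustive, so structured fields such as status codes remain the more reliable signal.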
## Most Similar PRs
| PR | Title | Author | Date | Similarity |
|---|---|---|---|---|
| #12870 | fix: recover from telegram fetch errors (issue #12835) | ambicuity | 2026-02-09 | 86.5% |
| #11101 | fix: handle AbortError and WebSocket 1006 in unhandled rejection ha... | Nipurn123 | 2026-02-07 | 81.8% |
| #7563 | fix: expand transient network error detection | kaigritun | 2026-02-03 | 81.6% |
| #10034 | Don't crash gateway on transient unhandled fetch failures | gigq | 2026-02-06 | 80.0% |
| #7141 | fix(telegram): unify network error detection to prevent poll crashes | hclsys | 2026-02-02 | 79.7% |
| #21163 | Prevent Slack DNS errors from crashing the gateway | graysurf | 2026-02-19 | 78.5% |
| #17758 | Fix crash on transient Discord gateway zombie connection errors | DoyoDia | 2026-02-16 | 78.2% |
| #4653 | fix(gateway): improve crash resilience for mDNS and network errors | AyedAlmudarra | 2026-01-30 | 77.9% |
| #23787 | Handle transient Slack request errors without crashing the gateway | graysurf | 2026-02-22 | 77.3% |
| #17243 | fix(telegram): catch getFile network failures to prevent gateway cr... | robbyczgw-cla | 2026-02-15 | 76.3% |