#11688: feat(telegram): add health check watchdog for long-polling
channel: telegram
stale
## Summary
- Adds a periodic health check watchdog to the Telegram long-polling monitor
- Pings `bot.api.getMe()` every 2 minutes with a 10-second timeout
- If the check fails or times out, assumes the TCP connection is silently dead and restarts polling with a fresh connection
- Resets backoff counter on health-check-triggered restarts since they are controlled, not crash-induced
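
For reviewers, the check-with-timeout step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `isHealthy` and its `ping` parameter are hypothetical names, with `ping` standing in for a call such as `() => bot.api.getMe()`.

```typescript
// Hypothetical helper (not from the PR): resolves true if `ping` succeeds
// within `timeoutMs`, false if it rejects or the deadline passes first.
async function isHealthy(
  ping: () => Promise<unknown>,
  timeoutMs: number,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("health check timed out")),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first decides the outcome.
    await Promise.race([ping(), deadline]);
    return true;
  } catch {
    return false;
  } finally {
    // Clear the deadline timer so a fast ping leaves nothing pending.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

With the values from the summary, a call would look like `await isHealthy(() => bot.api.getMe(), 10_000)`; a `false` result is the signal to restart polling.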
## Motivation
After periods of inactivity, NAT gateways and firewalls may silently drop idle TCP connections. When this happens, the long-polling socket hangs indefinitely: grammY receives neither an error nor a timeout, so the bot appears alive but never receives updates. This is especially common on headless servers or machines behind carrier-grade NAT.
The watchdog detects this condition and recovers automatically.
## Test plan
- [ ] Verify normal polling still works without interference from the health check
- [ ] Simulate a stale connection (e.g. firewall rule drop) and confirm the watchdog detects it and restarts polling
- [ ] Confirm health check timer is properly cleaned up on shutdown
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adds a watchdog around the Telegram long-polling runner that periodically pings `bot.api.getMe()` (2-minute interval, 10-second timeout). If the health check fails, it stops the current `@grammyjs/runner` instance and restarts polling, resetting the backoff counter for these controlled restarts.
The change is isolated to `src/telegram/monitor.ts` and integrates into the existing restart/backoff loop for polling mode, with cleanup in `finally` to clear the interval and remove the abort listener.
<h3>Confidence Score: 3/5</h3>
- Mostly safe to merge, but the watchdog implementation has real timer/concurrency issues to fix first.
- The overall approach (periodic getMe + restart runner) is sound and cleanup is attempted in a finally block, but the current implementation leaks per-check timeout timers and can run overlapping async interval callbacks on hung connections, leading to duplicated requests and stop/restart races.
- src/telegram/monitor.ts
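
One possible way to avoid the overlapping-callback issue noted above is to schedule each check only after the previous one has settled, instead of using `setInterval` with an async callback. The sketch below is an assumption about how that could look, not the PR's implementation; `startWatchdog`, `check`, and `onUnhealthy` are hypothetical names.

```typescript
type Watchdog = { stop: () => void };

// Hypothetical helper (not from the PR): runs `check` repeatedly, waiting
// `intervalMs` after each check settles, and calls `onUnhealthy` whenever a
// check reports false or throws. `onUnhealthy` is assumed not to throw.
function startWatchdog(
  check: () => Promise<boolean>,
  onUnhealthy: () => void,
  intervalMs: number,
): Watchdog {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    let healthy = false;
    try {
      healthy = await check();
    } catch {
      healthy = false;
    }
    if (stopped) return; // stop() was called while the check was in flight
    if (!healthy) onUnhealthy();
    // Re-arm only after this check has settled, so checks never overlap.
    // Re-test `stopped` because onUnhealthy may itself have called stop().
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };

  timer = setTimeout(tick, intervalMs);
  return {
    stop: () => {
      stopped = true;
      if (timer !== undefined) clearTimeout(timer);
    },
  };
}
```

Because only one timer is pending at any moment, `stop()` has a single handle to clear, which keeps shutdown cleanup simple. The trade-off is that the effective period becomes the interval plus the duration of each check.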
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #6447: fix(telegram): auto-restart polling when grammY runner exits silently (by AugmentAdvertise · 2026-02-01 · 82.8% similar)
- #10850: fix(telegram): await runner.stop() to prevent polling race conditio... (by talhaorak · 2026-02-07 · 80.2% similar)
- #3186: fix(telegram): sanitize update offset + lock polling (by daxiong888 · 2026-01-28 · 79.9% similar)
- #7247: fix(telegram): abort stale getUpdates connections after long-poll t... (by JanderV · 2026-02-02 · 79.2% similar)
- #8166: fix(telegram): lifecycle fixes for duplicate messages and auto-reco... (by cheenu1092-oss · 2026-02-03 · 78.9% similar)
- #6463: fix(telegram): improve timeout handling and prevent channel exits (by ai-fanatic · 2026-02-01 · 77.8% similar)
- #23450: fix(telegram): add polling health check to detect and recover from ... (by Elarwei001 · 2026-02-22 · 75.8% similar)
- #5561: fix(telegram): auto-restart on timeout + lower API timeout to 60s (by jesseproudman · 2026-01-31 · 75.3% similar)
- #14741: feat: telegram resilience utilities (by kalachbeg · 2026-02-12 · 75.3% similar)
- #7141: fix(telegram): unify network error detection to prevent poll crashes (by hclsys · 2026-02-02 · 74.8% similar)