#5561: fix(telegram): auto-restart on timeout + lower API timeout to 60s
channel: telegram
gateway
## Problem
The Telegram channel handler could get wedged and fail to recover when:
1. grammY's long-polling timed out (500s default)
2. The inference API was slow or unresponsive
3. Agent runs blocked the session lane for 10+ minutes
Once wedged, the channel stayed down until it was restarted manually.
## Solution
### Auto-restart with exponential backoff
- Channels now automatically restart when they exit unexpectedly
- Backoff: 2s → 4s → 8s → 16s → 32s → 60s (capped), with 20% jitter
- Attempt counter resets after 5 minutes of successful operation
- Gives up after 10 consecutive failures to prevent infinite loops
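
The backoff rules above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: every name here (`restartDelayMs`, `nextRestart`, `RestartState`, the constants) is hypothetical, and it assumes the 20% jitter is symmetric and applied after the 60s cap.

```typescript
// Hypothetical constants mirroring the numbers in the description above.
const BASE_DELAY_MS = 2_000;
const MAX_DELAY_MS = 60_000;
const JITTER_RATIO = 0.2;
const MAX_ATTEMPTS = 10;
const RESET_AFTER_MS = 5 * 60_000;

/**
 * Delay before restart attempt `attempt` (1-based): 2s, 4s, 8s, ... capped at 60s.
 * Assumption: jitter is symmetric (+/-20%) and applied after the cap.
 * `random` is injectable so the result can be made deterministic in tests.
 */
function restartDelayMs(attempt: number, random: () => number = Math.random): number {
  const base = Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
  const jitter = base * JITTER_RATIO * (random() * 2 - 1);
  return Math.round(base + jitter);
}

interface RestartState {
  attempts: number;  // consecutive restart attempts so far
  startedAt: number; // ms timestamp of the most recent start
}

/**
 * Decides what to do after an unexpected exit at time `now`.
 * Returns null once the attempt cap is exceeded (give up).
 */
function nextRestart(
  state: RestartState,
  now: number,
  random: () => number = Math.random,
): { attempts: number; delayMs: number } | null {
  // A run of at least 5 minutes counts as recovered and resets the counter.
  const ranLongEnough = now - state.startedAt >= RESET_AFTER_MS;
  const attempts = (ranLongEnough ? 0 : state.attempts) + 1;
  if (attempts > MAX_ATTEMPTS) return null;
  return { attempts, delayMs: restartDelayMs(attempts, random) };
}
```

Keeping the delay calculation pure, with the random source passed in, makes the schedule easy to unit test independently of any timers.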
### Lower API timeout
- Reduced grammY API timeout from 500s to 60s
- Allows faster detection and recovery from stuck requests
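
As a rough sketch of how the lower timeout might be wired up: grammY's `Bot` constructor accepts a `client.timeoutSeconds` option, which defaults to 500. The helper `resolveClientOptions` and the constant below are hypothetical names used for illustration, not code from this PR.

```typescript
// Hypothetical default matching the 60s value described above.
const DEFAULT_API_TIMEOUT_SECONDS = 60;

// Only the one field this sketch cares about; grammY's real options type has more.
interface ClientOptions {
  timeoutSeconds: number;
}

/**
 * Returns client options that always carry an explicit timeout, so API calls
 * never fall back to grammY's 500s default. Invalid or missing values use 60s.
 */
function resolveClientOptions(configuredSeconds?: number): ClientOptions {
  if (
    typeof configuredSeconds === "number" &&
    Number.isFinite(configuredSeconds) &&
    configuredSeconds > 0
  ) {
    // Round up so a fractional value never becomes a zero-second timeout.
    return { timeoutSeconds: Math.ceil(configuredSeconds) };
  }
  return { timeoutSeconds: DEFAULT_API_TIMEOUT_SECONDS };
}

// Intended use (assumes `Bot` is imported from "grammy" and `token` is defined):
// const bot = new Bot(token, { client: resolveClientOptions(configuredTimeout) });
```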
### Clean shutdown handling
- Deliberate `stopChannel` calls cancel pending restarts
- Deliberate stops also reset the attempt counter
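
One way to picture the shutdown bookkeeping is a small per-account scheduler. This is a sketch under assumptions: `RestartScheduler` and its methods are hypothetical names, and cancelling an earlier pending timer when a new restart is scheduled is a design choice of this sketch, not something the description above states.

```typescript
type Timer = ReturnType<typeof setTimeout>;

/** Hypothetical per-account restart bookkeeping; keys identify a channel account. */
class RestartScheduler {
  private readonly timers = new Map<string, Timer>();
  private readonly attempts = new Map<string, number>();

  /** Schedules a restart, replacing any restart already pending for this key. */
  schedule(key: string, delayMs: number, restart: () => void): void {
    this.cancel(key); // assumption: at most one pending restart per key
    this.attempts.set(key, this.attemptsFor(key) + 1);
    const timer = setTimeout(() => {
      this.timers.delete(key);
      restart();
    }, delayMs);
    this.timers.set(key, timer);
  }

  /** Cancels a pending restart; returns true if one was pending. */
  cancel(key: string): boolean {
    const timer = this.timers.get(key);
    if (timer === undefined) return false;
    clearTimeout(timer);
    this.timers.delete(key);
    return true;
  }

  /** Deliberate stop: cancel the pending restart and reset the attempt counter. */
  stop(key: string): void {
    this.cancel(key);
    this.attempts.delete(key);
  }

  attemptsFor(key: string): number {
    return this.attempts.get(key) ?? 0;
  }

  hasPending(key: string): boolean {
    return this.timers.has(key);
  }
}
```

Holding the timer handle alongside the counter means a deliberate stop can clear both in one place, so a stale timer cannot revive a channel that was stopped on purpose.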
## Testing
Tested overnight with a bot that was previously freezing every 1-2 hours. The auto-restart kicked in successfully on timeouts and recovered within seconds.
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR improves Telegram channel resilience by (1) adding auto-restart behavior to the gateway channel manager when a channel task exits unexpectedly (with exponential backoff, jitter, and a max-attempt cap), and (2) reducing grammY API call timeouts to 60s by default to detect stuck requests sooner.
Key integration points:
- `src/gateway/server-channels.ts` now tracks per-account restart attempts and pending timers and schedules restarts from the channel task’s `.finally()` when the abort signal was not triggered.
- `src/telegram/bot.ts` now always passes `client.timeoutSeconds` to grammY’s `Bot` so API calls don’t hang for ~500s by default.
Notable issues:
- The “reset attempts after 5 minutes of successful running” currently uses a timestamp set at start initiation rather than a confirmed healthy-running signal, and it isn’t cleared on deliberate stop; both can reset attempts in cases that don’t represent successful recovery.
- Restart timers aren’t cancelled when scheduling new restarts or when manual starts happen, which can create overlapping restart behavior.
- Telegram bot has two adjacent `bot.catch` handlers, which will double-log errors.
<h3>Confidence Score: 3/5</h3>
- Reasonably safe to merge, but restart bookkeeping has edge cases that can cause unexpected restart behavior and confusing logs.
- Core changes are localized and conceptually straightforward (restart on unexpected exit + lower API timeout). However, the restart-attempt reset uses a start timestamp rather than a proven healthy-running window and isn’t cleared on deliberate stop, and restart timers aren’t de-duplicated/cancelled outside of `stopChannel`. These can weaken the intended max-attempt protection and create overlapping restarts in certain manual-start/rapid-exit scenarios.
- src/gateway/server-channels.ts
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #6463: fix(telegram): improve timeout handling and prevent channel exits (by ai-fanatic · 2026-02-01 · 82.9%)
- #6447: fix(telegram): auto-restart polling when grammY runner exits silently (by AugmentAdvertise · 2026-02-01 · 82.5%)
- #8166: fix(telegram): lifecycle fixes for duplicate messages and auto-reco... (by cheenu1092-oss · 2026-02-03 · 80.1%)
- #18254: add /update chat command for Telegram git updates (by dangmstaredu · 2026-02-16 · 78.2%)
- #10865: telegram: fast-ACK webhook and retry bind on EADDRINUSE (by u9733037 · 2026-02-07 · 78.2%)
- #7247: fix(telegram): abort stale getUpdates connections after long-poll t... (by JanderV · 2026-02-02 · 77.9%)
- #14741: feat: telegram resilience utilities (by kalachbeg · 2026-02-12 · 77.3%)
- #9085: fix: improve stability for terminated responses and telegram retries (by vladdick88 · 2026-02-04 · 77.1%)
- #10850: fix(telegram): await runner.stop() to prevent polling race conditio... (by talhaorak · 2026-02-07 · 76.6%)
- #3186: fix(telegram): sanitize update offset + lock polling (by daxiong888 · 2026-01-28 · 75.7%)