#18115: fix: prevent voice message loss during concurrent update processing
channel: telegram
stale
size: S
Cluster:
Telegram Command Fixes
## Bug Description
When the Telegram polling loop (grammY runner) hits a transient network error, it restarts with backoff. There is a potential race window where update offsets are persisted before voice message processing completes, so if the runner crashes mid-processing, those updates are acknowledged to Telegram but never fully handled. This is a narrow race (~5% occurrence) and users can retry manually, making it low severity. The root cause likely involves the update offset persistence in the runner's fetch cycle vs. the sink's processing pipeline.
**Severity:** low
## Root Cause
The race condition is in bot.ts where recordUpdateId (line ~227) persists the highest update_id after each update's middleware completes, but updates from different chats run concurrently (sequentialize only serializes per-chat). When a batch contains [update 100 (voice, chat A), update 101 (text, chat B)], update 101 can finish first and persist lastUpdateId=101 to disk. If the runner then crashes (network error) before update 100 finishes processing, on restart shouldSkipUpdate sees lastUpdateId=101 and skips update 100 — the voice message is permanently lost. The core issue is that the offset persistence tracks the MAX update_id completed rather than ensuring ALL update_ids up to that point have been processed. Combined with the grammY runner's concurrent sink processing, any out-of-order completion creates a window where lower-numbered updates can be silently dropped.
## Fix
Created src/telegram/update-offset-watermark.ts with contiguous watermark tracker (TCP ACK window approach). Modified src/telegram/bot.ts to use watermark - tracks in-flight update IDs and only persists offset as highest contiguous completed update_id, preventing out-of-order completion from skipping lower-numbered updates.
## Regression Test
Added src/telegram/update-offset-watermark.test.ts with 4 tests: verifies offset doesn't advance past in-flight updates, handles in-order completion, respects initial offset, and handles gaps in update IDs.
## Verification
1) Diff matches claimed changes — 3 files modified (bot.ts, new watermark module + test). 2) Security: .gitignore exists, no sensitive files in diff. 3) All tests pass (1 pre-existing failure on main unrelated to this fix). 4) Regression test covers exact race scenario: out-of-order completion doesn't advance offset past in-flight updates. 5) Fix is minimal and targeted — watermark tracker replaces direct MAX update_id persistence with contiguous-completion tracking (TCP ACK window approach). 6) Fix addresses root cause: offset only advances when all lower update IDs are complete, preventing skipped updates on crash. 7) Test would fail without fix — without watermark, completing update 101 before 100 would immediately persist offset=101.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR introduces a contiguous watermark tracker (`update-offset-watermark.ts`) that replaces the previous "max completed update_id" approach with a TCP ACK window pattern, ensuring the persisted offset only advances when all lower-numbered updates have completed. This correctly addresses the race condition where out-of-order completion could cause lower-numbered in-flight updates to be permanently skipped on crash/restart.
- **New module** `update-offset-watermark.ts` cleanly implements the watermark tracker with `markStarted`/`markCompleted`/`getCurrentOffset` API and proper contiguous-completion logic.
- **Integration in `bot.ts`** wires the watermark into the existing middleware pipeline, calling `markStarted` before handler dispatch and `markCompleted` after.
- **Bug found:** `markStarted` is called unconditionally for all updates entering the middleware, but `recordUpdateId` short-circuits (skips `markCompleted`) for re-delivered updates with ID ≤ `lastUpdateId`. This leaves phantom entries in the `inFlight` set that permanently block watermark advancement — effectively the same class of bug this PR aims to fix, but triggered by update re-delivery instead of out-of-order completion.
- **Test coverage** is solid for the watermark module in isolation (4 tests covering the core scenarios) but does not cover the integration-level bug with skipped updates.
<h3>Confidence Score: 2/5</h3>
- The watermark logic is sound in isolation but the bot.ts integration has a bug that can permanently block offset advancement after update re-delivery.
- Score of 2 reflects that while the core watermark module is well-designed and tested, the integration in bot.ts has a logical bug where markStarted is called for re-delivered updates that will never receive a corresponding markCompleted call, causing phantom in-flight entries that block all future watermark advancement. This is the same class of update-loss bug the PR aims to fix.
- src/telegram/bot.ts — the markStarted/markCompleted pairing in the middleware needs to be guarded against re-delivered (already-processed) update IDs.
<sub>Last reviewed commit: e120cd2</sub>
<!-- greptile_other_comments_section -->
<sub>(2/5) Greptile learns from your feedback when you react with thumbs up/down!</sub>
<!-- /greptile_comment -->
Most Similar PRs
#11347: fix: scope Telegram update offset to bot token
by anooprdawar · 2026-02-07
81.6%
#16105: fix: handle message_reaction updates in group polling mode
by claw-sylphx · 2026-02-14
80.9%
#16995: fix(telegram): record update ID before processing to prevent crash ...
by AI-Reviewer-QS · 2026-02-15
79.7%
#3186: fix(telegram): sanitize update offset + lock polling
by daxiong888 · 2026-01-28
79.5%
#22363: fix(telegram): isolate update offset state by bot token
by AIflow-Labs · 2026-02-21
79.4%
#10850: fix(telegram): await runner.stop() to prevent polling race conditio...
by talhaorak · 2026-02-07
78.0%
#17953: fix(telegram): prevent silent message loss and duplicate messages i...
by zuyan9 · 2026-02-16
78.0%
#19673: fix(telegram): avoid starting streaming replies with only 1-2 words
by emanuelst · 2026-02-18
78.0%
#17316: fix: ack reaction not removed when block streaming is enabled (Tele...
by czmathew · 2026-02-15
77.4%
#8166: fix(telegram): lifecycle fixes for duplicate messages and auto-reco...
by cheenu1092-oss · 2026-02-03
77.1%