
#17435: fix(debounce): retry flush with exponential backoff to prevent silent message loss

by widingmarcus-cyber open 2026-02-15 19:07
stale size: S trusted-contributor
## fix(debounce): retry inbound flush on lock contention

Fixes #17421

### Problem

When the inbound debounce timer fires and `onFlush` fails (e.g. due to session store lock contention with a concurrent cron job), the buffered messages are **permanently and silently lost**:

```
telegram debounce flush failed: Error: timeout acquiring session store lock
```

The user sees blue checkmarks on Telegram, but the agent never receives the message. Neither side knows anything was lost. From the bug report: **5 messages dropped in 2 days** during normal operation.

### Root Cause

`flushBuffer()` in `inbound-debounce.ts` catches the error and calls `onError()`, but all channel handlers (`telegram`, `discord`, `signal`, `slack`, `imessage`) only log the error, so the buffered items are discarded with no retry.

### Fix

Add retry with exponential backoff to `flushBuffer()`:

```typescript
// Retry loop (default: 3 retries after the first attempt, 500ms base backoff)
for (let attempt = 0; attempt <= retryAttempts; attempt++) {
  try {
    await params.onFlush(buffer.items);
    return; // success
  } catch (err) {
    lastErr = err;
    if (attempt < retryAttempts) {
      await delay(retryBaseMs * 2 ** attempt); // 500ms → 1000ms → 2000ms
    }
  }
}
params.onError?.(lastErr, buffer.items); // only after all retries exhausted
```

New configurable options (backwards compatible):

- `retryAttempts`: max retry count (default: 3)
- `retryBaseMs`: base delay for exponential backoff (default: 500ms)

### Changed Files

| File | Change |
|------|--------|
| `src/auto-reply/inbound-debounce.ts` | Add retry loop to `flushBuffer()` |
| `src/auto-reply/inbound-debounce.retries-on-flush-failure.test.ts` | 3 new tests |

### Testing

- 3 new tests (retry success, retry exhaustion, no-retry path)
- All 659 auto-reply tests pass
- Lint + format clean

### Impact

Affects ALL channels: Telegram, Discord, Signal, Slack, iMessage. Any channel using inbound debounce benefits from this fix without code changes, because the retry lives in the shared debouncer.
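For readers who want to try the retry behaviour in isolation, here is a minimal, self-contained sketch of the same loop. The names `RetryOptions`, `backoffSchedule`, `flushWithRetry`, and the injectable `delay` option are hypothetical and chosen for illustration; they are not the actual exports or signatures in `inbound-debounce.ts`. Only the loop shape and the defaults (3 retries, 500ms base) are taken from the description above.

```typescript
// Illustrative sketch only: these names are assumptions, not the PR's real API.
type RetryOptions<T> = {
  onFlush: (items: T[]) => Promise<void>;
  onError?: (err: unknown, items: T[]) => void;
  retryAttempts?: number; // retries after the first attempt (default 3)
  retryBaseMs?: number; // base backoff delay in ms (default 500)
  delay?: (ms: number) => Promise<void>; // injectable so tests need not wait
};

// Delays waited between attempts: retryBaseMs * 2^attempt for each retry.
function backoffSchedule(retryAttempts: number, retryBaseMs: number): number[] {
  return Array.from({ length: retryAttempts }, (_, attempt) => retryBaseMs * 2 ** attempt);
}

// Default delay implementation based on setTimeout.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Resolves to true if a flush eventually succeeded, false once retries are exhausted.
async function flushWithRetry<T>(items: T[], opts: RetryOptions<T>): Promise<boolean> {
  const retryAttempts = opts.retryAttempts ?? 3;
  const retryBaseMs = opts.retryBaseMs ?? 500;
  const delay = opts.delay ?? sleep;
  let lastErr: unknown;

  // One initial attempt plus up to `retryAttempts` retries.
  for (let attempt = 0; attempt <= retryAttempts; attempt++) {
    try {
      await opts.onFlush(items);
      return true;
    } catch (err) {
      lastErr = err;
      if (attempt < retryAttempts) {
        await delay(retryBaseMs * 2 ** attempt);
      }
    }
  }

  // Reached only after every attempt failed; items are handed to the caller.
  opts.onError?.(lastErr, items);
  return false;
}
```

With the defaults, `backoffSchedule(3, 500)` yields `[500, 1000, 2000]`, so `onFlush` is called at most four times and a message waits up to 3.5 seconds in total backoff before `onError` fires. Injecting `delay` is a design choice of this sketch so that tests can record the waits instead of sleeping.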
<!-- greptile_comment -->

<h3>Greptile Summary</h3>

Added retry with exponential backoff to `flushBuffer()` in `inbound-debounce.ts` to prevent silent message loss when `onFlush` fails due to session store lock contention. The fix applies globally to all messaging channels (Telegram, Discord, Signal, Slack, iMessage) through the shared debouncer without requiring channel-specific code changes.

- Introduced configurable `retryAttempts` (default: 3) and `retryBaseMs` (default: 500ms) parameters
- Implemented exponential backoff retry loop (500ms → 1000ms → 2000ms) before calling `onError`
- Extended retry logic to both debounced and immediate (non-debounced) message paths
- Added comprehensive test coverage for retry success, exhaustion, and happy path scenarios
- Unrelated changes: log file and `media.test.ts` stabilization fix

<h3>Confidence Score: 5/5</h3>

- Safe to merge with high confidence
- The implementation is clean, well-tested, and backwards compatible. The retry logic correctly prevents message loss without introducing breaking changes. All channel handlers automatically benefit from the fix without modifications.
- No files require special attention

<sub>Last reviewed commit: 756ba58</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
