#17435: fix(debounce): retry flush with exponential backoff to prevent silent message loss
stale
size: S
trusted-contributor
Cluster: Network Error Handling Improvements
## fix(debounce): retry inbound flush on lock contention
Fixes #17421
### Problem
When the inbound debounce timer fires and `onFlush` fails (e.g. due to session store lock contention with a concurrent cron job), the buffered messages are **permanently and silently lost**:
```
telegram debounce flush failed: Error: timeout acquiring session store lock
```
The user sees blue checkmarks on Telegram, but the agent never receives the message. Neither side knows anything was lost.
From the bug report: **5 messages dropped in 2 days** during normal operation.
### Root Cause
`flushBuffer()` in `inbound-debounce.ts` catches the error and calls `onError()`, but all channel handlers (`telegram`, `discord`, `signal`, `slack`, `imessage`) only log the error — the buffered items are discarded with no retry.
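To make the failure mode concrete, here is a minimal sketch of a single-attempt flush, assuming a simplified buffer. `createNaiveBuffer` and `demoSilentLoss` are hypothetical names for illustration, not the code in `inbound-debounce.ts`; only the `onFlush`/`onError` callback shape is taken from the description above.

```typescript
// Hypothetical, simplified model of the pre-fix behaviour (not the real source).
type FlushFn<T> = (items: T[]) => Promise<void>;
type ErrorFn<T> = (err: unknown, items: T[]) => void;

function createNaiveBuffer<T>(onFlush: FlushFn<T>, onError?: ErrorFn<T>) {
  let items: T[] = [];
  return {
    push(item: T): void {
      items.push(item);
    },
    size(): number {
      return items.length;
    },
    // Single attempt: the buffer is emptied before onFlush runs, so a
    // failure leaves nothing behind to retry.
    async flush(): Promise<void> {
      const pending = items;
      items = [];
      try {
        await onFlush(pending);
      } catch (err) {
        onError?.(err, pending); // a handler that only logs drops `pending`
      }
    },
  };
}

// Simulates lock contention: onFlush always rejects, the handler only logs.
async function demoSilentLoss(): Promise<{ logged: string[]; remaining: number }> {
  const logged: string[] = [];
  const buffer = createNaiveBuffer<string>(
    async () => {
      throw new Error("timeout acquiring session store lock");
    },
    (err) => {
      logged.push(`debounce flush failed: ${String(err)}`);
    },
  );
  buffer.push("hello");
  await buffer.flush();
  return { logged, remaining: buffer.size() };
}
```

In this sketch, `demoSilentLoss()` ends with one log line and an empty buffer: the message is gone even though nothing threw to the caller.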
### Fix
Add retry with exponential backoff to `flushBuffer()`:
```typescript
// Retry loop (defaults: 3 retries after the initial attempt, 500ms base backoff)
let lastErr: unknown;
for (let attempt = 0; attempt <= retryAttempts; attempt++) {
  try {
    await params.onFlush(buffer.items);
    return; // success
  } catch (err) {
    lastErr = err;
    if (attempt < retryAttempts) {
      // Backoff before the next attempt: 500ms → 1000ms → 2000ms
      await delay(retryBaseMs * 2 ** attempt);
    }
  }
}
params.onError?.(lastErr, buffer.items); // only after all retries exhausted
```
New configurable options (backwards compatible):
- `retryAttempts`: maximum number of retries after the initial attempt (default: 3)
- `retryBaseMs`: base delay for exponential backoff (default: 500ms)
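How the two options interact can be sketched as a standalone helper. This is an illustration under stated assumptions, not the PR's code: `flushWithRetry`, `backoffSchedule`, `RetryOptions`, and the injectable `sleep` are hypothetical names; only `retryAttempts`, `retryBaseMs`, and their defaults come from this PR.

```typescript
// Hypothetical helper; only retryAttempts / retryBaseMs come from the PR.
type RetryOptions = {
  retryAttempts?: number; // retries after the first try (PR default: 3)
  retryBaseMs?: number; // base backoff delay in ms (PR default: 500)
  sleep?: (ms: number) => Promise<void>; // assumption: injectable for tests
};

const realSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Delays waited between attempts: retryBaseMs * 2^attempt.
function backoffSchedule(retryAttempts = 3, retryBaseMs = 500): number[] {
  return Array.from(
    { length: retryAttempts },
    (_, attempt) => retryBaseMs * 2 ** attempt,
  );
}

// Resolves to true if a flush attempt succeeded, false once retries ran out.
async function flushWithRetry<T>(
  items: T[],
  onFlush: (items: T[]) => Promise<void>,
  onError?: (err: unknown, items: T[]) => void,
  options: RetryOptions = {},
): Promise<boolean> {
  const { retryAttempts = 3, retryBaseMs = 500, sleep = realSleep } = options;
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retryAttempts; attempt++) {
    try {
      await onFlush(items);
      return true;
    } catch (err) {
      lastErr = err;
      if (attempt < retryAttempts) {
        await sleep(retryBaseMs * 2 ** attempt);
      }
    }
  }
  onError?.(lastErr, items); // reached only after every attempt failed
  return false;
}
```

With the defaults, a flush is tried up to four times (one initial attempt plus three retries) and waits 500 + 1000 + 2000 = 3500 ms in total before `onError` runs. Passing `retryAttempts: 0` in this sketch gives a single attempt with no backoff.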
### Changed Files
| File | Change |
|------|--------|
| `src/auto-reply/inbound-debounce.ts` | Add retry loop to `flushBuffer()` |
| `src/auto-reply/inbound-debounce.retries-on-flush-failure.test.ts` | 3 new tests |
### Testing
- 3 new tests (retry success, retry exhaustion, no-retry path)
- All 659 auto-reply tests pass
- Lint + format clean
### Impact
Affects ALL channels: Telegram, Discord, Signal, Slack, iMessage. Any channel using inbound debounce benefits from this fix without code changes — the retry is in the shared debouncer.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Added retry with exponential backoff to `flushBuffer()` in `inbound-debounce.ts` to prevent silent message loss when `onFlush` fails due to session store lock contention. The fix applies globally to all messaging channels (Telegram, Discord, Signal, Slack, iMessage) through the shared debouncer without requiring channel-specific code changes.
- Introduced configurable `retryAttempts` (default: 3) and `retryBaseMs` (default: 500ms) parameters
- Implemented exponential backoff retry loop (500ms → 1000ms → 2000ms) before calling `onError`
- Extended retry logic to both debounced and immediate (non-debounced) message paths
- Added comprehensive test coverage for retry success, exhaustion, and happy path scenarios
- Unrelated changes: log file and `media.test.ts` stabilization fix
<h3>Confidence Score: 5/5</h3>
- Safe to merge with high confidence
- The implementation is clean, well-tested, and backwards compatible. The retry logic correctly prevents message loss without introducing breaking changes. All channel handlers automatically benefit from the fix without modifications.
- No files require special attention
<sub>Last reviewed commit: 756ba58</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #17243: fix(telegram): catch getFile network failures to prevent gateway cr... (by robbyczgw-cla · 2026-02-15 · 83.6%)
- #15985: fix(telegram): defer buffer deletion until processing succeeds (by coygeek · 2026-02-14 · 82.4%)
- #15467: feat(messages): add debounceMedia option for inbound debouncing (by tangcruz · 2026-02-13 · 80.2%)
- #8368: fix(telegram): preserve forwarded message metadata during debounce ... (by PatrickBauer · 2026-02-03 · 79.8%)
- #8166: fix(telegram): lifecycle fixes for duplicate messages and auto-reco... (by cheenu1092-oss · 2026-02-03 · 78.3%)
- #23238: fix(telegram): account named "default" silently breaks inbound polling (by anillBhoi · 2026-02-22 · 76.9%)
- #6463: fix(telegram): improve timeout handling and prevent channel exits (by ai-fanatic · 2026-02-01 · 76.9%)
- #10509: fix(telegram): bare abort words bypass debounce + clear buffered me... (by romancircus · 2026-02-06 · 76.4%)
- #11472: fix: retry media fetch on transient network errors (by openclaw-quenio · 2026-02-07 · 75.9%)
- #11653: fix(telegram): retry without message_thread_id on stale forum threa... (by liuxiaopai-ai · 2026-02-08 · 75.3%)