#16628: feat(web): implement three-tier graduated retry strategy
channel: whatsapp-web
stale
size: M
Cluster: WhatsApp Connection Stability Fixes
## Summary
Implements a three-tier graduated retry strategy for WhatsApp Web connections that prevents the channel from permanently exiting on transient network failures.
**Problem:** When network connectivity is lost (e.g., router reboot, ISP maintenance), the current 12-attempt retry limit is exhausted and the channel exits permanently, requiring a manual gateway restart.
**Solution:** Replace the single retry policy with tiered escalation:
| Tier | Attempts | Backoff | Use Case |
|------|----------|---------|----------|
| 1 (Fast) | 12 | 2s → 30s | Brief network blips |
| 2 (Medium) | 10 | 30s → 5min | Extended outages |
| 3 (Slow) | ∞ | 5min → 15min | Prolonged issues |
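
As a rough illustration, the tier table can be written as data plus a capped backoff function. This is a minimal sketch, not the PR's actual code: the `TierPolicy` shape, the constant names, and `backoffMs` are hypothetical, and the doubling growth factor is an assumption, since the table only gives the start and end of each backoff range.

```typescript
// Hypothetical shape; the PR's real `TieredReconnectPolicy` type may differ.
interface TierPolicy {
  maxAttempts: number; // Infinity means the tier is never exhausted
  initialMs: number;   // delay before the first retry in this tier
  maxMs: number;       // upper bound on the delay in this tier
}

// Values taken from the table above.
const FAST: TierPolicy = { maxAttempts: 12, initialMs: 2_000, maxMs: 30_000 };
const MEDIUM: TierPolicy = { maxAttempts: 10, initialMs: 30_000, maxMs: 300_000 };
const SLOW: TierPolicy = { maxAttempts: Infinity, initialMs: 300_000, maxMs: 900_000 };

const TIERS: readonly TierPolicy[] = [FAST, MEDIUM, SLOW];

// Assumed growth: the delay doubles per attempt and is capped at maxMs.
// `attempt` is 1-based within the current tier.
function backoffMs(tier: TierPolicy, attempt: number, factor = 2): number {
  const exponent = Math.max(0, attempt - 1);
  return Math.min(tier.maxMs, tier.initialMs * Math.pow(factor, exponent));
}
```

Under these assumptions, Tier 1 reaches its 30s cap on the fifth attempt and stays there for the rest of the tier.
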
**Behavior:**
- Escalates to the next tier when the current tier is exhausted
- Resets to Tier 1 after a healthy connection (uptime > heartbeat period)
- Channel never permanently exits due to network issues
- Fully backward compatible (legacy config still works)
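
The escalation and reset rules above could be sketched as a small state machine. The names here (`RetryState`, `onFailure`, `onDisconnect`) are illustrative and not taken from `monitor.ts`. The exact boundary is also an assumption: in this sketch the failure that would exceed a tier's limit becomes attempt 1 of the next tier.

```typescript
type Tier = 1 | 2 | 3;

interface RetryState {
  tier: Tier;
  attempts: number; // attempts made so far in the current tier
}

// Attempt limits per tier, from the table in the summary.
const MAX_ATTEMPTS: Record<Tier, number> = { 1: 12, 2: 10, 3: Infinity };

// Called after each failed connection attempt.
// Increments first, then checks the limit; Tier 3 never escalates or exits.
function onFailure(state: RetryState): RetryState {
  const attempts = state.attempts + 1;
  if (state.tier < 3 && attempts > MAX_ATTEMPTS[state.tier]) {
    return { tier: (state.tier + 1) as Tier, attempts: 1 };
  }
  return { tier: state.tier, attempts };
}

// Called when a connection drops. A connection that stayed up longer than
// the heartbeat period counts as healthy and resets the state to Tier 1.
function onDisconnect(
  state: RetryState,
  uptimeMs: number,
  heartbeatMs: number,
): RetryState {
  return uptimeMs > heartbeatMs ? { tier: 1, attempts: 0 } : state;
}
```

Because neither function ever returns a terminal state, the retry loop built on top of it has no path that gives up permanently.
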
## Test Plan
- [ ] Unit tests for tiered policy resolution
- [ ] Manual test: disconnect network, verify tier escalation logs
- [ ] Manual test: reconnect after extended outage, verify tier reset
- [ ] Verify backward compatibility with existing `web.reconnect` config
## Related
Also includes a drive-by fix for undefined key handling in `qmd-scope.ts`, an issue that was blocking the build.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Implements a three-tier graduated retry strategy for WhatsApp Web connections, preventing permanent channel exit on transient network failures. The implementation correctly escalates from fast retries (2-30s, 12 attempts) through medium backoff (30s-5min, 10 attempts) to unlimited patient retries (5-15min). Backward compatibility is properly maintained by mapping legacy `reconnect` config to tier1, and the tier reset after healthy connections ensures the system recovers to fast retries once stable.
Key changes:
- Added `TieredReconnectPolicy` type with three-tier structure in `reconnect.ts`
- Updated `monitor.ts` to track current tier and tier-specific attempt counts
- Implemented tier escalation logic that triggers when tier max attempts are reached
- Added comprehensive unit tests for policy resolution and tier selection
- Drive-by fix: added undefined key guard in `qmd-scope.ts` to prevent build issues
The escalation logic correctly increments counters before checking limits, resets tier attempts to 1 on escalation, and never breaks on tier 3 (unlimited). Previous review concerns about backward compatibility and missing tests have been addressed.
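
For illustration, mapping a legacy `reconnect` config onto tier1 might look like the sketch below. The field names, the `DEFAULTS` object, and `resolveTieredPolicy` are hypothetical, not the actual `web.reconnect` schema or the code in `reconnect.ts`. The sketch assumes legacy values override only tier1, while tier2 and tier3 keep their defaults.

```typescript
interface BackoffPolicy {
  maxAttempts: number;
  initialMs: number;
  maxMs: number;
}

interface TieredPolicy {
  tier1: BackoffPolicy;
  tier2: BackoffPolicy;
  tier3: BackoffPolicy;
}

// Defaults mirror the tier table in the PR summary.
const DEFAULTS: TieredPolicy = {
  tier1: { maxAttempts: 12, initialMs: 2_000, maxMs: 30_000 },
  tier2: { maxAttempts: 10, initialMs: 30_000, maxMs: 300_000 },
  tier3: { maxAttempts: Infinity, initialMs: 300_000, maxMs: 900_000 },
};

// A legacy single-policy config (possibly partial or absent) is merged into
// tier1 only, so existing configs keep their fast-retry behavior.
function resolveTieredPolicy(legacy?: Partial<BackoffPolicy>): TieredPolicy {
  return { ...DEFAULTS, tier1: { ...DEFAULTS.tier1, ...legacy } };
}
```

With no legacy config, the result equals the defaults; with `{ maxAttempts: 5 }`, only tier1's attempt limit changes.
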
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge with minor risk - the logic is sound and well-tested
- Score reflects solid implementation of the tiered retry strategy with proper backward compatibility, comprehensive unit tests, and correct handling of edge cases (unlimited tier 3, tier reset after healthy connection). The logic has been carefully validated through code analysis. Deducted one point because manual testing of tier escalation during actual network failures hasn't been completed yet per the test plan, though the logic itself is correct.
- No files require special attention - all changes are well-structured and follow the existing patterns
<sub>Last reviewed commit: 9346332</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#9727: fix(whatsapp): retry reconnect loop on initial connection failure
by luizlf · 2026-02-05
76.7%
#17487: fix: WhatsApp connection stability - continue reconnection after ma...
by MisterGuy420 · 2026-02-15
76.5%
#9515: fix(web): retry WhatsApp 515 restart up to 3 times with delay
by Sebachowa · 2026-02-05
74.3%
#21893: fix(web): enforce sendPolicy on WhatsApp auto-reply delivery path
by hydro13 · 2026-02-20
72.3%
#16923: fix(web): resolve stale socket race condition in WhatsApp auto-reply
by dorukardahan · 2026-02-15
72.2%
#22143: Fix memory leak in WhatsApp channel reconnection loop
by lancejames221b · 2026-02-20
71.3%
#9232: Fix: Add automatic retry for network errors in message runs
by vishaltandale00 · 2026-02-05
69.7%
#14789: fix: per-account dmPolicy ignored in checkInboundAccessControl
by croll83 · 2026-02-12
69.6%
#7141: fix(telegram): unify network error detection to prevent poll crashes
by hclsys · 2026-02-02
69.3%
#22367: fix(whatsapp): prevent permanent listener loss after abort during r...
by mcinteerj · 2026-02-21
69.2%