#22367: fix(whatsapp): prevent permanent listener loss after abort during reconnect backoff
channel: whatsapp-web
size: XS
experienced-contributor
Cluster: GlobalThis Integration Fixes
## Summary
- **Problem:** When `abortSignal` fires during the reconnect backoff `sleep()` in `monitorWebChannel`, the catch block unconditionally `break`s out of the reconnect loop. This leaves `activeWebListener` as `null` permanently — even though the WhatsApp connection itself recovers.
- **Why it matters:** Any action that calls `requireActiveWebListener()` (e.g. `react`, `send`) throws while the listener is null. The gateway *appears* healthy (channels show as started), yet every web action fails until a full restart.
- **What changed:** The `catch` block after `sleep(delay, abortSignal)` now checks `stopRequested()` before breaking. If the abort was not a deliberate stop, `continue` keeps the reconnect loop alive.
- **What did NOT change:** Deliberate shutdown (abort + `stopRequested() === true`) still cleanly exits. No changes to the outer channel manager restart logic in `server-channels.ts`.
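
The change described above can be sketched roughly as follows. This is a minimal, self-contained model and not the actual `monitorWebChannel` code: `reconnectLoop`, its options object, the `connect` callback, and the return labels are hypothetical names introduced for illustration, and the `sleep` here is a stand-in assumed to reject when the signal aborts.

```typescript
// Hypothetical stand-in for the abortable backoff sleep: rejects if the signal aborts.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error("aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Hypothetical option bag; the real monitor's parameters are not shown in this PR text.
interface ReconnectOptions {
  connect: () => Promise<boolean>; // resolves true once a listener is established
  stopRequested: () => boolean;
  abortSignal?: AbortSignal;
  maxAttempts: number;
  delayMs: number;
}

async function reconnectLoop(
  opts: ReconnectOptions,
): Promise<"connected" | "stopped" | "gave-up"> {
  let attempts = 0;
  while (true) {
    if (opts.stopRequested()) return "stopped";
    if (attempts >= opts.maxAttempts) return "gave-up";
    attempts += 1;
    if (await opts.connect()) return "connected";
    try {
      await sleep(opts.delayMs, opts.abortSignal);
    } catch {
      // Before the fix this was an unconditional `break`.
      if (opts.stopRequested()) break; // deliberate shutdown: exit
      continue; // abort was not a stop request: keep reconnecting
    }
  }
  return "stopped";
}
```

In this model, an abort that arrives during the backoff without a stop request sends control back to the top of the loop, so the next `connect()` call still happens.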
> **Note to maintainers:** This is a minimal targeted fix (Option C). A more robust long-term approach (Option B) would be to decouple the inner reconnect loop's abort signal from the outer one — e.g., derive a child AbortController with independent lifecycle so the channel manager can abort a current connection attempt without permanently killing the monitor's reconnect capability. Worth considering in a future refactor.
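
One way Option B could look, as a sketch only: a per-attempt child `AbortController` that follows the outer signal but can also be aborted by itself. The helper name `deriveChildController` is hypothetical and does not exist in the codebase; only the standard `AbortController` / `AbortSignal` APIs are used.

```typescript
// Hypothetical helper: the child aborts when the parent aborts,
// but aborting the child leaves the parent untouched.
function deriveChildController(parent: AbortSignal): AbortController {
  const child = new AbortController();
  if (parent.aborted) {
    child.abort();
    return child;
  }
  const onParentAbort = () => child.abort();
  parent.addEventListener("abort", onParentAbort, { once: true });
  // Remove the parent listener once the child is finished, so one listener
  // per connection attempt does not accumulate on the long-lived parent.
  child.signal.addEventListener(
    "abort",
    () => parent.removeEventListener("abort", onParentAbort),
    { once: true },
  );
  return child;
}

// Usage: cancel only the current attempt, keep the monitor alive.
const monitorController = new AbortController();
const firstAttempt = deriveChildController(monitorController.signal);
firstAttempt.abort(); // monitorController.signal.aborted stays false

// A full shutdown still propagates to whichever attempt is current.
const secondAttempt = deriveChildController(monitorController.signal);
monitorController.abort(); // secondAttempt.signal.aborted becomes true
```

With this shape, the channel manager would hold the per-attempt controller for "restart this connection" and the outer controller for "stop the monitor", which removes the need to infer intent inside the `catch` block.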
## Change Type (select all)
- [x] Bug fix
## Scope (select all touched areas)
- [x] Integrations
## Linked Issue/PR
- Related #99
## User-visible / Behavior Changes
WhatsApp web actions (send, react, poll, etc.) no longer permanently fail after a watchdog-triggered reconnect. Previously, recovery required a full gateway restart.
## Security Impact (required)
- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`
## Repro + Verification
### Environment
- OS: Ubuntu 24.04 (ARM64, OCI)
- Runtime/container: Node.js, OpenClaw gateway
- Integration/channel: WhatsApp Web
### Steps
1. Start gateway with WhatsApp channel
2. Wait for the watchdog timeout (30 minutes without messages) or trigger a reconnect
3. If the abort signal fires during the backoff sleep, the listener is permanently lost
4. Attempt `message react` → fails with "No active WhatsApp Web listener"
### Expected
- Reconnect loop continues, new listener is established, reactions work
### Actual (before fix)
- Reconnect loop exits, `activeWebListener` stays null, reactions fail permanently
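
To illustrate how the stale `null` surfaces to callers, here is a small sketch of the guard. The error message is the one quoted in the repro steps; the `ActiveWebListener` shape, `setActiveWebListener`, and the `react` signature are assumptions made for the example and may not match the real module.

```typescript
// Hypothetical listener shape, for illustration only.
type ActiveWebListener = {
  react: (messageId: string, emoji: string) => string;
};

let activeWebListener: ActiveWebListener | null = null;

// Hypothetical setter; the real code may manage this state differently.
function setActiveWebListener(listener: ActiveWebListener | null): void {
  activeWebListener = listener;
}

function requireActiveWebListener(): ActiveWebListener {
  if (activeWebListener === null) {
    // Every web action that goes through this guard fails the same way
    // for as long as the reconnect loop is no longer running.
    throw new Error("No active WhatsApp Web listener");
  }
  return activeWebListener;
}
```

Because nothing sets the listener again once the reconnect loop has exited, each call keeps hitting the `throw` branch until the gateway is restarted.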
## Evidence
- [x] Trace/log snippets — reproduced via production gateway logs showing listener null after watchdog reconnect
## Human Verification (required)
- Verified scenarios: Code path analysis confirmed the unconditional `break` exits the loop; with fix, `continue` preserves the loop when `stopRequested()` is false
- Edge cases checked: Deliberate shutdown still breaks cleanly; `sigintStop` path unaffected; max reconnect attempts logic unaffected
- What you did **not** verify: Full e2e test with real WhatsApp connection (will verify on local deployment)
## Compatibility / Migration
- Backward compatible? `Yes`
- Config/env changes? `No`
- Migration needed? `No`
## Failure Recovery (if this breaks)
- How to disable/revert: Revert single commit, restart gateway
- Known bad symptoms: If `continue` somehow caused an infinite loop, the monitor would spin on reconnect attempts. This is unlikely, because `stopRequested()` and `maxAttempts` are both checked at the top of the loop.
## Risks and Mitigations
- Risk: The `continue` skips the rest of the loop body after the sleep and jumps back to the `stopRequested()` check at the top of the loop. If the abort signal is permanently set, `stopRequested()` returns true and the loop breaks cleanly.
- Mitigation: The existing `stopRequested()` check at loop top and `maxAttempts` guard both prevent infinite loops.
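
The termination argument can be checked with a tiny synchronous model. It assumes the attempt counter is incremented on every iteration before the sleep; that detail is not stated in this description, so treat it as an assumption. `countIterations` is a hypothetical helper, and the immediate `throw` models a `sleep()` that rejects at once because the signal is already aborted.

```typescript
// Hypothetical model of the loop guards; not the real monitor code.
function countIterations(
  maxAttempts: number,
  stopRequested: () => boolean,
): number {
  let attempts = 0;
  let iterations = 0;
  while (true) {
    if (stopRequested()) break; // guard 1: deliberate stop
    if (attempts >= maxAttempts) break; // guard 2: attempt budget
    attempts += 1; // assumption: counted before the backoff sleep
    iterations += 1;
    try {
      // Worst case: the backoff sleep rejects immediately on every pass.
      throw new Error("aborted");
    } catch {
      if (stopRequested()) break;
      continue;
    }
  }
  return iterations;
}
```

Under that assumption the worst case is `maxAttempts` back-to-back iterations without any backoff delay, not an unbounded spin. If the real counter were reset or incremented somewhere the `continue` path skips, the bound would need to be re-checked.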
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Fixes permanent listener loss when `abortSignal` fires during the reconnect backoff sleep in `monitorWebChannel`. Previously, any abort during the backoff unconditionally broke out of the reconnect loop, leaving `activeWebListener` as `null` even when the WhatsApp connection recovered. The fix adds a `stopRequested()` check in the catch block: the loop breaks only on deliberate shutdown and otherwise continues so the listener is re-established.
- Prevents permanent reaction failures after watchdog-triggered reconnects
- Preserves clean shutdown behavior when abort is deliberate
- Existing guards (`stopRequested()` at loop top, `maxAttempts` limit) prevent infinite loops
<h3>Confidence Score: 5/5</h3>
- Safe to merge - minimal targeted fix with proper safeguards
- The change is a 7-line surgical fix to a specific edge case with clear logic. The existing loop guards (`stopRequested()` check at loop top and `maxAttempts` limit) prevent potential infinite loops. The fix preserves the original shutdown behavior while solving the permanent listener loss issue. No changes to external interfaces or side effects.
- No files require special attention
<sub>Last reviewed commit: e851ca8</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #22399: fix(web): use globalThis singleton for active-listener state (by mcinteerj · 2026-02-21 · 81.8%)
- #9727: fix(whatsapp): retry reconnect loop on initial connection failure (by luizlf · 2026-02-05 · 81.2%)
- #17487: fix: WhatsApp connection stability - continue reconnection after ma... (by MisterGuy420 · 2026-02-15 · 80.9%)
- #16923: fix(web): resolve stale socket race condition in WhatsApp auto-reply (by dorukardahan · 2026-02-15 · 76.4%)
- #22143: Fix memory leak in WhatsApp channel reconnection loop (by lancejames221b · 2026-02-20 · 76.0%)
- #20554: fix(googlechat): prevent infinite restart loop in startAccount (by Gitjay11 · 2026-02-19 · 75.6%)
- #22322: fix(googlechat): keep webhook monitor alive until abort (by AIflow-Labs · 2026-02-21 · 75.1%)
- #20309: [BUG]: fix telegram webhook should wait for abort signal instead of... (by kesor · 2026-02-18 · 74.8%)
- #23134: fix(gateway): skip auto-restart for webhook channels that resolve i... (by puneet1409 · 2026-02-22 · 74.4%)
- #23621: fix(LINE): keep startAccount promise alive to prevent auto-restart ... (by ttakanawa · 2026-02-22 · 74.3%)