
#9727: fix(whatsapp): retry reconnect loop on initial connection failure

by luizlf open 2026-02-05 16:15
channel: whatsapp-web
## Summary

- Retry initial WhatsApp Web listener startup failures in `monitorWebChannel` using the existing reconnect backoff instead of exiting.
- Update reconnect status/logging for startup failures and respect `maxAttempts`.
- Add a regression test that simulates an initial `ENOTFOUND` and verifies the reconnect loop retries.

## Why

- DNS/network errors during the very first WhatsApp connection (for example `ENOTFOUND web.whatsapp.com`) previously escaped the reconnect loop, causing the gateway to stop. This change makes initial connection failures behave like later reconnects and fixes #13506.

## Log Evidence

- Original bug (2026-02-05 07:16:14 UTC, production OpenClaw 2026.2.3): reconnect loop did not engage; channel remained dead until manual restart at 13:06.

```text
{"error":"Error: getaddrinfo ENOTFOUND web.whatsapp.com"},"WebSocket error" path: "opt/homebrew/lib/node_modules/openclaw/dist/web/session.js:117" time: "2026-02-05T07:16:14.679Z"
```

- Fix working (2026-02-05 15:01:56 UTC, dev build with fix): new "will retry" log indicates the initial failure is captured and the reconnect loop continues.

```text
{"error":"ENOTFOUND web.whatsapp.com","reconnectAttempts":0},"web reconnect: failed to establish initial connection; will retry" path: "/Users/lsantos/Projects/openclaw/src/web/auto-reply/monitor.ts:214" time: "2026-02-05T15:01:56.442Z"
```

## Testing

- `pnpm vitest run --config vitest.unit.config.ts "src/web/auto-reply.reconnects"` (1 test passed in 17ms)
- New test: `src/web/auto-reply.reconnects-after-initial-connection-failure.test.ts` uses a mocked `listenerFactory` that throws `ENOTFOUND` on the first attempt, asserts a second attempt happens without propagating the error, then aborts and closes cleanly.
- `pnpm build && pnpm check && pnpm test`

## AI Assistance

- AI-assisted: yes (Codex (gpt-5.2-codex xhigh) full-auto).
- Collaboration notes:
  - Claude (Opus 4.5) analyzed logs and identified the root cause in `monitorWebChannel` (the initial `await listenerFactory()` call lacked a try/catch).
  - Codex CLI reviewed the root cause, implemented the fix and wrote the test.
  - Claude reviewed the fix and confirmed it matched the root-cause analysis.
- Original prompt to Codex: "Fix the WhatsApp DNS reconnect bug. The issue is in src/web/auto-reply/monitor.ts around line 192 - the await listenerFactory() call needs try/catch to handle initial connection failures and continue the retry loop with backoff."
- Understanding confirmation: I understand this change catches listener startup errors, records the failure, increments reconnect attempts, waits with backoff, and retries until the max attempts is reached; the new test asserts a retry happens after an initial `ENOTFOUND`.

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR updates the WhatsApp Web reconnect logic so that failures during the *initial* listener startup are handled by the same reconnect/backoff loop as later disconnects, rather than escaping and stopping the gateway. Concretely, `monitorWebChannel` now wraps the initial `listenerFactory`/`monitorWebInbox` startup in a `try/catch`, records the error in channel status, increments `reconnectAttempts`, applies `maxAttempts`, waits using the configured backoff, and retries.

It also adds a regression test that simulates a first-attempt DNS failure (`ENOTFOUND`) from the listener factory and asserts that the reconnect loop performs a second startup attempt without propagating the initial error, then aborts cleanly.

<h3>Confidence Score: 4/5</h3>

- This PR is close to merge-ready; the runtime fix looks correct, but the new regression test is likely to be flaky in CI as written.
- The reconnect-loop change is localized and follows the existing backoff/maxAttempts flow. The main concern is the test’s dependence on a hard 200ms wall-clock polling loop with real timers, which can intermittently fail under CI load despite correct behavior.
- src/web/auto-reply.reconnects-after-initial-connection-failure.test.ts

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
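The catch-and-retry behavior described in this PR can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `src/web/auto-reply/monitor.ts`: `startWithRetry`, `computeBackoffMs`, `BackoffPolicy`, and the injected `sleep` parameter are hypothetical names, and the exponential backoff formula is an assumption about what "the existing reconnect backoff" might look like. Injecting `sleep` is one possible way to keep a regression test off the wall clock, which relates to the flakiness concern raised in the review.

```typescript
// Hypothetical sketch; names and backoff formula are assumptions, not openclaw's API.
interface BackoffPolicy {
  initialMs: number;
  maxMs: number;
  factor: number;
  maxAttempts: number; // 0 means "retry without limit" in this sketch
}

type Listener = { close: () => Promise<void> };
type ListenerFactory = () => Promise<Listener>;
type Sleep = (ms: number) => Promise<void>;

// Exponential backoff capped at maxMs; `attempt` is 1-based.
function computeBackoffMs(policy: BackoffPolicy, attempt: number): number {
  const raw = policy.initialMs * Math.pow(policy.factor, Math.max(0, attempt - 1));
  return Math.min(policy.maxMs, Math.round(raw));
}

// Keeps calling the factory until it succeeds or maxAttempts failures occur.
// Returns null when the attempt budget is exhausted instead of throwing.
async function startWithRetry(
  listenerFactory: ListenerFactory,
  policy: BackoffPolicy,
  sleep: Sleep,
  onFailure: (err: unknown, reconnectAttempts: number) => void = () => {},
): Promise<Listener | null> {
  let reconnectAttempts = 0;
  for (;;) {
    try {
      // Before the fix described above, an error here escaped the loop.
      return await listenerFactory();
    } catch (err) {
      reconnectAttempts += 1;
      onFailure(err, reconnectAttempts);
      if (policy.maxAttempts > 0 && reconnectAttempts >= policy.maxAttempts) {
        return null;
      }
      await sleep(computeBackoffMs(policy, reconnectAttempts));
    }
  }
}

// Usage: the factory fails once with a DNS-style error, then succeeds.
// A fake sleep records the requested delays, so no real timers are involved.
async function demo(): Promise<{ calls: number; delays: number[] }> {
  let calls = 0;
  const delays: number[] = [];
  const factory: ListenerFactory = async () => {
    calls += 1;
    if (calls === 1) throw new Error("getaddrinfo ENOTFOUND web.whatsapp.com");
    return { close: async () => {} };
  };
  const fakeSleep: Sleep = async (ms) => {
    delays.push(ms);
  };
  const policy: BackoffPolicy = { initialMs: 2000, maxMs: 30000, factor: 2, maxAttempts: 5 };
  await startWithRetry(factory, policy, fakeSleep);
  return { calls, delays };
}
```

With a recording `sleep` like the one in `demo`, a test can assert on the number of factory calls and the requested delays directly rather than polling for a fixed interval.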
