
#14993: fix(webchat): add heartbeat detection to prevent zombie WebSocket connections

by BenediktSchackenberg open 2026-02-12 21:05
app: web-ui size: S
## Summary

Fixes #14938

The browser webchat client could enter a 'zombie' state where the WebSocket connection silently failed but the client never detected it. Messages would queue on the server but never reach the browser until manual reconnect.

## Root Cause

The Node.js gateway client has tick monitoring:

```javascript
startTickWatch() {
  const interval = Math.max(this.tickIntervalMs, 1000);
  this.tickTimer = setInterval(() => {
    if (Date.now() - this.lastTick > this.tickIntervalMs * 2)
      this.ws?.close(4000, 'tick timeout');
  }, interval);
}
```

The browser client was missing this, so it couldn't detect:

- Browser tab sleeping (background throttling)
- Network changes (WiFi reconnect, VPN toggle)
- NAT/firewall timeouts
- OS power management

## Solution

Added tick monitoring to `GatewayBrowserClient` (mirrors Node.js implementation):

1. **Extract tick interval** from server hello response (`policy.tickIntervalMs`)
2. **Track tick events** - update `lastTick` timestamp on each `tick` event
3. **Run interval check** - if no tick received for 2x `tickIntervalMs`, close WebSocket with code 4000 to trigger auto-reconnect

## Changes

`ui/src/ui/gateway.ts`:

- Added `lastTick`, `tickIntervalMs`, `tickTimer` private fields
- Added `startTickWatch()` and `stopTickWatch()` methods
- Track tick events in `handleMessage()`
- Extract `tickIntervalMs` from hello response
- Clean up timer on `stop()` and WebSocket close

## Testing

- [x] `pnpm build` - ✅ Success
- [x] `pnpm check` - ✅ Success
- [x] `pnpm test` - ✅ 270 tests passed

## AI Disclosure

🤖 AI-assisted (Claude), fully tested locally before submission.

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds tick/heartbeat monitoring to the browser `GatewayBrowserClient` so the webchat detects silent WebSocket stalls (“zombie” connections) and forces a reconnect when no `tick` event is seen for 2× the server-provided interval. The change mirrors the existing Node gateway client’s tick watchdog and wires timer lifecycle into `stop()` and the WS `close` handler. One correctness issue remains: the browser client assigns `tickIntervalMs` from `hello.policy.tickIntervalMs` without a runtime type guard, so malformed/tampered hello frames can yield `NaN` intervals and a tight `setInterval` loop.

<h3>Confidence Score: 4/5</h3>

- Mostly safe to merge once the tickIntervalMs parsing is hardened against malformed hello frames.
- The change is small and mirrors existing Node client behavior, but the browser variant currently trusts `hello.policy.tickIntervalMs` without a runtime type check, which can produce NaN timer intervals and pathological reconnect watchdog behavior if the hello frame is malformed/tampered.
- ui/src/ui/gateway.ts

<sub>Last reviewed commit: 753b69c</sub>

<!-- greptile_other_comments_section -->

<sub>(5/5) You can turn off certain types of comments like style [here](https://app.greptile.com/review/github)!</sub>

<!-- /greptile_comment -->
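
For reference, here is a minimal sketch of how the hardening requested in the review might look, combined with the three Solution steps. It is not the code in `ui/src/ui/gateway.ts`: `parseTickIntervalMs`, `TickWatch`, `DEFAULT_TICK_INTERVAL_MS`, and the 30-second fallback are hypothetical names and values chosen for illustration. The guard matters because `Math.max(NaN, 1000)` evaluates to `NaN`, so a malformed interval would bypass the 1000 ms floor used in the quoted `startTickWatch()`.

```javascript
// Assumed fallback for illustration only; the PR does not state a default.
const DEFAULT_TICK_INTERVAL_MS = 30000;

// Hypothetical guard: accept only finite, positive numbers from the hello
// frame, otherwise fall back to a known-good interval.
function parseTickIntervalMs(hello, fallbackMs = DEFAULT_TICK_INTERVAL_MS) {
  const value = hello?.policy?.tickIntervalMs;
  return typeof value === "number" && Number.isFinite(value) && value > 0
    ? value
    : fallbackMs;
}

// Watchdog modeled on the startTickWatch() snippet quoted in the PR body.
// `closeSocket` and `now` are injected so the sketch runs without a browser.
class TickWatch {
  constructor({ tickIntervalMs, closeSocket, now = Date.now }) {
    this.tickIntervalMs = tickIntervalMs;
    this.closeSocket = closeSocket;
    this.now = now;
    this.lastTick = now();
    this.timer = null;
  }

  // Step 2: call this whenever a `tick` event arrives.
  noteTick() {
    this.lastTick = this.now();
  }

  // Step 3: a single watchdog pass. Closes with code 4000 when no tick has
  // been seen for more than 2x the interval, and reports whether it did so.
  check() {
    if (this.now() - this.lastTick > this.tickIntervalMs * 2) {
      this.closeSocket(4000, "tick timeout");
      return true;
    }
    return false;
  }

  start() {
    this.stop(); // never leave two timers running
    this.lastTick = this.now();
    const interval = Math.max(this.tickIntervalMs, 1000);
    this.timer = setInterval(() => this.check(), interval);
  }

  stop() {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}
```

In a client, this would be wired up roughly as `new TickWatch({ tickIntervalMs: parseTickIntervalMs(hello), closeSocket: (code, reason) => ws.close(code, reason) })`, with `start()` called after the hello response and `stop()` called from both the client's `stop()` and the WebSocket `close` handler. Validating once at the hello boundary keeps the timer code free of defensive checks.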

Most Similar PRs