← Back to PRs

#9178: Fix: GatewayClient queueConnect() setTimeout never fires

by vishaltandale00 open 2026-02-04 23:57 View on GitHub →
gateway stale
Fixes #9174 ## Problem The `queueConnect()` method in `GatewayClient` used `setTimeout(() => this.sendConnect(), 750)` to schedule the connect frame. However, on certain platforms (macOS with Tailscale) and during sleep/wake cycles, the setTimeout callback would never execute, causing nodes to establish WebSocket connections but never complete the handshake. ## Root Cause Analysis Node.js `setTimeout` can be unreliable in specific scenarios: - Remote connections over Tailscale/VPNs - System sleep/wake cycles (#9084) - Certain event loop conditions The issue reporter confirmed: - WebSocket connection established (TCP ESTABLISHED) - `queueConnect()` called and `setTimeout` scheduled - **But the callback never fired** - Event loop was running (messages were being received) The 750ms delay was originally intended to wait for the gateway's challenge message, but this is actually handled separately via `handleMessage()` which calls `queueConnect()` again with the nonce if needed. ## Solution Replaced `setTimeout` with `setImmediate`: - ✅ Executes on the next event loop iteration (no artificial delay) - ✅ More reliable than `setTimeout` in Node.js - ✅ Sufficient for proper message ordering - ✅ Challenge handling remains unchanged ## Changes 1. Replaced `setTimeout(..., 750)` with `setImmediate()` in `queueConnect()` 2. Updated `clearTimeout` calls to `clearImmediate` for correctness 3. Added type cast for `setImmediate` return value compatibility with `NodeJS.Timeout` 4. Added explanatory comments about the reliability fix ## Why setImmediate Works - `setImmediate` schedules execution on the next event loop iteration - Unlike `setTimeout`, it's not affected by system sleep or timer drift - The connect frame is sent immediately on the next tick, which is sufficient - The gateway challenge flow is handled separately and doesn't depend on this delay ## Testing Strategy - TypeScript compilation passes - No behavioral changes to the handshake protocol - `setImmediate` provides deterministic next-tick execution vs unreliable 750ms delay - Maintains full backward compatibility ## Related Issues - #9084 - setTimeout reliability issues during sleep/wake cycles - #5721 - Same symptoms: node connected but gateway shows caps/commands empty 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- greptile_comment --> <h2>Greptile Overview</h2> <h3>Greptile Summary</h3> This PR updates `GatewayClient.queueConnect()` to schedule the initial `connect` request using `setImmediate` instead of a delayed `setTimeout(…, 750)`, and updates the corresponding cleanup logic to use `clearImmediate`. The intent is to avoid timer callbacks that can fail to fire in certain environments (e.g., macOS + Tailscale, sleep/wake), while keeping the challenge/nonce flow handled in `handleMessage()`. The main issue to address before merge is the handle typing: `connectTimer` is still declared as `NodeJS.Timeout | null` and the new code uses an `unknown` cast to force a `setImmediate` handle into a timeout type. That defeats type-safety and should be corrected by typing `connectTimer` to the actual immediate handle type and removing the cast. <h3>Confidence Score: 4/5</h3> - This PR is largely safe to merge once the immediate handle typing is corrected. - The behavioral change is small and localized (switching from timeout-based scheduling to immediate scheduling), but the current implementation uses an `unknown` cast to force a `setImmediate` handle into a timeout type, which removes compiler guarantees and can allow incorrect timer clearing to slip in later. - src/gateway/client.ts (connectTimer typing and casts) <!-- greptile_other_comments_section --> <!-- /greptile_comment -->

Most Similar PRs