#11752: fix(heartbeat): clamp setTimeout delay to 2^31-1 to prevent TimeoutOverflowWarning

by kjaylee open 2026-02-08 08:03
stale
## Problem

When a cron job's `nextDue` timestamp is far in the future (e.g. `at`-type jobs scheduled months or years out), the computed `setTimeout` delay exceeds the 32-bit signed integer maximum (`2^31 - 1 = 2,147,483,647 ms ≈ 24.8 days`). Node.js resets such a delay to `1ms` and emits a `TimeoutOverflowWarning`, so the heartbeat timer fires almost immediately and re-schedules in a tight loop.

This results in:

- **`gateway.err.log` growing at ~1 GB/hour** (TimeoutOverflowWarning spam)
- **CPU pegged at 50%+** due to the infinite re-scheduling loop
- **Disk exhaustion** (observed: 43 GB log file before detection)

## Root Cause

```ts
// src/infra/heartbeat-runner.ts:910
const delay = Math.max(0, nextDue - now);
// When nextDue - now > 2^31-1, Node.js treats it as 1ms → infinite loop
```

## Fix

Clamp the delay to the maximum safe `setTimeout` value:

```ts
const MAX_TIMEOUT = 2_147_483_647; // setTimeout max (2^31 - 1)
const delay = Math.min(Math.max(0, nextDue - now), MAX_TIMEOUT);
```

This ensures the timer fires at most ~24.8 days later, at which point `scheduleNext` recalculates and re-schedules correctly.

## Impact

- **Before**: Any `at`-type cron job more than 24.8 days in the future triggers infinite log spam
- **After**: Timer safely caps at ~24.8 days and re-evaluates on wake

## Verified

Tested on production gateway (v2026.2.6-3). After patching the compiled bundles and doing a full process restart:

- `gateway.err.log`: 662K lines → 18 lines (1.0 KB)
- CPU: 56% → 7.8%
- Zero `TimeoutOverflowWarning` emissions

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This change updates the heartbeat runner’s scheduling logic to clamp the `setTimeout` delay to Node.js’ maximum supported timeout (`2^31 - 1` ms). It prevents `TimeoutOverflowWarning` and the resulting tight reschedule loop when the next due timestamp is far in the future, while preserving the existing behavior of recomputing the next due time on wake.
<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk.
- The change is localized to one scheduling calculation, matches Node.js timeout constraints, and does not alter control flow beyond preventing overflow-induced immediate timers. No other code paths depend on the unclamped larger delay value.
- src/infra/heartbeat-runner.ts

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
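For reviewers who want to see why clamping stays correct for jobs more than ~24.8 days out, here is a minimal sketch of the clamp-and-re-arm pattern the Fix section relies on. It is not the code in `src/infra/heartbeat-runner.ts`: `clampDelay`, `scheduleAt`, `SetTimer`, and the injected `nowFn` are hypothetical names introduced for illustration, and the real `scheduleNext` may recompute the due time differently.

```typescript
// Largest delay setTimeout accepts in Node.js: 2^31 - 1 ms (about 24.8 days).
const MAX_TIMEOUT = 2_147_483_647;

// Clamp a raw delay into the range setTimeout handles without overflow.
function clampDelay(nextDue: number, now: number): number {
  return Math.min(Math.max(0, nextDue - now), MAX_TIMEOUT);
}

// Hypothetical timer signature, so the sketch can be driven by a fake clock.
type SetTimer = (fn: () => void, ms: number) => unknown;

// Arm a timer for `dueAt`. When the clamp makes the timer fire early,
// recompute the remaining delay and re-arm instead of running the job.
function scheduleAt(
  dueAt: number,
  run: () => void,
  nowFn: () => number,
  setTimer: SetTimer,
): void {
  const delay = clampDelay(dueAt, nowFn());
  setTimer(() => {
    if (nowFn() >= dueAt) {
      run();
    } else {
      // Woke early because of the clamp: schedule the remainder.
      scheduleAt(dueAt, run, nowFn, setTimer);
    }
  }, delay);
}
```

In a real process the call would look like `scheduleAt(job.nextDue, fire, Date.now, setTimeout)`, where `job` and `fire` are placeholders. Each wake-up costs one comparison, so a job a year out is re-armed roughly fifteen times rather than spinning in a 1 ms loop.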

Most Similar PRs