#16132: fix(cron): prevent duplicate job fires via MIN_REFIRE_GAP_MS guard
stale
size: S
trusted-contributor
Cluster:
Cron Job Stability Fixes
## fix(cron): prevent duplicate job fires via MIN_REFIRE_GAP_MS guard
Fixes #16094
### Problem
After upgrading to 2026.2.13, cron jobs fire 2-3 times at the same scheduled time instead of once. The reporter observed:
| Run | Start time | Duration |
|-----|-----------|----------|
| 1 | 16:00:00.009 | 155,721ms |
| 2 | 16:02:29.514 | 9,534ms |
Note: Run 2 started **6 seconds before** Run 1 finished — they overlapped.
### Root Cause
When a long-running isolated cron job exceeds `MAX_TIMER_DELAY_MS` (60s), the re-arm logic in `onTimer` sets a recurring 60s timer to keep the scheduler alive. When the job finishes:
1. The `finally` block sets `state.running = false` and calls `armTimer(state)`
2. `armTimer` clears `state.timer` and sets a new timer
3. **But**: a re-arm timer callback may have already fired and queued itself in Node.js's event loop macrotask queue before `clearTimeout` could cancel it
4. The queued callback runs, finds `state.running === false`, enters the main execution path
5. `findDueJobs` finds the same job still "due" (because `nextRunAtMs` hasn't been recomputed yet or was reloaded from disk with the old value)
6. The job fires again → **duplicate execution**
### Fix
Add a `MIN_REFIRE_GAP_MS` (30s) guard to `findDueJobs`, `runMissedJobs`, and `runDueJobs`: if a job's `lastRunAtMs` is more recent than the gap, skip it. This is a defense-in-depth measure that catches the race regardless of which code path triggers re-entry.
The 30s gap is conservative — cron expressions have 60s minimum granularity, so a legitimate re-fire can never happen within 30s.
### Changed Files
| File | Change |
|------|--------|
| `src/cron/service/timer.ts` | Add `MIN_REFIRE_GAP_MS` constant and guard in `findDueJobs`, `runMissedJobs`, `runDueJobs` |
| `src/cron/service.prevents-refire-within-gap.test.ts` | New test: verifies recently-run jobs are skipped, old jobs still fire |
### Testing
- All 103 cron tests pass (101 existing + 2 new)
- Lint + format pass (oxlint + oxfmt)
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Adds schedule-aware `MIN_REFIRE_GAP_MS` guard to prevent duplicate cron job executions caused by timer re-arm race conditions. The fix correctly addresses #16094 where long-running jobs could fire multiple times due to queued timer callbacks executing before `nextRunAtMs` was recomputed.
Key changes:
- Added `minRefireGapMs()` helper with schedule-aware logic: 30s for `cron` (60s granularity), `everyMs/2` capped at 30s for `every`, 0s for `at` (one-shot)
- Guards added to `findDueJobs`, `runMissedJobs`, and `runDueJobs` to skip jobs where `now - lastRunAtMs < minRefireGapMs(job)`
- Comprehensive test coverage including edge cases for short-interval `every` schedules
The implementation is defense-in-depth and doesn't rely on fixing the underlying race condition, making it robust against re-entry from any code path.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk
- The implementation is well-designed with schedule-aware logic that prevents false positives, comprehensive test coverage validates the fix including edge cases, and the defense-in-depth approach makes it robust. Previous reviewer concerns about breaking `every` schedules have been addressed by using half-interval gaps. All 103 cron tests pass.
- No files require special attention
<sub>Last reviewed commit: fe9ee91</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#18144: fix(cron): clear stuck runningAtMs after timeout and add maintenanc...
by taw0002 · 2026-02-16
84.7%
#13796: fix: skip recomputing nextRunAtMs for running cron jobs (#13739)
by echoVic · 2026-02-11
84.4%
#18191: fix(cron): prevent scheduler freeze during rapid create/delete cycl...
by BinHPdev · 2026-02-16
84.3%
#17561: fix(cron): add runtime staleness guard for runningAtMs (#17554)
by robbyczgw-cla · 2026-02-15
83.9%
#10918: fix(cron): add tolerance for timer precision and skip due jobs in r...
by Cherwayway · 2026-02-07
83.7%
#11108: fix(cron): prevent missed jobs from being skipped on timer recompute
by Bentlybro · 2026-02-07
83.2%
#12443: fix(cron): don't advance past-due jobs that haven't been executed
by rummangeminicode · 2026-02-09
82.5%
#12303: fix(cron): correct nextRunAtMs calculation and prevent timer stall
by colddonkey · 2026-02-09
82.4%
#19414: fix: respect job timeoutSeconds for stuck runningAtMs detection
by namabile · 2026-02-17
82.3%
#17838: fix: prevent cron job spin loop by not recomputing nextRunAtMs for ...
by MisterGuy420 · 2026-02-16
82.2%