#10918: fix(cron): add tolerance for timer precision and skip due jobs in recompute
stale
Cluster: Cron Job Management Fixes
## Problem
`every` and `cron` type jobs never fire because of a race condition:
1. Timer is armed for time T (nextRunAtMs)
2. Timer fires at T+ε (or T-ε due to JS timer precision/clock drift)
3. `ensureLoaded()` reads jobs.json
4. `runDueJobs()` checks `now >= nextRunAtMs` - may be false if timer fired early
5. `recomputeNextRuns()` advances `nextRunAtMs` to T+interval
6. Job is perpetually pushed forward, never executes
In my testing, the timer fired ~900 ms early, which made the due check in step 4 fail.
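The failure loop can be reproduced with a small, simplified model. This is only a sketch: the `Job` shape, `tickStrict()`, and the `now + intervalMs` recompute rule are stand-in assumptions for illustration, not the project's actual code. It shows how a strict `now >= nextRunAtMs` check combined with an unconditional recompute starves a job when the timer fires slightly early.

```typescript
// Hypothetical, simplified model of one scheduler tick (not the real implementation).
interface Job {
  id: string;
  intervalMs: number;
  nextRunAtMs: number;
  lastRunAtMs?: number;
}

// Strict due check, as in step 4 of the problem description.
function isDueStrict(job: Job, now: number): boolean {
  return now >= job.nextRunAtMs;
}

// One tick: run the job if due, then recompute unconditionally.
// Assumption: recompute is modeled as "now + interval".
function tickStrict(job: Job, now: number): boolean {
  let ran = false;
  if (isDueStrict(job, now)) {
    job.lastRunAtMs = now;
    ran = true;
  }
  job.nextRunAtMs = now + job.intervalMs; // pushes the job forward even if it never ran
  return ran;
}

const T = 1_000_000;
const job: Job = { id: "demo", intervalMs: 120_000, nextRunAtMs: T };

let runs = 0;
for (let cycle = 0; cycle < 3; cycle++) {
  const firedAt = job.nextRunAtMs - 900; // timer fires ~900 ms early every cycle
  if (tickStrict(job, firedAt)) runs++;
}

console.log(runs); // 0: the job is rescheduled each cycle but never executed
```

In this model `lastRunAtMs` stays unset while `nextRunAtMs` keeps advancing, which matches the "Before fix" behavior reported under Testing.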
## Solution
1. **Add 2-second tolerance** in `runDueJobs()`: `now >= next - DUE_TOLERANCE_MS`
2. **Skip recomputing due jobs** in `recomputeNextRuns()` - let `runDueJobs()` handle them first
3. **Re-arm timer on early return** when `state.running` is true
4. **Run `runDueJobs()` twice**: before and after recompute (from PR #10412)
5. **Catch up overdue jobs on startup** (from PR #10412)
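Items 1, 2, and 4 can be sketched together as follows. Only `DUE_TOLERANCE_MS`, `runDueJobs()`, and `recomputeNextRuns()` are names taken from this PR; the function bodies, the `Job` shape, `isDue()`, `onTick()`, and the `now + intervalMs` reschedule rule are assumptions made for illustration.

```typescript
// Hypothetical sketch of the tolerance-based flow; bodies are assumed, not copied from the PR.
const DUE_TOLERANCE_MS = 2000;

interface Job {
  id: string;
  intervalMs: number;
  nextRunAtMs: number;
  lastRunAtMs?: number;
}

// A job counts as due if the tick arrives up to DUE_TOLERANCE_MS early.
function isDue(job: Job, now: number): boolean {
  return now >= job.nextRunAtMs - DUE_TOLERANCE_MS;
}

// Runs every due job and reschedules it (assumed rule: now + interval).
function runDueJobs(jobs: Job[], now: number): string[] {
  const ran: string[] = [];
  for (const job of jobs) {
    if (!isDue(job, now)) continue;
    job.lastRunAtMs = now;
    job.nextRunAtMs = now + job.intervalMs;
    ran.push(job.id);
  }
  return ran;
}

// Recompute skips due jobs so runDueJobs() gets to handle them first.
function recomputeNextRuns(jobs: Job[], now: number): void {
  for (const job of jobs) {
    if (isDue(job, now)) continue;
    job.nextRunAtMs = now + job.intervalMs; // stand-in recompute rule
  }
}

// Tick order from item 4: run due jobs, recompute, run due jobs again.
function onTick(jobs: Job[], now: number): string[] {
  const first = runDueJobs(jobs, now);
  recomputeNextRuns(jobs, now);
  const second = runDueJobs(jobs, now);
  return [...first, ...second];
}

const demoJob: Job = { id: "demo", intervalMs: 120_000, nextRunAtMs: 1_000_000 };

let runCount = 0;
for (let cycle = 0; cycle < 3; cycle++) {
  const firedAt = demoJob.nextRunAtMs - 900; // same ~900 ms early tick as observed
  runCount += onTick([demoJob], firedAt).length;
}

console.log(runCount); // 3: one execution per cycle
```

The tolerance widens the due window so an early tick still counts, and the skip in `recomputeNextRuns()` guards against the recompute moving a due job before it has run.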
## Changes
- `timer.ts`: Add `DUE_TOLERANCE_MS = 2000`, re-arm on early return, double `runDueJobs()` call
- `jobs.ts`: Skip jobs where `now >= oldNext - 2000` in `recomputeNextRuns()`
- `ops.ts`: `collectOverdueJobIds()` and `runOverdueJobsOnStartup()` for gateway restart
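As a rough illustration of the `ops.ts` change, the startup catch-up might look like the sketch below. The function names come from this PR, but the signatures, the `StoredJob` shape (including the `enabled` flag), and the synchronous `execute` callback are assumptions; the real functions may be asynchronous and take different arguments.

```typescript
// Hypothetical sketch of startup catch-up; signatures and fields are assumed.
interface StoredJob {
  id: string;
  enabled: boolean; // assumed field
  nextRunAtMs?: number;
}

// Collects jobs whose persisted nextRunAtMs is already in the past.
function collectOverdueJobIds(jobs: StoredJob[], now: number): string[] {
  return jobs
    .filter((j) => j.enabled && j.nextRunAtMs !== undefined && j.nextRunAtMs <= now)
    .map((j) => j.id);
}

// Executes each overdue job exactly once; the caller would then persist and arm the timer.
function runOverdueJobsOnStartup(
  jobs: StoredJob[],
  now: number,
  execute: (id: string) => void,
): string[] {
  const overdue = collectOverdueJobIds(jobs, now);
  for (const id of overdue) execute(id);
  return overdue;
}

const startupNow = 5_000;
const stored: StoredJob[] = [
  { id: "a", enabled: true, nextRunAtMs: 4_000 }, // overdue
  { id: "b", enabled: true, nextRunAtMs: 9_000 }, // still in the future
  { id: "c", enabled: false, nextRunAtMs: 1_000 }, // disabled, ignored
];

const executed: string[] = [];
const overdueIds = runOverdueJobsOnStartup(stored, startupNow, (id) => executed.push(id));

console.log(overdueIds); // ["a"]
```

A check based only on the persisted `nextRunAtMs` cannot tell whether a job's schedule was edited while the gateway was down, so a real implementation may want an additional validity check before executing.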
## Testing
Tested with 2-minute interval jobs on macOS; they now fire reliably every cycle.
Before fix: Job never executed, `nextRunAtMs` kept advancing
After fix: Job executes on schedule, `lastRunAtMs` updates correctly
Fixes #10653
Related to PR #10412
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
- Adjusts cron scheduler timing to tolerate early timer ticks by adding a 2s due-time tolerance and avoiding recompute for already-due jobs.
- Updates timer tick flow to run due jobs, recompute next runs, then run due jobs again, and re-arm the timer when a previous tick is still running.
- Adds startup logic to detect jobs whose persisted `nextRunAtMs` is already past and executes each overdue job once before persisting and arming the timer.
<h3>Confidence Score: 3/5</h3>
- This PR is likely safe but has correctness edge cases around overdue job execution and duplicated due-time tolerance semantics.
- Core timing fixes are localized and consistent with existing scheduler flow, but startup catch-up can execute jobs that are no longer truly due after schedule edits, and the due tolerance is duplicated as a magic number in another module which risks future divergence.
- src/cron/service/ops.ts and src/cron/service/jobs.ts
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #11108: fix(cron): prevent missed jobs from being skipped on timer recompute (by Bentlybro · 2026-02-07 · 90.0%)
- #12443: fix(cron): don't advance past-due jobs that haven't been executed (by rummangeminicode · 2026-02-09 · 87.6%)
- #12448: fix: prevent cron list/status from silently skipping due jobs (by Yida-Dev · 2026-02-09 · 86.0%)
- #12303: fix(cron): correct nextRunAtMs calculation and prevent timer stall (by colddonkey · 2026-02-09 · 85.0%)
- #9684: fix: cron race condition - run due jobs before recomputing nextRunA... (by divol89 · 2026-02-05 · 84.7%)
- #12122: fix(cron): ensure timer callback fires for scheduled jobs (by divol89 · 2026-02-08 · 84.6%)
- #13796: fix: skip recomputing nextRunAtMs for running cron jobs (#13739) (by echoVic · 2026-02-11 · 84.4%)
- #9393: fix(cron): avoid recomputeNextRuns on forceReload (by matthewpapa07 · 2026-02-05 · 84.2%)
- #8034: fix(cron): run past-due one-shot jobs immediately on startup (by FelixFoster · 2026-02-03 · 84.2%)
- #8379: fix(cron): handle past-due one-shot 'at' jobs that haven't run yet (by Gerrald12312 · 2026-02-04 · 84.0%)