
#10918: fix(cron): add tolerance for timer precision and skip due jobs in recompute

by Cherwayway open 2026-02-07 05:34
stale
## Problem

`every` and `cron` type jobs never fire because of a race condition:

1. Timer is armed for time T (nextRunAtMs)
2. Timer fires at T+ε (or T-ε due to JS timer precision/clock drift)
3. `ensureLoaded()` reads jobs.json
4. `runDueJobs()` checks `now >= nextRunAtMs` - may be false if timer fired early
5. `recomputeNextRuns()` advances `nextRunAtMs` to T+interval
6. Job is perpetually pushed forward, never executes

In my testing, I observed timer firing ~900ms early, causing the due check to fail.

## Solution

1. **Add 2-second tolerance** in `runDueJobs()`: `now >= next - DUE_TOLERANCE_MS`
2. **Skip recomputing due jobs** in `recomputeNextRuns()` - let `runDueJobs()` handle them first
3. **Re-arm timer on early return** when `state.running` is true
4. **Run `runDueJobs()` twice**: before and after recompute (from PR #10412)
5. **Catch up overdue jobs on startup** (from PR #10412)

## Changes

- `timer.ts`: Add `DUE_TOLERANCE_MS = 2000`, re-arm on early return, double `runDueJobs()` call
- `jobs.ts`: Skip jobs where `now >= oldNext - 2000` in `recomputeNextRuns()`
- `ops.ts`: `collectOverdueJobIds()` and `runOverdueJobsOnStartup()` for gateway restart

## Testing

Tested with 2-minute interval jobs on macOS - now fires reliably every cycle.

Before fix: Job never executed, `nextRunAtMs` kept advancing
After fix: Job executes on schedule, `lastRunAtMs` updates correctly

Fixes #10653
Related to PR #10412

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

- Adjusts cron scheduler timing to tolerate early timer ticks by adding a 2s due-time tolerance and avoiding recompute for already-due jobs.
- Updates timer tick flow to run due jobs, recompute next runs, then run due jobs again, and re-arm the timer when a previous tick is still running.
- Adds startup logic to detect jobs whose persisted `nextRunAtMs` is already past and executes each overdue job once before persisting and arming the timer.
<h3>Confidence Score: 3/5</h3>

- This PR is likely safe but has correctness edge cases around overdue job execution and duplicated due-time tolerance semantics.
- Core timing fixes are localized and consistent with existing scheduler flow, but startup catch-up can execute jobs that are no longer truly due after schedule edits, and the due tolerance is duplicated as a magic number in another module which risks future divergence.
- src/cron/service/ops.ts and src/cron/service/jobs.ts

<!-- greptile_other_comments_section -->

<sub>(2/5) Greptile learns from your feedback when you react with thumbs up/down!</sub>

<!-- /greptile_comment -->
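
For readers following the discussion, here is a minimal TypeScript sketch of the two central ideas in the description: the tolerant due check and skipping due jobs during recompute. It is not the PR's actual code. The `JobSketch` type, the `isDue` helper, and the `computeNext` callback are hypothetical names introduced for illustration; only `DUE_TOLERANCE_MS = 2000`, `nextRunAtMs`, and the function name `recomputeNextRuns` come from the description above. Sharing one exported constant between both checks is one possible way to address the review's concern about the duplicated magic number.

```typescript
// Value taken from the PR description; everything else here is illustrative.
const DUE_TOLERANCE_MS = 2000;

// Hypothetical minimal job shape; the real job type has more fields.
interface JobSketch {
  id: string;
  nextRunAtMs: number;
}

// Tolerant due check: a tick that arrives slightly early still counts as due.
function isDue(
  job: JobSketch,
  now: number,
  toleranceMs: number = DUE_TOLERANCE_MS
): boolean {
  return now >= job.nextRunAtMs - toleranceMs;
}

// Recompute that leaves due jobs untouched so the run step can pick them up.
// `computeNext` is an assumed stand-in for the real schedule calculation.
function recomputeNextRuns(
  jobs: JobSketch[],
  now: number,
  computeNext: (job: JobSketch, now: number) => number
): void {
  for (const job of jobs) {
    if (isDue(job, now)) continue; // do not push a due job forward
    job.nextRunAtMs = computeNext(job, now);
  }
}

// Scenario from the description: the timer fires about 900 ms before T.
const T = 1000000;
const earlyJob: JobSketch = { id: "early-tick", nextRunAtMs: T };
const futureJob: JobSketch = { id: "far-future", nextRunAtMs: T + 600000 };
const now = T - 900;

const strictDue = now >= earlyJob.nextRunAtMs; // the original strict check
const tolerantDue = isDue(earlyJob, now); // the check with tolerance

// Record which jobs were actually recomputed; the callback keeps times as-is.
const recomputed: string[] = [];
recomputeNextRuns([earlyJob, futureJob], now, (job) => {
  recomputed.push(job.id);
  return job.nextRunAtMs;
});

console.log({ strictDue, tolerantDue, recomputed });
```

In this scenario the strict comparison rejects the early tick while the tolerant one accepts it, and only the job that is not yet due is passed to the recompute callback. One caveat of any fixed tolerance: a job whose interval is shorter than the tolerance window could be treated as due immediately after it runs, so the sketch assumes intervals comfortably larger than 2 seconds.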
