#7022: fix(cron): prevent schedule drift on gateway restart for 'every' jobs
Cluster: Cron Job Management Fixes
## Problem
Jobs with `kind: 'every'` schedules drift forward on gateway restarts:
1. Job created at 23:40 with 4h interval → first run scheduled for 03:40
2. Gateway restarts at 03:29 (11 min before scheduled run)
3. Schedule recalculated using restart time as anchor → next run at 07:29
4. Job never runs at the originally intended time
**Root cause:** `computeNextRunAtMs()` defaults `anchorMs` to `nowMs` when no anchor is provided. After a restart, `nowMs` is the restart time rather than the job's creation time, so the whole schedule is re-anchored to the restart.
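The drift in the timeline above can be reproduced with a minimal sketch. This is not the actual `computeNextRunAtMs()` implementation: the `EverySchedule` shape, the `everyMs` field name, and the rounding rule are assumptions made for illustration, and the calendar date is arbitrary.

```typescript
// Hypothetical schedule shape; only `kind`, `anchorMs`, and `nowMs` are named in the PR.
interface EverySchedule {
  kind: "every";
  everyMs: number; // assumed name for the interval length
  anchorMs?: number;
}

// Simplified stand-in for computeNextRunAtMs(): next interval boundary at or after nowMs.
function computeNextRunAtMs(schedule: EverySchedule, nowMs: number): number {
  // The default described as the root cause: no stored anchor means "anchor = now".
  const anchorMs = schedule.anchorMs ?? nowMs;
  if (nowMs < anchorMs) return anchorMs;
  const elapsedMs = nowMs - anchorMs;
  const steps = Math.max(1, Math.ceil(elapsedMs / schedule.everyMs));
  return anchorMs + steps * schedule.everyMs;
}

const HOUR_MS = 60 * 60 * 1000;
const createdAtMs = Date.UTC(2026, 0, 1, 23, 40); // job created at 23:40
const restartAtMs = Date.UTC(2026, 0, 2, 3, 29); // gateway restarts at 03:29

// Without a persisted anchor, the restart time becomes the anchor: 03:29 + 4h = 07:29.
const driftedNextRun = computeNextRunAtMs(
  { kind: "every", everyMs: 4 * HOUR_MS },
  restartAtMs,
);

// With the creation time persisted as the anchor, the 03:40 run is preserved.
const anchoredNextRun = computeNextRunAtMs(
  { kind: "every", everyMs: 4 * HOUR_MS, anchorMs: createdAtMs },
  restartAtMs,
);
```

Under these assumptions, `driftedNextRun` lands on 07:29 while `anchoredNextRun` stays on 03:40.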
## Solution
Two fixes:
1. **Persist anchor at creation**: In `createJob()`, set `schedule.anchorMs` to creation time for 'every' schedules when not explicitly provided.
2. **Catch up on missed runs**: In `recomputeNextRuns()`, detect whether a job should already have run (based on `lastRunAtMs` or `createdAtMs`) but has not, and schedule it immediately.
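The two fixes could look roughly like the sketch below. It is not the code from this PR: `withPersistedAnchor`, `isOverdue`, the `JobTimes` shape, and the `everyMs` field are hypothetical names, and the fallback from `lastRunAtMs` to `createdAtMs` is one reading of the description above.

```typescript
interface EverySchedule {
  kind: "every";
  everyMs: number; // assumed name for the interval length
  anchorMs?: number;
}

// Fix 1 (sketch): persist the creation time as the anchor unless one was given explicitly.
function withPersistedAnchor(schedule: EverySchedule, createdAtMs: number): EverySchedule {
  return { ...schedule, anchorMs: schedule.anchorMs ?? createdAtMs };
}

// Hypothetical subset of job state used for catch-up detection.
interface JobTimes {
  createdAtMs: number;
  lastRunAtMs?: number;
}

// Fix 2 (sketch): a job is overdue when at least one full interval has passed
// since its last run, or since creation if it has never run.
function isOverdue(job: JobTimes, everyMs: number, nowMs: number): boolean {
  const referenceMs = job.lastRunAtMs ?? job.createdAtMs;
  return nowMs - referenceMs >= everyMs;
}

const HOUR_MS = 60 * 60 * 1000;
const MINUTE_MS = 60 * 1000;
const createdAtMs = Date.UTC(2026, 0, 1, 23, 40); // arbitrary date, 23:40

const persisted = withPersistedAnchor({ kind: "every", everyMs: 4 * HOUR_MS }, createdAtMs);
const explicit = withPersistedAnchor(
  { kind: "every", everyMs: 4 * HOUR_MS, anchorMs: 1_000 },
  createdAtMs,
);

// Restart at 03:45, five minutes after the 03:40 run was missed: overdue.
const missed = isOverdue({ createdAtMs }, 4 * HOUR_MS, createdAtMs + 4 * HOUR_MS + 5 * MINUTE_MS);

// Restart at 03:29, before the first run is due: not overdue.
const notYetDue = isOverdue({ createdAtMs }, 4 * HOUR_MS, createdAtMs + 3 * HOUR_MS + 49 * MINUTE_MS);
```

Comparing against a stored timestamp, rather than a freshly computed next-run time, keeps the check independent of when the gateway happens to restart.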
## Tests
Added `jobs.anchor-fix.test.ts` covering:
- anchorMs auto-set on job creation
- anchorMs preserved when explicitly provided
- Catch-up scheduling for missed jobs
- No false catch-up when job ran recently
All existing cron tests pass (56/56).
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR addresses drift for `kind: "every"` cron jobs across gateway restarts by (1) persisting `schedule.anchorMs` at job creation when not provided and (2) adding a catch-up path in `recomputeNextRuns()` intended to schedule missed intervals immediately.
The anchor persistence in `src/cron/service/jobs.ts` fits well with the existing `computeNextRunAtMs()` semantics (which otherwise default `anchorMs` to `nowMs`). However, the new catch-up logic appears ineffective for the real “restart before a scheduled run” scenario because `computeNextRunAtMs()` always returns a timestamp >= `now`, so comparing `nextRunAtMs` to `now` doesn’t reliably detect missed runs.
Tests were added to cover anchor behavior and catch-up scheduling, but the catch-up test currently uses an interval-boundary restart time, which doesn’t reflect the drift scenario described in the PR and may pass even without catch-up behavior.
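The concern can be illustrated with a small sketch. It assumes a simplified next-run function that returns the interval boundary at or after `nowMs`, which is consistent with the behavior described above but is not the actual `computeNextRunAtMs()` code; `nextEveryRunAtMs` and `looksMissed` are hypothetical names, and the date is arbitrary.

```typescript
// Simplified stand-in for the "every" branch of computeNextRunAtMs():
// returns the interval boundary at or after nowMs, so the result is always >= nowMs.
function nextEveryRunAtMs(anchorMs: number, everyMs: number, nowMs: number): number {
  if (nowMs < anchorMs) return anchorMs;
  const steps = Math.max(1, Math.ceil((nowMs - anchorMs) / everyMs));
  return anchorMs + steps * everyMs;
}

// Hypothetical catch-up check that compares a freshly computed next run to now.
function looksMissed(anchorMs: number, everyMs: number, nowMs: number): boolean {
  return nextEveryRunAtMs(anchorMs, everyMs, nowMs) <= nowMs;
}

const HOUR_MS = 60 * 60 * 1000;
const MINUTE_MS = 60 * 1000;
const anchorMs = Date.UTC(2026, 0, 1, 23, 40); // job created at 23:40
const everyMs = 4 * HOUR_MS;

// Restart exactly on the 03:40 boundary: the computed next run equals now, so the check fires.
const onBoundary = looksMissed(anchorMs, everyMs, anchorMs + everyMs);

// Restart at 03:45, after the 03:40 run was missed: the computed next run is already 07:40,
// which is later than now, so the missed run goes undetected.
const offBoundary = looksMissed(anchorMs, everyMs, anchorMs + everyMs + 5 * MINUTE_MS);
```

Under these assumptions the check fires only for a restart that lands exactly on an interval boundary, which is why a boundary-time test can pass without exercising the realistic restart scenario.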
<h3>Confidence Score: 2/5</h3>
- Not safe to merge as-is due to likely non-functional catch-up behavior for missed `every` runs.
- Anchor persistence looks correct and low risk, but the added catch-up condition appears to not trigger for the intended restart-drift case, and the associated test doesn’t exercise the realistic scenario. This could leave the bug partially fixed (or create a false sense of coverage).
- Files needing closer review: `src/cron/service/jobs.ts` (catch-up logic), `src/cron/service/jobs.anchor-fix.test.ts` (catch-up test scenario)
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #23290: fix(cron): use lastRunAtMs for next schedule of interval jobs after... (by SidQin-cyber · 2026-02-22 · 86.4%)
- #13065: fix(cron): Fix "every" schedule not re-arming after gateway restart (by trevorgordon981 · 2026-02-10 · 84.3%)
- #12747: fix: catch up missed cron-expression job runs on restart (by obin94-commits · 2026-02-09 · 84.3%)
- #9060: Fix: Preserve scheduled cron jobs after gateway restart (by vishaltandale00 · 2026-02-04 · 83.8%)
- #22948: fix(cron): every-schedule boundary returns nowMs instead of next sl... (by echoVic · 2026-02-21 · 83.6%)
- #22911: fix(cron): correct next execution time calculation after gateway re... (by anandsuraj · 2026-02-21 · 81.8%)
- #11857: fix: recompute stale cron nextRunAtMs on gateway restart (by Yida-Dev · 2026-02-08 · 81.5%)
- #18925: fix(cron): stagger missed jobs on restart to prevent gateway overload (by rexlunae · 2026-02-17 · 80.7%)
- #8034: fix(cron): run past-due one-shot jobs immediately on startup (by FelixFoster · 2026-02-03 · 80.5%)
- #18144: fix(cron): clear stuck runningAtMs after timeout and add maintenanc... (by taw0002 · 2026-02-16 · 80.0%)