
#12747: fix: catch up missed cron-expression job runs on restart

by obin94-commits open 2026-02-09 15:58
stale
## Summary

Cron-expression jobs (e.g. `0 6 * * *`) silently skip runs when the gateway restarts after the scheduled time. This adds catch-up detection to `recomputeNextRuns()` so missed runs fire immediately on restart.

## Problem

When `recomputeNextRuns()` runs on startup, it calls `computeNextRunAtMs(schedule, now)` which returns the **next future occurrence**. If a daily 6 AM job's gateway restarts at 10 AM, the next occurrence is tomorrow 6 AM, so today's run is silently skipped. No error is logged.

In practice, this caused daily morning reports, agent check-ins, and data sweeps to stop firing for 3 consecutive days after a gateway restart.

`every`-interval jobs are unaffected since they recalculate from anchor/now correctly.

## Fix

After computing `nextRunAtMs`, check if a cron-expression job has a scheduled occurrence between `lastRunAtMs` and now that was missed. If so, set `nextRunAtMs = now` to fire the catch-up run immediately.

```typescript
const missedRun = computeNextRunAtMs(job.schedule, job.state.lastRunAtMs);
if (missedRun !== undefined && missedRun <= now) {
  job.state.nextRunAtMs = now; // fire catch-up
}
```

## Safety

- Only affects `cron`-expression jobs (not `at` or `every`)
- Only triggers if `lastRunAtMs` exists (job has previously run successfully)
- Only catches up if there's an actual missed occurrence between last run and now
- Weekly Monday jobs won't fire on Wednesday restart unless Monday was actually missed
- Jobs that ran on schedule are unaffected (the computed "next after last" is in the future)

## Testing

Applied this patch to a production deployment and verified:

- Gateway restart correctly detects 4 missed daily jobs and fires them immediately
- Log output confirms catch-up detection with job name, missed timestamp, and last run timestamp
- Jobs that haven't missed a run are unaffected
- `every`-interval and `at` jobs continue working as before

Fixes #12744

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR updates `recomputeNextRuns()` in `src/cron/service/jobs.ts` to detect when cron-expression jobs missed a scheduled occurrence between `lastRunAtMs` and the current startup time, and schedules an immediate run (`nextRunAtMs = now`) to catch up. It logs a structured `info` event when a catch-up is triggered.

The change fits into the cron service’s existing model where `computeJobNextRunAtMs()` determines the next due time per schedule kind, while `recomputeNextRuns()` refreshes persisted job state on startup and periodic ticks.

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk.
- The change is narrowly scoped to cron-expression jobs in `recomputeNextRuns()`, uses existing schedule computation (`computeNextRunAtMs`) to detect a missed occurrence, and only triggers when `lastRunAtMs` is present. I did not find any cases where it would create incorrect scheduling for `every` or `at` jobs, or cause repeated immediate re-fires once the runner updates `lastRunAtMs`.
- No files require special attention

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
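
For readers who want to try the catch-up check from the Fix section in isolation, here is a minimal, self-contained sketch. It is not the code from `src/cron/service/jobs.ts`: the `Schedule` and `Job` types, `nextOccurrenceAfter()`, and `applyCatchUp()` are hypothetical stand-ins, and the schedule model is reduced to "daily at a fixed UTC hour" rather than full cron syntax. It also assumes that the real `computeNextRunAtMs` returns the first occurrence strictly after the timestamp it is given.

```typescript
// Hypothetical, simplified types; the real job shape in the repository may differ.
type Schedule =
  | { kind: "cron"; hourUtc: number } // stand-in for a daily cron expression such as `0 6 * * *`
  | { kind: "every"; everyMs: number };

interface Job {
  name: string;
  schedule: Schedule;
  state: { lastRunAtMs?: number; nextRunAtMs?: number };
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Stand-in for computeNextRunAtMs (assumption): returns the first occurrence
// strictly after `afterMs`, or undefined for schedule kinds it does not handle.
function nextOccurrenceAfter(schedule: Schedule, afterMs: number): number | undefined {
  if (schedule.kind !== "cron") return undefined;
  const dayStart = Math.floor(afterMs / DAY_MS) * DAY_MS; // UTC midnight of that day
  const sameDayRun = dayStart + schedule.hourUtc * HOUR_MS;
  return sameDayRun > afterMs ? sameDayRun : sameDayRun + DAY_MS;
}

// Catch-up check: if an occurrence fell between the last run and `now`,
// schedule the job to fire immediately. Returns true when a catch-up was set.
function applyCatchUp(job: Job, now: number): boolean {
  if (job.schedule.kind !== "cron") return false; // `every` jobs are left alone
  const lastRunAtMs = job.state.lastRunAtMs;
  if (lastRunAtMs === undefined) return false; // job has never run: nothing to catch up
  const missedRun = nextOccurrenceAfter(job.schedule, lastRunAtMs);
  if (missedRun !== undefined && missedRun <= now) {
    job.state.nextRunAtMs = now; // fire catch-up
    return true;
  }
  return false;
}

// Usage: last run on Feb 8 at 06:00 UTC, gateway restarts on Feb 9 at 10:00 UTC.
const lastRun = Date.UTC(2026, 1, 8, 6, 0);
const restart = Date.UTC(2026, 1, 9, 10, 0);
const job: Job = {
  name: "morning-report", // hypothetical job name
  schedule: { kind: "cron", hourUtc: 6 },
  state: { lastRunAtMs: lastRun, nextRunAtMs: Date.UTC(2026, 1, 10, 6, 0) },
};
const caughtUp = applyCatchUp(job, restart); // Feb 9 06:00 was missed, so nextRunAtMs becomes `restart`
```

In this sketch a job whose `lastRunAtMs` is already today's 06:00 is left untouched, because the next occurrence after the last run lies in the future. That mirrors the Safety notes above, under the stated assumptions.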
