#13796: fix: skip recomputing nextRunAtMs for running cron jobs (#13739)

by echoVic open 2026-02-11 01:55 View on GitHub →

## fix: skip recomputing nextRunAtMs for running cron jobs (#13739) ### Problem Daily cron jobs systematically skip their scheduled execution, with `nextRunAtMs` advancing 48 hours instead of 24 hours. ### Root Cause `recomputeNextRuns()` does not skip jobs that are currently running (`runningAtMs` is set). When a concurrent operation (e.g., `list`, `status`, `add`) triggers `recomputeNextRuns` while a job is executing, the function sees the job's `nextRunAtMs` as past-due and advances it to the next period (e.g., tomorrow). This effectively skips the current execution's post-run scheduling in `applyJobResult`. The race condition: 1. `onTimer` finds a due job, sets `runningAtMs`, persists, releases lock 2. Job starts executing (outside the lock) 3. Another operation (e.g., `list`) acquires the lock, calls `recomputeNextRuns` 4. `recomputeNextRuns` sees `nextRunAtMs` is past-due, advances it to tomorrow 5. `onTimer` finishes, `applyJobResult` computes `nextRunAtMs` = tomorrow (same value) 6. Result: job ran today but `nextRunAtMs` jumped to day-after-tomorrow ### Fix Add a guard in `recomputeNextRuns` to skip jobs where `runningAtMs` is set (i.e., currently executing). The post-run scheduling in `applyJobResult` will handle setting the correct `nextRunAtMs` after execution completes. Closes #13739  <h2>Greptile Overview</h2> <h3>Greptile Summary</h3> Adds a guard in `recomputeNextRuns` to skip jobs with `runningAtMs` set, preventing a race condition where concurrent operations (`list`, `status`, `add`) could advance `nextRunAtMs` while a job is executing, causing the job to skip its next scheduled run (e.g., 48-hour gap instead of 24 hours for daily jobs). **Key changes:** - Adds check at `src/cron/service/jobs.ts:128-130` to skip recomputation for running jobs - Guard is placed after stuck-run cleanup but before the recompute logic - Post-run scheduling in `applyJobResult` handles setting correct `nextRunAtMs` after execution **How it works:** The fix prevents the race condition where: 1. `onTimer` sets `runningAtMs` and releases lock 2. Job executes (outside lock) 3. Another operation acquires lock, calls `recomputeNextRuns` 4. Without this fix: `recomputeNextRuns` sees past-due `nextRunAtMs` and advances it 5. With this fix: `recomputeNextRuns` skips the job, letting `applyJobResult` schedule correctly after execution The guard is consistent with existing patterns used in `findDueJobs`, `runMissedJobs`, `runDueJobs`, and the `run` operation. <h3>Confidence Score: 5/5</h3> - This PR is safe to merge with minimal risk - The fix is surgical, well-documented, and follows existing patterns throughout the codebase. The guard is placed correctly after stuck-run cleanup, ensuring genuinely stuck jobs are still recovered. The change addresses a clear race condition with a minimal, defensive check that aligns with how `runningAtMs` is already used in multiple other locations (`findDueJobs`, `runMissedJobs`, `runDueJobs`, `isJobDue`, and the `run` operation). - No files require special attention  <sub>(4/5) You can add custom instructions or style guidelines for the agent [here](https://app.greptile.com/review/github)!</sub>