#18960: fix: don't disable one-shot cron jobs on skipped status

by jwchmodx open 2026-02-17 06:41

agents size: XS

Fixes #18917

## Problem

One-shot cron jobs (`schedule.kind: "at"` + `deleteAfterRun: true`) are silently skipped and then permanently disabled. In `applyJobResult()`, any non-`"ok"` status, including `"skipped"`, unconditionally sets `job.enabled = false`, so the job can never retry.

## Fix

Distinguish between terminal and temporary statuses for one-shot jobs:

- **`"ok"`** (without `deleteAfterRun`) → disable (the job already ran successfully)
- **`"error"`** → disable and log a warning (permanent failure)
- **`"skipped"`** → leave `enabled: true` so the job retries on the next scheduler tick

This preserves the existing guard against tight-loop rescheduling (#11452) while allowing temporarily skipped jobs to recover.

## Testing

All 145 cron tests pass, including the existing one-shot job test suite.

<h3>Greptile Summary</h3>

This PR fixes a bug where one-shot cron jobs (`schedule.kind: "at"`) were permanently disabled on a `"skipped"` status, preventing them from retrying. The fix distinguishes between terminal statuses (`"ok"` and `"error"`), which disable the job, and the temporary `"skipped"` status, which allows a retry.

The change in `applyJobResult()` at `src/cron/service/timer.ts:92-101` now:

- Leaves `job.enabled = true` when the status is `"skipped"`, so the job retries on the next scheduler tick
- Disables the job only for terminal statuses (`"ok"` without `deleteAfterRun`, or `"error"`)
- Preserves the existing guard against tight-loop rescheduling (issue #11452)

The logic addresses the problem described in #18917, where jobs were being skipped for reasons such as:

- Invalid payload configuration (`main job requires non-empty systemEvent text`)
- Heartbeat busy/race conditions (`requests-in-flight`)
- Payload kind mismatch (`isolated job requires payload.kind=agentTurn`)

All 145 cron tests pass, including the existing one-shot job tests.
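A minimal TypeScript sketch of the status branching described in the Fix section, for illustration only. The `CronJob` and `RunStatus` shapes, the `applyOneShotResult` name, and the `OneShotOutcome` return value are hypothetical and are not taken from `src/cron/service/timer.ts`. The sketch also assumes that actually deleting a `deleteAfterRun` job happens elsewhere once the function reports `"delete"`.

```typescript
// Hypothetical shapes; the real types in src/cron/service/timer.ts may differ.
type RunStatus = "ok" | "error" | "skipped";

interface CronJob {
  id: string;
  enabled: boolean;
  schedule: { kind: "at"; at: string } | { kind: "cron"; expr: string };
  deleteAfterRun?: boolean;
}

// What the caller should do with a one-shot job after a run.
type OneShotOutcome = "delete" | "disable" | "retry" | "unchanged";

function applyOneShotResult(
  job: CronJob,
  status: RunStatus,
  warn: (msg: string) => void = (msg) => console.warn(msg),
): OneShotOutcome {
  // Recurring jobs are outside the scope of this sketch.
  if (job.schedule.kind !== "at") {
    return "unchanged";
  }

  if (status === "skipped") {
    // Temporary condition: keep the job enabled so the next tick retries it.
    return "retry";
  }

  if (status === "error") {
    // Terminal failure: disable and surface a warning.
    job.enabled = false;
    warn(`one-shot cron job ${job.id} failed; disabling`);
    return "disable";
  }

  // status === "ok": the job already ran successfully.
  if (job.deleteAfterRun) {
    // Assumed: the caller removes the job from the store.
    return "delete";
  }
  job.enabled = false;
  return "disable";
}

// Usage: a skipped run stays enabled, an errored run is disabled.
const job: CronJob = {
  id: "reminder",
  enabled: true,
  schedule: { kind: "at", at: "2026-02-17T07:00:00Z" },
};
console.log(applyOneShotResult(job, "skipped"), job.enabled); // retry true
console.log(applyOneShotResult(job, "error"), job.enabled); // disable false
```

Returning an outcome instead of deleting inline keeps the branching easy to unit-test in isolation; whether the real `applyJobResult()` is structured this way is not shown in this PR.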
<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk
- The fix is well-targeted and follows good defensive programming practice. It adds a clear distinction between temporary and terminal failure states for one-shot jobs. All 145 tests in the existing suite pass, providing strong coverage. The change preserves existing safeguards against tight-loop rescheduling while allowing legitimate retries for transient failures. The code is well-commented, explaining the reasoning behind each branch.
- No files require special attention

<sub>Last reviewed commit: 4f0fbdb</sub>