#18960: fix: don't disable one-shot cron jobs on skipped status
agents
size: XS
Cluster: Cron Job Enhancements
Fixes #18917
## Problem
One-shot cron jobs (`schedule.kind: "at"` + `deleteAfterRun: true`) are permanently disabled, with no warning, the first time a run is skipped. In `applyJobResult()`, any non-`"ok"` status, including `"skipped"`, unconditionally sets `job.enabled = false`, so the job never retries.
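The failure mode can be reproduced with a minimal sketch. The `LegacyJob` type and `applyResultBeforeFix` function are hypothetical simplifications for illustration and are not the actual code in `applyJobResult()`; only the "any non-`"ok"` status disables the job" rule is taken from the description above.

```typescript
// Hypothetical, simplified model of the pre-fix behaviour (not the real source).
type LegacyStatus = "ok" | "error" | "skipped";

interface LegacyJob {
  enabled: boolean;
  deleteAfterRun?: boolean;
}

// Pre-fix rule as described: every status other than "ok" disables the job.
function applyResultBeforeFix(job: LegacyJob, status: LegacyStatus): void {
  if (status !== "ok") {
    job.enabled = false;
  }
}

const demoJob: LegacyJob = { enabled: true, deleteAfterRun: true };
applyResultBeforeFix(demoJob, "skipped");
console.log(demoJob.enabled); // false: a single skip permanently disables the job
```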
## Fix
Distinguish between terminal and temporary statuses for one-shot jobs:
- **`"ok"`** (without `deleteAfterRun`) → disable (already ran successfully)
- **`"error"`** → disable + log warning (permanent failure)
- **`"skipped"`** → leave `enabled: true` so the job retries on the next scheduler tick
This preserves the existing guard against tight-loop rescheduling (#11452) while allowing temporary skips to recover.
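The branching above could look roughly like the following sketch. The types, the `applyOneShotResult` name, and the `ApplyOutcome` return shape are assumptions made for illustration; the real change lives in `applyJobResult()` and may be structured differently (for example, it may log directly rather than return a warning).

```typescript
// Hypothetical, simplified types; the real definitions may differ.
type RunStatus = "ok" | "error" | "skipped";

interface OneShotJob {
  id: string;
  enabled: boolean;
  deleteAfterRun?: boolean;
}

interface ApplyOutcome {
  shouldDelete: boolean; // caller removes the job from the store
  warning?: string;      // caller logs this if present
}

function applyOneShotResult(job: OneShotJob, status: RunStatus): ApplyOutcome {
  switch (status) {
    case "ok":
      // Terminal: the job ran. Delete it if requested, otherwise disable it.
      if (job.deleteAfterRun) {
        return { shouldDelete: true };
      }
      job.enabled = false;
      return { shouldDelete: false };
    case "error":
      // Terminal: treated as a permanent failure, so disable and warn.
      job.enabled = false;
      return {
        shouldDelete: false,
        warning: `one-shot job ${job.id} disabled after error`,
      };
    case "skipped":
      // Temporary: leave the job enabled so a later scheduler tick retries it.
      return { shouldDelete: false };
  }
}
```

Returning an outcome instead of deleting or logging inside the function keeps the sketch free of side effects beyond `job.enabled`, which makes each branch easy to assert on in a unit test.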
## Testing
All 145 cron tests pass including the existing one-shot job test suite.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR fixes a bug where one-shot cron jobs (`schedule.kind: "at"`) were being permanently disabled on `"skipped"` status, preventing them from retrying. The fix correctly distinguishes between terminal statuses (`"ok"` and `"error"`) that should disable the job, and temporary `"skipped"` statuses that should allow retry.
The change in `applyJobResult()` at `src/cron/service/timer.ts:92-101` now:
- Leaves `job.enabled = true` when status is `"skipped"`, allowing the job to retry on the next scheduler tick
- Only disables the job for terminal statuses (`"ok"` without `deleteAfterRun`, or `"error"`)
- Preserves the existing guard against tight-loop rescheduling (issue #11452)
The logic is sound and addresses the problem described in #18917 where jobs could be skipped due to temporary conditions like:
- Invalid payload configuration (`main job requires non-empty systemEvent text`)
- Heartbeat busy/race conditions (`requests-in-flight`)
- Missing dependencies (`isolated job requires payload.kind=agentTurn`)
All 145 cron tests pass, including existing one-shot job tests.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk
- The fix is well-targeted: it adds a clear distinction between temporary and terminal statuses for one-shot jobs. All 145 tests in the existing suite pass. The change preserves the existing safeguard against tight-loop rescheduling while allowing legitimate retries after transient skips, and the code comments explain the reasoning behind each branch.
- No files require special attention
<sub>Last reviewed commit: 4f0fbdb</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #3693: fix(cron): delete deleteAfterRun jobs regardless of execution status (by HirokiKobayashi-R · 2026-01-29 · 86.9%)
- #5428: fix(Cron): prevent one-shot loop on skip (by imshrishk · 2026-01-31 · 85.1%)
- #11657: fix(cron): treat skipped heartbeat as ok for one-shot jobs (by DukeDeSouth · 2026-02-08 · 84.7%)
- #18144: fix(cron): clear stuck runningAtMs after timeout and add maintenanc... (by taw0002 · 2026-02-16 · 82.6%)
- #19414: fix: respect job timeoutSeconds for stuck runningAtMs detection (by namabile · 2026-02-17 · 81.7%)
- #16132: fix(cron): prevent duplicate job fires via MIN_REFIRE_GAP_MS guard (by widingmarcus-cyber · 2026-02-14 · 81.3%)
- #8825: fix: prevent cron infinite retry loop with exponential backoff (by dbottme · 2026-02-04 · 81.2%)
- #14667: fix: preserve missed cron runs when updating job schedule (by WalterSumbon · 2026-02-12 · 81.1%)
- #12982: fix(cron): prevent status/list from advancing overdue job nextRunAtMs (by hclsys · 2026-02-10 · 80.3%)
- #8034: fix(cron): run past-due one-shot jobs immediately on startup (by FelixFoster · 2026-02-03 · 80.0%)