#17643: fix: clear stale runningAtMs in cron.run to allow manual triggers
stale
size: XS
Cluster: Cron Job Stability Fixes
## Summary
When a cron job's runningAtMs timestamp is not cleared after the run ends (e.g., because the process restarted during execution), a manual trigger via cron.run incorrectly returns {"ran": false, "reason": "already-running"}. This fix adds a staleness check: if runningAtMs is older than 2 hours, it is treated as stale and cleared, allowing the manual trigger to proceed.
## Changes
- Added STALE_RUNNING_AT_MS_THRESHOLD_MS constant (2 hours) in ops.ts
- Modified run() function to check if runningAtMs is stale before returning already-running
- If stale, clears runningAtMs and logs a warning, then proceeds with the run
- If not stale (recent), returns already-running as before
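
The behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual code in ops.ts: the `CronJob` and `JobState` shapes, the `checkRunningMarker` helper name, the `warn` parameter, and the log message are all assumptions made for the example. Only the constant name and the 2-hour value come from this PR.

```typescript
// Threshold named in this PR: a runningAtMs marker older than 2 hours is stale.
const STALE_RUNNING_AT_MS_THRESHOLD_MS = 2 * 60 * 60 * 1000;

// Hypothetical, simplified shapes; the real types in the cron service may differ.
interface JobState {
  runningAtMs?: number;
}

interface CronJob {
  id: string;
  state: JobState;
}

type RunResult = { ran: true } | { ran: false; reason: "already-running" };

// Hypothetical helper: decides whether a manual trigger may proceed.
// The actual job execution is omitted; { ran: true } only means "not blocked".
function checkRunningMarker(
  job: CronJob,
  nowMs: number,
  warn: (msg: string) => void = console.warn,
): RunResult {
  const runningAtMs = job.state.runningAtMs;
  if (typeof runningAtMs === "number") {
    const ageMs = nowMs - runningAtMs;
    if (ageMs <= STALE_RUNNING_AT_MS_THRESHOLD_MS) {
      // Recent marker: keep the existing behavior.
      return { ran: false, reason: "already-running" };
    }
    // Stale marker: log a warning, clear it, and let the run proceed.
    warn(`cron: clearing stale runningAtMs for job ${job.id} (age ${ageMs} ms)`);
    delete job.state.runningAtMs;
  }
  return { ran: true };
}
```

Passing `nowMs` and `warn` as parameters is a choice made for this sketch so the check can be exercised without a real clock or logger. The real run() function presumably obtains both from the service state.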
## Testing
- All existing cron tests pass (10 tests in service.issue-regressions.test.ts)
- The fix follows the same pattern as the existing STUCK_RUN_MS logic in jobs.ts
Fixes openclaw/openclaw#17554
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Adds a staleness check to `cron.run()` so that a stale `runningAtMs` timestamp (older than 2 hours) no longer blocks manual job triggers. When a cron job's execution is interrupted (e.g., process restart), the `runningAtMs` marker can be left behind, causing `cron.run` to incorrectly return `"already-running"`. This fix mirrors the existing `STUCK_RUN_MS` / `normalizeJobTickState` pattern in `jobs.ts`.
- Added `STALE_RUNNING_AT_MS_THRESHOLD_MS` constant (2 hours) matching `STUCK_RUN_MS` in `jobs.ts`
- Modified `run()` to detect stale `runningAtMs` and clear it with a warning log before proceeding
- Recent (non-stale) `runningAtMs` still correctly returns `"already-running"`
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge — it's a narrowly scoped fix that follows an established pattern in the codebase.
- The change is minimal (one file, ~20 lines), follows the existing STUCK_RUN_MS / normalizeJobTickState pattern from jobs.ts, and introduces no new failure modes. The staleness threshold matches the existing 2-hour constant. The lock mechanism ensures no race conditions within the process. The only known minor gap (stale clearance not persisted on early "not-due" return) is low-risk and was already discussed in prior review threads.
- No files require special attention
<sub>Last reviewed commit: 4672c7d</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #17895: fix(cron): add staleness check for runningAtMs on manual trigger (by PlayerGhost · 2026-02-16 · 91.8%)
- #17664: fix(cron): detect and clear stale runningAtMs marker in manual run ... (by echoVic · 2026-02-16 · 91.6%)
- #17561: fix(cron): add runtime staleness guard for runningAtMs (#17554) (by robbyczgw-cla · 2026-02-15 · 89.4%)
- #18144: fix(cron): clear stuck runningAtMs after timeout and add maintenanc... (by taw0002 · 2026-02-16 · 87.9%)
- #17949: fix: clear stale runningAtMs in cron.run() before already-running c... (by yasumorishima · 2026-02-16 · 87.9%)
- #18192: fix(cron): auto-clear stale runningAtMs markers after timeout (#18120) (by BinHPdev · 2026-02-16 · 87.3%)
- #19414: fix: respect job timeoutSeconds for stuck runningAtMs detection (by namabile · 2026-02-17 · 86.1%)
- #12018: fix(cron): clear stale running markers based on job timeout (by benzer25 · 2026-02-08 · 85.2%)
- #5179: fix(cron): recover stale running markers (by thatdaveb · 2026-01-31 · 82.9%)
- #12443: fix(cron): don't advance past-due jobs that haven't been executed (by rummangeminicode · 2026-02-09 · 81.6%)