#23290: fix(cron): use lastRunAtMs for next schedule of interval jobs after restart

by SidQin-cyber open 2026-02-22 05:26

size: XS

## Summary

- **Problem:** After a gateway restart, interval-based cron jobs (kind `every`) can show an unexpected "NEXT in" time. A 30-minute job that last ran 6 minutes ago may display "NEXT in 56m" instead of the expected ~24m.
- **Why it matters:** Users see confusing, non-obvious scheduling and may think their cron jobs are broken.
- **What changed:** In `computeJobNextRunAtMs` (src/cron/service/jobs.ts), when a job has a `lastRunAtMs` and `lastRunAtMs + everyMs` is still in the future, use that as the next run time instead of the anchor-based formula.
- **What did NOT change:** Anchor-based scheduling is still used as fallback when `lastRunAtMs` is not available or `lastRunAtMs + everyMs` is already in the past (e.g., long downtime catch-up). Cron-expression and one-shot (`at`) schedules are unchanged.

## Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #22895

## User-visible / Behavior Changes

- After gateway restart, interval jobs show the intuitive "NEXT in" time: `everyMs - timeSinceLastRun`
- Example: 30-min job, last ran 6 min ago → "NEXT in 24m" (was showing ~56m)

## Security Impact (required)

- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`

## Repro + Verification

### Environment

- OS: macOS 15.3 (arm64)
- Runtime: Node v22+
- Integration/channel: Cron service

### Steps

1. Create a cron job with `every: 30m`
2. Let it run at least once
3. Restart the gateway
4. Check the dashboard: "NEXT in" should reflect `interval - timeSinceLastRun`

### Expected

- "NEXT in 24m" (if last run was 6 minutes ago)

### Actual

- Before fix: "NEXT in 56m" (anchor-based formula computes non-obvious grid alignment)
- After fix: "NEXT in 24m" (`lastRunAtMs + everyMs - nowMs`)

## Evidence

The fix adds a `lastRunAtMs`-based fast path before the anchor calculation:

```typescript
if (typeof job.state.lastRunAtMs === "number" && Number.isFinite(job.state.lastRunAtMs)) {
  const nextFromLast = job.state.lastRunAtMs + everyMs;
  if (nextFromLast > nowMs) {
    return nextFromLast;
  }
}
// fallback to anchor-based formula
```

This ensures the interval is always measured from the last actual execution, which matches user expectations for "every N minutes".

## Human Verification (required)

- Verified scenarios: Traced the scheduling flow through `start()` → `runMissedJobs()` → `recomputeNextRuns()`; confirmed `lastRunAtMs` is set by `finishJob()` in `timer.ts`
- Edge cases checked: `lastRunAtMs` undefined (first run) → falls back to anchor; `lastRunAtMs + everyMs` in the past (long downtime) → falls back to anchor; disabled jobs return `undefined`
- What I did **not** verify: Multi-day downtime catch-up behavior with many missed intervals

## Compatibility / Migration

- Backward compatible? `Yes`; anchor-based formula is preserved as fallback
- Config/env changes? `No`
- Migration needed? `No`

## Failure Recovery (if this breaks)

- How to disable/revert: Revert the `lastRunAtMs` check in `jobs.ts`
- Files/config to restore: `src/cron/service/jobs.ts`
- Known bad symptoms: If reverted, interval jobs may show non-intuitive "NEXT in" times after restart (existing behavior)

## Risks and Mitigations

- Risk: Cumulative timing drift. Using `lastRunAtMs + everyMs` instead of fixed grid points means execution latency accumulates over many runs
- Mitigation: The anchor-based formula kicks in whenever `lastRunAtMs + everyMs` falls behind `nowMs`, naturally re-aligning to the grid

Made with [Cursor](https://cursor.com)

<h3>Greptile Summary</h3>

Fixes interval-based cron job scheduling after gateway restart by prioritizing `lastRunAtMs` over anchor-based grid calculations.

- Adds fast-path in `computeJobNextRunAtMs` that calculates next run as `lastRunAtMs + everyMs` when available and still in future
- Falls back to anchor-based formula when `lastRunAtMs` is missing (first run) or when next run from last is already past (long downtime)
- Preserves backward compatibility by keeping anchor-based scheduling as fallback
- Addresses user confusion where 30-minute interval jobs showed unexpected "NEXT in" times after restart (e.g., "56m" instead of expected "24m")
- Implementation correctly handles edge cases: undefined `lastRunAtMs`, non-finite values, disabled jobs, and catch-up scenarios

<h3>Confidence Score: 5/5</h3>

- Safe to merge with minimal risk
- The fix is well-scoped to interval jobs, maintains backward compatibility through fallback logic, correctly handles all edge cases (undefined/non-finite values, first run, long downtime), and uses defensive programming practices (type checks, `Number.isFinite`, bounds checking). The fast-path optimization is sound and the anchor-based formula is preserved as a safety net.
- No files require special attention

<sub>Last reviewed commit: 9b034a9</sub>
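
For reviewers who want to exercise the fast path and its fallback in isolation, here is a minimal standalone sketch. It is not the code in `src/cron/service/jobs.ts`: the `IntervalJob` shape, the `anchorMs` field, the function name `computeNextRunSketch`, and the exact anchor formula (first grid point strictly after `nowMs`) are all assumptions made for illustration. Only the `lastRunAtMs` check mirrors the snippet in the PR description.

```typescript
// Hypothetical, simplified job shape; the real types in jobs.ts may differ.
interface IntervalJob {
  enabled: boolean;
  everyMs: number;
  anchorMs: number; // assumed grid origin used by the fallback
  state: { lastRunAtMs?: number };
}

function computeNextRunSketch(job: IntervalJob, nowMs: number): number | undefined {
  // Disabled jobs have no next run.
  if (!job.enabled) return undefined;

  const everyMs = Math.max(1, Math.floor(job.everyMs));

  // Fast path from the PR: measure the interval from the last actual run,
  // but only while that time is still in the future.
  const last = job.state.lastRunAtMs;
  if (typeof last === "number" && Number.isFinite(last)) {
    const nextFromLast = last + everyMs;
    if (nextFromLast > nowMs) return nextFromLast;
  }

  // Assumed fallback: the first anchor-aligned grid point strictly after nowMs.
  if (nowMs < job.anchorMs) return job.anchorMs;
  const steps = Math.floor((nowMs - job.anchorMs) / everyMs) + 1;
  return job.anchorMs + steps * everyMs;
}

// Usage: a 30-minute job that last ran 6 minutes ago.
const MIN = 60_000;
const nowMs = 100 * MIN;
const job: IntervalJob = {
  enabled: true,
  everyMs: 30 * MIN,
  anchorMs: 0,
  state: { lastRunAtMs: nowMs - 6 * MIN },
};
const next = computeNextRunSketch(job, nowMs);
console.log(((next ?? nowMs) - nowMs) / MIN); // 24
```

With `lastRunAtMs` removed, or set far enough back that `lastRunAtMs + everyMs` is already past, the same call falls through to the grid calculation, which is the re-alignment behavior described under Risks and Mitigations.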