← Back to PRs

#22411: fix(cron): cancel timed-out runs before side effects

by Takhoffman open 2026-02-21 04:15 View on GitHub →
gateway agents maintainer size: M
## Summary Describe the problem and fix in 2–5 bullets: - Problem: Cron timer timeouts marked runs as failed, but isolated execution could still finish and emit downstream side effects (announce/main-session summary). - Why it matters: This can produce inconsistent outcomes (error state + late delivery) and duplicate/confusing notifications. - What changed: Added abort-signal plumbing from timer timeout into cron isolated execution path and guard checks before post-run side effects; wired signal through gateway cron service. - What did NOT change (scope boundary): No scheduler concurrency behavior changes; no CLI/API surface changes beyond internal optional signal threading. ## Change Type (select all) - [x] Bug fix - [ ] Feature - [ ] Refactor - [ ] Docs - [ ] Security hardening - [ ] Chore/infra ## Scope (select all touched areas) - [x] Gateway / orchestration - [x] Skills / tool execution - [ ] Auth / tokens - [ ] Memory / storage - [ ] Integrations - [ ] API / contracts - [ ] UI / DX - [ ] CI/CD / infra ## Linked Issue/PR - Closes # - Related #20217 - Related #18040 - Related #18120 - Related #16841 - Related PR #16880 ## User-visible / Behavior Changes - Timed-out cron isolated runs are now prevented from emitting late follow-up side effects after timeout. - Reduces cases where users see timeout/error while delayed follow-up delivery still appears later. ## Security Impact (required) - New permissions/capabilities? (`Yes/No`) No - Secrets/tokens handling changed? (`Yes/No`) No - New/changed network calls? (`Yes/No`) No - Command/tool execution surface changed? (`Yes/No`) No - Data access scope changed? (`Yes/No`) No - If any `Yes`, explain risk + mitigation: ## Repro + Verification ### Environment - OS: macOS (darwin-arm64) - Runtime/container: Node + Vitest - Model/provider: N/A (unit/integration tests) - Integration/channel (if any): Cron service internals - Relevant config (redacted): default test config ### Steps 1. Create an isolated cron job with a very small timeout and delayed runner completion. 2. Trigger scheduler tick. 3. Confirm timed-out status and absence of follow-up enqueue side effects. ### Expected - Run is marked timed out/error and does not emit late follow-up side effects. ### Actual - Matches expected after fix. ## Evidence Attach at least one: - [x] Failing test/log before + passing after - [ ] Trace/log snippets - [ ] Screenshot/recording - [ ] Perf numbers (if relevant) ## Human Verification (required) What you personally verified (not just CI), and how: - Verified scenarios: - `pnpm vitest run src/cron/service.issue-regressions.test.ts src/cron/service.rearm-timer-when-running.test.ts src/cron/service.restart-catchup.test.ts --maxWorkers=1` - `pnpm vitest run src/cron/service.delivery-plan.test.ts src/cron/service.runs-one-shot-main-job-disables-it.test.ts --maxWorkers=1` - Edge cases checked: - timeoutSeconds=0 semantics remain intact - existing timer rearm/restart catch-up behavior still passes - What you did **not** verify: - Full live provider/channel end-to-end with real network delivery ## Compatibility / Migration - Backward compatible? (`Yes/No`) Yes - Config/env changes? (`Yes/No`) No - Migration needed? (`Yes/No`) No - If yes, exact upgrade steps: ## Failure Recovery (if this breaks) - How to disable/revert this change quickly: - Revert this PR commit from cron timeout-cancellation branch. - Files/config to restore: - `src/cron/service/state.ts`, `src/cron/service/timer.ts`, `src/cron/isolated-agent/run.ts`, `src/gateway/server-cron.ts`, `src/cron/service.issue-regressions.test.ts` - Known bad symptoms reviewers should watch for: - Timed-out jobs still emitting late announce/main-session summaries. ## Risks and Mitigations List only real risks for this PR. Add/remove entries as needed. If none, write `None`. - Risk: - Over-cancellation could suppress legitimate post-run side effects in borderline timeout races. - Mitigation: - Abort checks are limited to timeout path and covered with a dedicated regression test for side-effect suppression after timeout. <!-- greptile_comment --> <h3>Greptile Summary</h3> Adds `AbortSignal` plumbing through the cron execution stack to prevent timed-out jobs from emitting late side effects (system events, announces) after timeout. The timeout controller now aborts the signal when jobs exceed their `timeoutSeconds` limit, and abort checks guard the main post-run side effects (`enqueueSystemEvent`, `requestHeartbeatNow`) in `src/cron/service/timer.ts:554-581`. **Key changes:** - Threads optional `signal?: AbortSignal` parameter through `runIsolatedAgentJob` → `runCronIsolatedAgentTurn` - Adds abort checks at two critical points in `isolated-agent/run.ts:514` (after agent execution) and `:574` (before side effects logic) - Guards post-run side effects in `timer.ts:554` (before `enqueueSystemEvent`) and `:502` (in heartbeat loop) - Normalizes timeout error messages via `timeoutErrorMessage()` helper **Test coverage:** - New regression test validates that `enqueueSystemEvent` is not called when timeout fires - Existing timer/restart/delivery tests continue to pass <h3>Confidence Score: 3/5</h3> - Safe to merge with moderate caution - improves timeout handling but has edge case gaps - The PR correctly implements abort signal plumbing and guards the critical post-run side effects (`enqueueSystemEvent`) in `timer.ts`. However, there are two potential race conditions in `isolated-agent/run.ts` where in-run delivery side effects (`deliverOutboundPayloads:654`, `runSubagentAnnounceFlow:747`) could execute after timeout if the timeout fires during their execution. The test only validates post-run behavior, not these in-run cases. This is a partial fix that significantly improves the situation but doesn't fully eliminate late deliveries. - Pay close attention to `src/cron/isolated-agent/run.ts` lines 654 and 747 - verify whether in-run delivery side effects should be guarded with abort checks <sub>Last reviewed commit: 76fd2f8</sub> <!-- greptile_other_comments_section --> <!-- /greptile_comment -->

Most Similar PRs