#11522: Fix #10904: Add hard timeout to lane tasks to prevent cron wedging
channel: signal
channel: telegram
app: web-ui
gateway
cli
agents
stale
Cluster: Cron Scheduler Improvements
## Problem
The cron scheduler lane wedges when a task hangs indefinitely. The `state.active` counter never decrements, blocking all subsequent jobs.
## Root Cause
Lane tasks execute without any timeout. If a cron job (e.g., an isolated agent turn) gets stuck waiting for a model response, exec completion, or network I/O, the lane remains "active" forever.
## Fix
Add a 5-minute hard timeout via `Promise.race` to ensure wedged tasks fail with an error instead of blocking the lane forever.
## Changes
- Added `TASK_TIMEOUT_MS = 300_000` constant (5 minutes)
- Wrapped `entry.task()` in `Promise.race` with timeout
- Tasks that exceed the timeout now reject with an error, so `state.active` is decremented and the lane can move on to the next job
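
The changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `runWithTimeout` and `LaneState` are hypothetical names, and only `TASK_TIMEOUT_MS` and the `state.active` counter come from the description. The sketch also clears the timer in `finally`, so that a task which finishes early does not leave a pending `setTimeout` behind.

```typescript
// Hypothetical sketch: only TASK_TIMEOUT_MS and state.active are named in the PR text.
const TASK_TIMEOUT_MS = 300_000; // 5 minutes

interface LaneState {
  active: number; // number of tasks currently occupying the lane
}

async function runWithTimeout<T>(
  state: LaneState,
  task: () => Promise<T>,
  timeoutMs: number = TASK_TIMEOUT_MS,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // A promise that never resolves and rejects once the deadline passes.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`lane task timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });

  state.active += 1;
  try {
    // Whichever settles first wins: the task itself or the timeout rejection.
    return await Promise.race([task(), timeout]);
  } finally {
    // Runs on success, task failure, and timeout alike.
    clearTimeout(timer);
    state.active -= 1;
  }
}
```

Note that `Promise.race` only stops waiting for the task; it does not cancel it. A task that times out may still be running in the background, so any side effects it produces later would need separate handling.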
Fixes #10904
Wallet: BYCgQQpJT1odaunfvk6gtm5hVd7Xu93vYwbumFfqgHb3
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR makes cron scheduling and related subsystems more robust by (1) adding a hard timeout around lane task execution to prevent the cron lane from wedging permanently, and (2) tightening/expanding a few configuration and delivery behaviors (cron delivery fields, optional provider baseUrl defaults, per-agent heartbeat model resolution, and some UI markdown performance limits). It also adjusts cron store/timer loading so the timer tick uses persisted `nextRunAtMs` for determining due jobs, then recomputes next runs after executing due jobs, and includes small fixes in Signal/Telegram/TTS/gateway plumbing.
The overall direction is sound, but two correctness issues can affect runtime behavior: a timer leak in the new lane timeout wrapper, and Signal edit-message deduplication producing `"undefined"` IDs.
<h3>Confidence Score: 3/5</h3>
- This PR is close to safe to merge but has a couple of concrete runtime issues to address first.
- Most changes are straightforward and align with the stated goal, but the new lane timeout wrapper introduces an uncleared `setTimeout` per task (resource leak) and the Signal edit deduplication can emit a literal "undefined" messageId, which can break downstream dedupe. Fixing these should materially reduce risk.
- src/process/command-queue.ts, src/signal/monitor/event-handler.ts
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
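
The `"undefined"` message ID flagged in the review is the kind of bug that typically arises when an optional field is interpolated into a string. The sketch below illustrates that general pitfall and one possible guard. It is an assumption about the failure mode, not the code in `src/signal/monitor/event-handler.ts`: the `EditEvent` shape and both function names are hypothetical.

```typescript
// Hypothetical event shape; the real Signal event type is not shown in the PR text.
interface EditEvent {
  sender: string;
  targetTimestamp?: number; // may be absent on some edit events
}

// Naive key: a missing timestamp is stringified into the literal text "undefined",
// so unrelated edits from the same sender can collide on one dedupe key.
function naiveDedupeKey(ev: EditEvent): string {
  return `${ev.sender}:${ev.targetTimestamp}`;
}

// Guarded key: report "no usable ID" explicitly so the caller can skip dedupe.
function safeDedupeKey(ev: EditEvent): string | undefined {
  if (ev.targetTimestamp === undefined) {
    return undefined;
  }
  return `${ev.sender}:${ev.targetTimestamp}`;
}
```

With the guarded variant, a caller can check for `undefined` and bypass deduplication for that event instead of storing a key that every timestamp-less edit would share.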
Most Similar PRs
- #13055: fix: prevent cron RPC stalls with timeout and caching (#13018) · by trevorgordon981 · 2026-02-10 · 81.6%
- #12086: fix(cron): ensure timer callback fires for scheduled jobs · by divol89 · 2026-02-08 · 81.0%
- #18144: fix(cron): clear stuck runningAtMs after timeout and add maintenanc... · by taw0002 · 2026-02-16 · 81.0%
- #22411: fix(cron): cancel timed-out runs before side effects · by Takhoffman · 2026-02-21 · 78.3%
- #6302: fix: Add timeouts to prevent indefinite hangs (issues #4954, #4956,... · by batumilove · 2026-02-01 · 77.8%
- #8698: fix(cron): default enabled to true for new jobs · by emmick4 · 2026-02-04 · 77.4%
- #16880: fix(cron): respect per-job timeoutSeconds in executeJob path (#16841) · by echoVic · 2026-02-15 · 77.2%
- #10829: fix: prevent cron scheduler permanent death on transient startup/ru... · by meaadore1221-afk · 2026-02-07 · 77.2%
- #12131: fix(cron): ensure timer callback fires for scheduled jobs · by divol89 · 2026-02-08 · 76.9%
- #19414: fix: respect job timeoutSeconds for stuck runningAtMs detection · by namabile · 2026-02-17 · 76.4%