
#11974: [FEATURE] feat: integrate systemd WatchdogSec for gateway hang detection

by mcaxtr · open · 2026-02-08 16:52
Labels: gateway, cli, commands, size: L, trusted-contributor, experienced-contributor
## Summary

- Add an `sd-notify` wrapper (`src/infra/sd-notify.ts`) that sends `READY=1` on startup and `WATCHDOG=1` heartbeats every 30s via the existing tick interval.
- Update the generated systemd unit with `Type=notify`, `NotifyAccess=all`, and `WatchdogSec=90` so that systemd automatically kills and restarts the gateway if the event loop freezes.
- All calls are no-ops when `$NOTIFY_SOCKET` is unset (macOS, manual dev runs, tests).

Fixes #11973

Discussion: https://github.com/openclaw/openclaw/discussions/12026

## Review fixes

Addressed functional regressions flagged during code review:

- **[P1] Chat routing**: Restored canonical session key destructuring from `loadSessionEntry()` so that `resolveSessionAgentId` and `resolveSendPolicy` use the resolved key instead of raw aliases (`chat.ts`).
- **[P1] iOS disconnect recovery**: Restored `handleChannelDisconnected` to call `resetConnectionState()` before `onDisconnected`, so that `notifyConnectedIfNeeded()` fires again after a reconnect (`GatewayNodeSession.swift`).
- **[P2] UI state clobbering**: Added a connected short-circuit in the reconnection loop so that a healthy connection isn't repeatedly shown as "Connecting…" (`NodeAppModel.swift`).
- **[P2] Challenge timeout**: Restored `connectChallengeTimeoutSeconds` to 3.0s (from 0.75s) to avoid nonce handshake failures on remote/Tailscale links (`GatewayChannel.swift`).

## Test plan

- [x] `sdNotifyReady()` is a no-op when `NOTIFY_SOCKET` is not set
- [x] `sdNotifyReady()` calls `systemd-notify --ready` when `NOTIFY_SOCKET` is set
- [x] `sdNotifyWatchdog()` is a no-op when `NOTIFY_SOCKET` is not set
- [x] `sdNotifyWatchdog()` calls `systemd-notify WATCHDOG=1` when `NOTIFY_SOCKET` is set
- [x] Generated systemd unit includes `Type=notify`
- [x] Generated systemd unit includes `NotifyAccess=all`
- [x] Generated systemd unit includes `WatchdogSec=90`
- [x] All three directives are placed in the `[Service]` section
- [x] All 8 new tests fail before the implementation and pass after it
- [x] `pnpm build` compiles cleanly
- [x] `pnpm check` passes (types, lint, format)

## Greptile Overview

### Greptile Summary

This PR adds a new `sd-notify` wrapper (`src/infra/sd-notify.ts`) and integrates it into the gateway startup/shutdown path to support `Type=notify` readiness notifications and periodic `WATCHDOG=1` heartbeats. It also threads an opt-in `watchdog` flag through the gateway's systemd install/update flows so that generated user units include `Type=notify`, `NotifyAccess=all`, and `WatchdogSec=90`, and it adjusts systemd unit name resolution to respect `OPENCLAW_SYSTEMD_UNIT` consistently during install and uninstall.

### Confidence Score: 5/5

- This PR looks safe to merge with minimal risk.
- Reviewed the diffs around systemd unit generation, unit name resolution, and gateway startup/shutdown integration. The sd-notify helper is gated on `NOTIFY_SOCKET`/`WATCHDOG_USEC`, watchdog calls are cleaned up on shutdown, and install/uninstall now consistently respect `OPENCLAW_SYSTEMD_UNIT` to avoid competing units. No definite functional regressions were found in the changed code paths.
- No files require special attention.
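A minimal TypeScript sketch of the wrapper described in the Summary, assuming it shells out to `systemd-notify` as the test plan suggests. The names `sdNotifyReady` and `sdNotifyWatchdog` come from the test plan; the `NotifyRunner` seam, the `env`/`run` parameters, and `startWatchdogHeartbeat` are hypothetical additions for illustration and testing. The PR drives heartbeats from the gateway's existing tick interval rather than a dedicated timer, and the `WATCHDOG_USEC` check here follows the Greptile note, not code shown in the PR.

```typescript
import { execFile } from "node:child_process";

// Hypothetical seam: tests pass a fake runner instead of spawning a process.
export type NotifyRunner = (args: string[]) => void;

type Env = Record<string, string | undefined>;

// Best-effort default: spawn systemd-notify(1) and ignore any failure, so a
// missing binary or closed socket never takes the gateway down.
const spawnSystemdNotify: NotifyRunner = (args) => {
  execFile("systemd-notify", args, () => {});
};

// systemd sets NOTIFY_SOCKET only for Type=notify services; elsewhere
// (macOS, manual dev runs, tests) every call below is a no-op.
function notifyEnabled(env: Env): boolean {
  return Boolean(env.NOTIFY_SOCKET);
}

// Tell systemd the gateway finished starting (READY=1).
export function sdNotifyReady(
  env: Env = process.env,
  run: NotifyRunner = spawnSystemdNotify,
): boolean {
  if (!notifyEnabled(env)) return false;
  run(["--ready"]);
  return true;
}

// Reset systemd's watchdog countdown (WATCHDOG=1).
export function sdNotifyWatchdog(
  env: Env = process.env,
  run: NotifyRunner = spawnSystemdNotify,
): boolean {
  if (!notifyEnabled(env)) return false;
  run(["WATCHDOG=1"]);
  return true;
}

// Hypothetical helper: sends WATCHDOG=1 every intervalMs (30 s in the PR) and
// returns a stop function to call on shutdown. It is a no-op unless systemd
// also exported WATCHDOG_USEC, i.e. the unit actually sets WatchdogSec.
export function startWatchdogHeartbeat(
  intervalMs = 30_000,
  env: Env = process.env,
  run: NotifyRunner = spawnSystemdNotify,
): () => void {
  if (!notifyEnabled(env) || !env.WATCHDOG_USEC) return () => {};
  const timer = setInterval(() => sdNotifyWatchdog(env, run), intervalMs);
  return () => clearInterval(timer);
}
```

Because the heartbeat runs on the same event loop as the gateway, a frozen loop stops the `WATCHDOG=1` messages, and a 30s cadence against `WatchdogSec=90` leaves room for roughly two missed beats before systemd acts. Spawning `systemd-notify` means each message comes from a child PID rather than the main process, which is presumably why the unit sets `NotifyAccess=all` instead of relying on the `Type=notify` default of `main`.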

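The unit-file side can be sketched the same way, assuming a plain string template. `buildGatewayUnit`, `GatewayUnitOptions`, `resolveUnitName`, the `Restart=on-failure` policy, and `WantedBy=default.target` are illustrative assumptions, not the PR's generator; only the three watchdog directives, their placement in `[Service]`, the opt-in `watchdog` flag, and the `OPENCLAW_SYSTEMD_UNIT` override come from the PR text.

```typescript
// Hypothetical option shape; the PR's real installer options are not shown.
export interface GatewayUnitOptions {
  description: string;
  execStart: string;
  // Opt-in, mirroring the `watchdog` flag the PR threads through install/update.
  watchdog?: boolean;
}

// Watchdog directives from the PR. Type=notify makes systemd wait for READY=1;
// NotifyAccess=all accepts messages from child processes such as systemd-notify;
// WatchdogSec=90 fails the service if no WATCHDOG=1 arrives within 90 s.
const WATCHDOG_DIRECTIVES = ["Type=notify", "NotifyAccess=all", "WatchdogSec=90"];

export function buildGatewayUnit(opts: GatewayUnitOptions): string {
  const service = [
    ...(opts.watchdog ? WATCHDOG_DIRECTIVES : []),
    `ExecStart=${opts.execStart}`,
    // Assumption: on-failure also covers watchdog timeouts, so a hung gateway restarts.
    "Restart=on-failure",
  ];
  return [
    "[Unit]",
    `Description=${opts.description}`,
    "",
    "[Service]",
    ...service,
    "",
    "[Install]",
    // Assumption: a user unit pulled in by the user's default target.
    "WantedBy=default.target",
    "",
  ].join("\n");
}

// Prefer OPENCLAW_SYSTEMD_UNIT so install and uninstall act on the same unit.
// `fallback` stands in for the project's built-in default name (not given in the PR).
export function resolveUnitName(
  env: Record<string, string | undefined>,
  fallback: string,
): string {
  const override = env.OPENCLAW_SYSTEMD_UNIT?.trim();
  return override ? override : fallback;
}
```

A restart policy is what turns the watchdog into recovery: systemd treats a watchdog timeout as a failure, so without `Restart=` the unit would stop a hung gateway but not bring it back.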