#11974: [FEATURE] feat: integrate systemd WatchdogSec for gateway hang detection
Labels: gateway, cli, commands, size: L, trusted-contributor, experienced-contributor
## Summary
- Add an `sd-notify` wrapper (`src/infra/sd-notify.ts`) that sends `READY=1` on startup and a `WATCHDOG=1` heartbeat every 30s via the existing tick interval
- Update the generated systemd unit with `Type=notify`, `NotifyAccess=all`, and `WatchdogSec=90` so systemd kills and restarts the gateway when heartbeats stop for 90s, for example because the event loop has frozen
- All calls are no-ops when `$NOTIFY_SOCKET` is unset (macOS, manual dev, tests)
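
For readers who want a picture of the wrapper described above, here is a minimal sketch. It is not the contents of `src/infra/sd-notify.ts`: the `Runner` type, the injectable `env`/`run` parameters, and the boolean return value are assumptions added so the no-op path can be exercised without systemd. Only the function names, the `NOTIFY_SOCKET` gate, and the `systemd-notify` arguments come from this PR's test plan.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical injection point so the sketch can be tested without systemd.
type Runner = (args: string[]) => void;
type Env = Record<string, string | undefined>;

// Default runner: shell out to the systemd-notify CLI and discard its output.
const defaultRunner: Runner = (args) => {
  spawnSync("systemd-notify", args, { stdio: "ignore" });
};

// Returns true if a notification was attempted, false if it was skipped.
function sdNotify(args: string[], env: Env, run: Runner): boolean {
  // Outside systemd (macOS, manual dev, tests) NOTIFY_SOCKET is unset: do nothing.
  if (!env.NOTIFY_SOCKET) return false;
  try {
    run(args);
  } catch {
    // A failed notification should never take the gateway down.
  }
  return true;
}

export function sdNotifyReady(env: Env = process.env, run: Runner = defaultRunner): boolean {
  return sdNotify(["--ready"], env, run);
}

export function sdNotifyWatchdog(env: Env = process.env, run: Runner = defaultRunner): boolean {
  return sdNotify(["WATCHDOG=1"], env, run);
}
```

Because the notification is sent by a short-lived `systemd-notify` child rather than the gateway's main PID, this approach presumably explains why the unit uses `NotifyAccess=all` instead of the stricter `main`.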
Fixes #11973
Discussion: https://github.com/openclaw/openclaw/discussions/12026
## Review fixes
Addressed functional regressions flagged during code review:
- **[P1] Chat routing**: Restored canonical session key destructuring from `loadSessionEntry()` so `resolveSessionAgentId` and `resolveSendPolicy` use the resolved key instead of raw aliases (`chat.ts`)
- **[P1] iOS disconnect recovery**: Restored `handleChannelDisconnected` to call `resetConnectionState()` before `onDisconnected`, so `notifyConnectedIfNeeded()` re-fires after reconnect (`GatewayNodeSession.swift`)
- **[P2] UI state clobbering**: Added a connected-state short-circuit in the reconnection loop so a healthy connection isn't repeatedly shown as "Connecting…" (`NodeAppModel.swift`)
- **[P2] Challenge timeout**: Restored `connectChallengeTimeoutSeconds` from 0.75s to 3.0s to avoid nonce handshake failures on remote/Tailscale links (`GatewayChannel.swift`)
## Test plan
- [x] `sdNotifyReady()` is a no-op when `NOTIFY_SOCKET` is not set
- [x] `sdNotifyReady()` calls `systemd-notify --ready` when `NOTIFY_SOCKET` is set
- [x] `sdNotifyWatchdog()` is a no-op when `NOTIFY_SOCKET` is not set
- [x] `sdNotifyWatchdog()` calls `systemd-notify WATCHDOG=1` when `NOTIFY_SOCKET` is set
- [x] Generated systemd unit includes `Type=notify`
- [x] Generated systemd unit includes `NotifyAccess=all`
- [x] Generated systemd unit includes `WatchdogSec=90`
- [x] All directives are placed in the `[Service]` section
- [x] All 8 new tests fail before implementation, pass after
- [x] `pnpm build` compiles cleanly
- [x] `pnpm check` passes (types, lint, format)
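
The unit-file checks in this plan can be illustrated with a small generator and a section parser. This is a sketch only: `renderGatewayUnit`, `UnitOptions`, `sectionLines`, and the `Restart=always` line are hypothetical and are not taken from the PR's code. The three watchdog directives and their `[Service]` placement are the parts this PR describes.

```typescript
// Hypothetical option bag; the real installer's options are not shown in this PR.
interface UnitOptions {
  description: string;
  execStart: string;
  watchdog?: boolean; // opt-in flag, as mentioned in the review summary
  watchdogSec?: number; // defaults to 90, matching the PR
}

function renderGatewayUnit(opts: UnitOptions): string {
  // Restart=always is an assumption: some Restart= policy is needed for
  // systemd to restart the service after a watchdog kill.
  const service = [`ExecStart=${opts.execStart}`, "Restart=always"];
  if (opts.watchdog) {
    service.unshift("Type=notify", "NotifyAccess=all");
    service.push(`WatchdogSec=${opts.watchdogSec ?? 90}`);
  }
  return [
    "[Unit]",
    `Description=${opts.description}`,
    "",
    "[Service]",
    ...service,
    "",
    "[Install]",
    "WantedBy=default.target",
    "",
  ].join("\n");
}

// Returns the non-empty lines belonging to one [Section] of a unit file.
function sectionLines(unit: string, name: string): string[] {
  const lines = unit.split("\n");
  const start = lines.indexOf(`[${name}]`);
  if (start === -1) return [];
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((line) => line.startsWith("["));
  return (end === -1 ? rest : rest.slice(0, end)).filter((line) => line !== "");
}
```

A test in the spirit of the checklist would render a unit with `watchdog: true` and assert that `sectionLines(unit, "Service")` contains each of `Type=notify`, `NotifyAccess=all`, and `WatchdogSec=90`.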
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adds a new `sd-notify` wrapper (`src/infra/sd-notify.ts`) and integrates it into the gateway startup/shutdown path to support `Type=notify` readiness notifications and periodic `WATCHDOG=1` heartbeats. It also threads an opt-in `watchdog` flag through the gateway systemd installation/update flows so that generated user units include `Type=notify`, `NotifyAccess=all`, and `WatchdogSec=90`. Finally, it adjusts systemd unit name resolution to respect `OPENCLAW_SYSTEMD_UNIT` consistently during install/uninstall.
<h3>Confidence Score: 5/5</h3>
- This PR looks safe to merge with minimal risk.
- Reviewed the diffs around systemd unit generation, unit name resolution, and gateway startup/shutdown integration. The sd-notify helper is gated on `NOTIFY_SOCKET`/`WATCHDOG_USEC`, watchdog calls are cleaned up on shutdown, and install/uninstall now consistently respect `OPENCLAW_SYSTEMD_UNIT` to avoid competing units. No definite functional regressions were found in the changed code paths.
- No files require special attention
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
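
The review summary above mentions gating on `NOTIFY_SOCKET`/`WATCHDOG_USEC` and cleaning up watchdog calls on shutdown. One way this could look is sketched below, with hypothetical names (`watchdogIntervalMs`, `startWatchdog`). Note one deliberate difference: the PR sends heartbeats every 30s from the existing tick interval, whereas this sketch derives the period from `WATCHDOG_USEC` and pings at half the timeout, which is the cadence the systemd documentation recommends.

```typescript
type Env = Record<string, string | undefined>;

// Heartbeat period in milliseconds, or null when the watchdog is not active.
function watchdogIntervalMs(env: Env): number | null {
  if (!env.NOTIFY_SOCKET) return null;
  // systemd exports the configured WatchdogSec in microseconds.
  const usec = Number(env.WATCHDOG_USEC);
  if (!Number.isFinite(usec) || usec <= 0) return null;
  // Ping at half the timeout so one late tick does not trigger a restart.
  return Math.floor(usec / 1000 / 2);
}

// Starts periodic heartbeats and returns a function that stops them.
// The timer runs on the event loop, so a frozen loop stops the heartbeats
// and systemd's WatchdogSec timeout fires.
function startWatchdog(env: Env, ping: () => void): () => void {
  const ms = watchdogIntervalMs(env);
  if (ms === null) return () => {};
  const timer = setInterval(ping, ms);
  return () => clearInterval(timer);
}
```

A shutdown handler would call the returned stop function before closing the gateway, so no heartbeat is sent after the service has begun exiting.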
## Most Similar PRs
- #13014: feat(infra): add systemd WatchdogSec integration (by TGambit65 · 2026-02-10 · 87.4%)
- #16016: fix: update systemd unit version on gateway restart (by jbold · 2026-02-14 · 80.9%)
- #20357: feat(gateway): make systemd KillMode configurable for gateway install (by Jackten · 2026-02-18 · 76.8%)
- #13084: fix(daemon): multi-layer defense against zombie gateway processes (by openperf · 2026-02-10 · 75.9%)
- #12234: gateway: incident tracking, recover command, and ciao ERR_SERVER_CL... (by levineam · 2026-02-09 · 75.4%)
- #16185: fix: patch systemd unit version before service restart (by nozh · 2026-02-14 · 75.2%)
- #18498: daemon: load systemd EnvironmentFile and drop-ins so gateway status... (by saurav470 · 2026-02-16 · 75.1%)
- #9036: fix: add systemd restart limits to prevent infinite crash-loops (by joetomasone · 2026-02-04 · 75.1%)
- #21212: fix: detect and manage systemd system services (rebased) (by growthringsadvisory · 2026-02-19 · 74.7%)
- #22154: dev(watch): make gateway watch portable on native Windows (by Kansodata · 2026-02-20 · 74.3%)