#16170: fix: restart service manager after update.run
Labels: gateway, stale, size: M
Cluster: Gateway Restart Improvements
## Summary
- Update `update.run` to prefer a service-manager restart (`launchctl` on macOS, `systemd` on Linux, `supervisor` otherwise) after update completion.
- Keep the existing SIGUSR1 path as fallback only when service-manager restart fails.
- Delay the SIGUSR1 fallback by a short grace period instead of forcing an immediate `delayMs: 0`, which reduces overlapping restart attempts when the service manager is briefly slow to respond.
- Keep restart sentinel payload/reporting behavior unchanged.
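
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `restartAfterUpdate`, `RestartDeps`, and `RestartAttempt` are hypothetical names, the dependencies are injected so the decision logic can be exercised without touching a real service manager, and the 2000ms default comes from the review summary further down.

```typescript
// Hypothetical types for illustration; the project's real signatures may differ.
type RestartAttempt = { ok: boolean; detail?: string };

interface RestartDeps {
  // Attempts a restart through launchctl/systemd/supervisor (assumed synchronous here).
  tryServiceManagerRestart: () => RestartAttempt;
  // Schedules the in-process SIGUSR1 restart after the given delay.
  scheduleSigusr1: (opts: { delayMs: number; reason: string }) => void;
}

// Default grace period before the fallback fires (2000ms per the review summary).
const DEFAULT_FALLBACK_GRACE_MS = 2000;

function restartAfterUpdate(
  deps: RestartDeps,
  graceMs: number = DEFAULT_FALLBACK_GRACE_MS,
): "service-manager" | "sigusr1-fallback" {
  let attempt: RestartAttempt;
  try {
    attempt = deps.tryServiceManagerRestart();
  } catch (err) {
    // A thrown error is treated the same as a failed attempt.
    attempt = { ok: false, detail: String(err) };
  }

  if (attempt.ok) {
    // Preferred path: the service manager recycles the process, no signal needed.
    return "service-manager";
  }

  // Fallback only: delay SIGUSR1 by the grace period rather than using delayMs: 0.
  deps.scheduleSigusr1({ delayMs: graceMs, reason: "update.run" });
  return "sigusr1-fallback";
}
```

Injecting both restart mechanisms keeps the choice between them a pure decision, which is also what makes the happy path and the fallback path easy to assert on in tests.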
## Why
- In managed installs, a SIGUSR1-only recycle can leave a stale runtime or stale modules loaded right after an update.
- A service-manager restart aligns `update.run` with the explicit gateway restart behavior and makes applying updates more reliable.
## AI and Testing
- AI-assisted: yes (Codex).
- Testing degree: fully tested.
- I understand and verified the changed restart flow and fallback behavior.
### Commands run
- `pnpm build`
- `pnpm check`
- `pnpm test`
- `pnpm test:e2e src/gateway/server.roles-allowlist-update.e2e.test.ts -t "falls back to SIGUSR1 when service restart fails"`
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR changes the `update.run` gateway handler to prefer service-manager restarts (`launchctl`/`systemd`/`supervisor`) over in-process SIGUSR1 restarts after an update completes. A new `scheduleGatewayServiceRestart` function wraps `triggerOpenClawRestart` in a `setTimeout`, falling back to `scheduleGatewaySigusr1Restart` (with a default 2000ms grace period) if the service-manager restart fails.
- `src/gateway/server-methods/update.ts`: Introduces `scheduleGatewayServiceRestart` that attempts a service-manager restart first, with SIGUSR1 as a fallback. Replaces the direct `scheduleGatewaySigusr1Restart` call in the handler.
- `src/gateway/server.roles-allowlist-update.e2e.test.ts`: Mocks `triggerOpenClawRestart` and `scheduleGatewaySigusr1Restart` to verify the happy path (service restart succeeds, SIGUSR1 not called) and fallback path (service restart fails, SIGUSR1 invoked with correct reason). Removes direct SIGUSR1 signal handling from tests.
- `.github/workflows/formal-conformance.yml`: Wraps the informational PR comment in a try/catch to tolerate 403 errors on fork PRs where the token lacks write permissions.
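
The fork-PR tolerance in the last item can be pictured with a sketch like the one below. The workflow step itself is not shown in this description, so everything here is an assumption: `isForkPermissionError` and `postCommentBestEffort` are hypothetical helpers, the error is assumed to carry a numeric `status` field (as Octokit request errors do), and rethrowing non-403 errors is a design choice of this sketch rather than confirmed behavior.

```typescript
// Hypothetical helper: true when the error looks like an HTTP 403 from the API client.
function isForkPermissionError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 403
  );
}

// Posts an informational comment, tolerating a 403 from a read-only token.
// Returns true if the comment was posted, false if it was skipped.
async function postCommentBestEffort(
  post: () => Promise<void>,
  log: (msg: string) => void = (msg) => console.warn(msg),
): Promise<boolean> {
  try {
    await post();
    return true;
  } catch (err) {
    if (isForkPermissionError(err)) {
      // The comment is informational only, so a missing write permission is not a failure.
      log("Skipping informational PR comment: token lacks write permission (403).");
      return false;
    }
    // Assumption in this sketch: any other error still fails the step.
    throw err;
  }
}
```

Keeping the 403 check in its own predicate makes the tolerated case explicit and avoids silently swallowing unrelated failures.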
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge with low risk: the restart flow change is well guarded by a fallback path and tested end-to-end.
- The core logic change (service-manager restart with SIGUSR1 fallback) follows existing patterns in `restart.ts`, has proper clamping/validation, and is covered by new e2e tests for both the happy path and fallback path. The CI change is a straightforward defensive improvement. No issues requiring inline comments were identified beyond the race condition already discussed in a previous thread.
- No files require special attention.
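
To make the "clamping/validation" and deferred scheduling mentioned above concrete, here is a small sketch. The bounds and names (`MIN_DELAY_MS`, `MAX_DELAY_MS`, `clampDelayMs`, `scheduleRestart`) are hypothetical and chosen only for illustration; the actual limits in `restart.ts` are not stated in this review. The timer is injectable so the scheduling can be checked without waiting.

```typescript
// Hypothetical bounds; the real values used by the project are not given here.
const MIN_DELAY_MS = 0;
const MAX_DELAY_MS = 60000;
const DEFAULT_GRACE_MS = 2000; // default grace period named in the summary

// Validates an untrusted delay and clamps it into [MIN_DELAY_MS, MAX_DELAY_MS].
function clampDelayMs(value: unknown, fallback: number = DEFAULT_GRACE_MS): number {
  if (typeof value !== "number" || !Number.isFinite(value)) {
    return fallback; // non-numeric, NaN, or infinite input falls back to the default
  }
  return Math.min(MAX_DELAY_MS, Math.max(MIN_DELAY_MS, Math.floor(value)));
}

type Timer = (fn: () => void, ms: number) => unknown;

// Defers the restart callback by the clamped delay and returns the delay actually used.
function scheduleRestart(
  restart: () => void,
  delayMs: unknown,
  setTimer: Timer = (fn, ms) => setTimeout(fn, ms),
): number {
  const ms = clampDelayMs(delayMs);
  setTimer(restart, ms);
  return ms;
}
```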
<sub>Last reviewed commit: 1b54efb</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #20355: fix(gateway): enforce commands.restart guard for config.apply and c... (by Clawborn · 2026-02-18 · 80.0%)
- #5077: fix(windows): implement reliable gateway restart via schtasks helper (by romeoscript · 2026-01-31 · 79.2%)
- #12953: fix: defer gateway restart until all replies are sent (by zoskebutler · 2026-02-10 · 79.1%)
- #13408: fix(gateway): skip SIGUSR1 restart in config.patch for noop reload ... (by rwmjhb · 2026-02-10 · 79.1%)
- #7128: feat: add gateway.restart RPC for graceful in-process restart (by AkashaBot · 2026-02-02 · 78.8%)
- #21591: fix(update): prevent double restart when refreshing service env (by irchelper · 2026-02-20 · 78.7%)
- #9112: Fix: Prevent double SIGUSR1 restart on model switch (by vishaltandale00 · 2026-02-04 · 78.3%)
- #16845: fix(daemon): gateway auto-restart on SIGTERM + agent restart guidel... (by kiminbean · 2026-02-15 · 77.4%)
- #13084: fix(daemon): multi-layer defense against zombie gateway processes (by openperf · 2026-02-10 · 76.6%)
- #18254: add /update chat command for Telegram git updates (by dangmstaredu · 2026-02-16 · 76.4%)