#16170: fix: restart service manager after update.run

by Swader open 2026-02-14 11:40

gateway stale size: M

## Summary

- Update `update.run` to prefer a service-manager restart (`launchctl` on macOS, `systemd` on Linux, `supervisor` otherwise) after update completion.
- Keep the existing SIGUSR1 path as fallback only when service-manager restart fails.
- Add a short grace-period fallback instead of forcing immediate `delayMs: 0`, reducing overlapping restart attempts during transient service-manager latency.
- Keep restart sentinel payload/reporting behavior unchanged.

## Why

- In managed installs, SIGUSR1-only recycle can leave stale runtime/modules loaded right after update.
- Service-manager restart aligns `update.run` with explicit gateway restart behavior and improves update apply reliability.

## AI and Testing

- AI-assisted: yes (Codex).
- Testing degree: fully tested.
- I understand and verified the changed restart flow and fallback behavior.

### Commands run

- `pnpm build`
- `pnpm check`
- `pnpm test`
- `pnpm test:e2e src/gateway/server.roles-allowlist-update.e2e.test.ts -t "falls back to SIGUSR1 when service restart fails"`

<h3>Greptile Summary</h3>

This PR changes the `update.run` gateway handler to prefer service-manager restarts (`launchctl`/`systemd`/`supervisor`) over in-process SIGUSR1 restarts after an update completes. A new `scheduleGatewayServiceRestart` function wraps `triggerOpenClawRestart` in a `setTimeout`, falling back to `scheduleGatewaySigusr1Restart` (with a default 2000ms grace period) if the service-manager restart fails.

- `src/gateway/server-methods/update.ts`: Introduces `scheduleGatewayServiceRestart` that attempts a service-manager restart first, with SIGUSR1 as a fallback. Replaces the direct `scheduleGatewaySigusr1Restart` call in the handler.
- `src/gateway/server.roles-allowlist-update.e2e.test.ts`: Mocks `triggerOpenClawRestart` and `scheduleGatewaySigusr1Restart` to verify the happy path (service restart succeeds, SIGUSR1 not called) and fallback path (service restart fails, SIGUSR1 invoked with correct reason). Removes direct SIGUSR1 signal handling from tests.
- `.github/workflows/formal-conformance.yml`: Wraps the informational PR comment in a try/catch to tolerate 403 errors on fork PRs where the token lacks write permissions.

<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge with low risk: the restart flow change is well-guarded by a fallback path and tested end-to-end.
- The core logic change (service-manager restart with SIGUSR1 fallback) follows existing patterns in `restart.ts`, has proper clamping/validation, and is covered by new e2e tests for both the happy path and fallback path. The CI change is a straightforward defensive improvement. No issues requiring inline comments were identified beyond the race condition already discussed in a previous thread.
- No files require special attention.

<sub>Last reviewed commit: 1b54efb</sub>
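
For readers skimming the restart flow described in the summary, here is a minimal sketch of the "service manager first, SIGUSR1 after a grace period" pattern. It is not the code from this PR. `scheduleServiceRestartSketch`, `RestartDeps`, `clampMs`, and the 60-second upper bound are hypothetical names and values chosen for illustration; the two injected callbacks stand in for `triggerOpenClawRestart` and `scheduleGatewaySigusr1Restart`, whose real signatures are not shown on this page. Only the 2000ms default grace period comes from the review text.

```typescript
// Hypothetical result shape; the real return type of triggerOpenClawRestart is not shown here.
type RestartResult = { ok: boolean; method: string; detail?: string };

interface RestartDeps {
  // Stand-in for triggerOpenClawRestart (launchctl / systemd / supervisor).
  triggerServiceRestart: () => RestartResult;
  // Stand-in for scheduleGatewaySigusr1Restart.
  scheduleSigusr1Restart: (opts: { delayMs: number; reason: string }) => void;
  // Injected timer so the flow can be exercised without real delays.
  setTimer: (fn: () => void, ms: number) => void;
}

interface RestartOptions {
  delayMs?: number;
  fallbackGraceMs?: number;
  reason?: string;
}

const DEFAULT_FALLBACK_GRACE_MS = 2000; // default grace period named in the review
const MAX_DELAY_MS = 60_000; // assumed upper bound, for illustration only

// Clamp a caller-supplied delay into [0, MAX_DELAY_MS]; non-numeric input uses the fallback.
function clampMs(value: number | undefined, fallback: number): number {
  if (typeof value !== "number" || !Number.isFinite(value)) return fallback;
  return Math.min(Math.max(Math.floor(value), 0), MAX_DELAY_MS);
}

function scheduleServiceRestartSketch(
  deps: RestartDeps,
  opts: RestartOptions = {},
): { delayMs: number; fallbackGraceMs: number } {
  const delayMs = clampMs(opts.delayMs, 0);
  const fallbackGraceMs = clampMs(opts.fallbackGraceMs, DEFAULT_FALLBACK_GRACE_MS);
  const reason = opts.reason ?? "update.run";

  deps.setTimer(() => {
    let result: RestartResult;
    try {
      result = deps.triggerServiceRestart();
    } catch (err) {
      // Treat a thrown error the same as a reported failure.
      result = { ok: false, method: "unknown", detail: String(err) };
    }
    if (!result.ok) {
      // Fall back to the in-process restart only after the grace period,
      // so a slow service manager and SIGUSR1 are less likely to overlap.
      deps.scheduleSigusr1Restart({ delayMs: fallbackGraceMs, reason });
    }
  }, delayMs);

  return { delayMs, fallbackGraceMs };
}
```

In a running process, `setTimer` could simply be `(fn, ms) => setTimeout(fn, ms)`. Injecting it keeps the sketch testable in the same spirit as the e2e tests described above, which mock both restart functions and check which one is called.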