#19857: fix(launchd): self-heal restart when service is unloaded
gateway
cli
size: S
Cluster: Gateway Restart Improvements
## Summary
This hardens macOS service restarts for the post-update edge case where `launchctl kickstart -k` fails with "Could not find service ..." because the LaunchAgent is no longer loaded.
Changes:
- `src/cli/update-cli/restart-helper.ts`
  - The update restart script now retries via `bootstrap + enable + kickstart` when plain `kickstart` fails and the plist exists.
- `src/daemon/launchd.ts`
  - `restartLaunchAgent()` now attempts `repairLaunchAgentBootstrap()` when `kickstart` fails with a "service not loaded" signature.
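
The control flow described above can be sketched roughly as follows. This is an illustrative sketch, not the code in `src/daemon/launchd.ts`: `RunResult`, `isServiceNotLoaded`, and `restartWithFallback` are hypothetical names, the `kickstart` and `repair` callbacks stand in for the real launchctl invocations, and the error match is assumed from the "Could not find service" message quoted in the summary.

```typescript
// Hypothetical result shape for a launchctl invocation (assumption).
type RunResult = { code: number; stderr: string };

// Assumed "service not loaded" signature, based on the error text quoted in
// the summary. The real detection logic may differ.
function isServiceNotLoaded(res: RunResult): boolean {
  return res.code !== 0 && /could not find service/i.test(res.stderr);
}

// Try a plain kickstart first; only fall back to the repair step when the
// failure looks like an unloaded service. Any other failure is returned as-is,
// so the happy path and unrelated errors behave as before.
function restartWithFallback(
  kickstart: () => RunResult,
  repair: () => RunResult,
): RunResult {
  const first = kickstart();
  if (first.code === 0 || !isServiceNotLoaded(first)) {
    return first;
  }
  return repair();
}
```

Passing the two steps in as callbacks keeps the decision logic testable without touching launchd, which is in the same spirit as the unit tests listed under Tests.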
## Why
During update/restart flows, users can end up with a valid plist on disk but an unloaded launchd job. In that state, plain `kickstart` fails and recovery currently requires manual operator intervention (`launchctl bootstrap ...`).
This PR makes restart self-healing for that specific failure mode while keeping happy-path behavior unchanged.
## Tests
- `src/cli/update-cli/restart-helper.test.ts`
  - assert launchd restart script includes bootstrap/enable fallback
  - assert custom launchd label path is reflected in plist fallback
- `src/daemon/launchd.test.ts`
  - add restart fallback test for service-not-loaded kickstart failure
  - add failure-path test when bootstrap fallback itself fails
Ran:
- `pnpm vitest src/cli/update-cli/restart-helper.test.ts src/daemon/launchd.test.ts`
## Related context
- Open PR: #11327 (`fix(launchd): reload plist from disk on restartLaunchAgent`)
- Closed PR: #14178 (similar bootstrap fallback idea in infra path)
This PR is intentionally narrower: it targets the specific "service not loaded" failure path and update restart-script recovery.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR adds self-healing logic for macOS `launchctl` restarts when the LaunchAgent service is unloaded. It modifies two paths: the standalone shell restart script (`restart-helper.ts`) and the in-process daemon restart (`launchd.ts`). When a `kickstart` fails with a "service not loaded" error, both paths now fall back to `bootstrap` + `kickstart` to re-register and start the service.
- The shell script fallback in `restart-helper.ts` correctly follows the `bootstrap` → `enable` → `kickstart` sequence, matching the pattern used in `installLaunchAgent`.
- The daemon fallback in `restartLaunchAgent` delegates to the existing `repairLaunchAgentBootstrap`, which is missing an `enable` step between `bootstrap` and `kickstart`. This creates an inconsistency with both the shell script and the install flow, and may cause the repair to fail for services in a persisted-disabled state.
- Tests cover the happy path (kickstart fails → bootstrap succeeds) and the failure path (both kickstart and bootstrap fail), but don't test the persisted-disabled scenario.
<h3>Confidence Score: 3/5</h3>
- Mostly safe — the happy path is unchanged and the fallback logic is sound, but the missing `enable` step in the daemon repair path could cause the fallback to fail for previously-disabled services.
- The core approach is correct and well-tested for the basic case. However, the inconsistency between the shell script (which includes `enable`) and the daemon's `repairLaunchAgentBootstrap` (which omits it) means the daemon-side repair may not fully self-heal in all cases, specifically when launchd has persisted a "disabled" state for the service. The install flow has an explicit comment explaining why `enable` is needed, so this omission appears unintentional.
- `src/daemon/launchd.ts` — the `repairLaunchAgentBootstrap` function is missing an `enable` step that both the shell script and install flow include.
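
To make the review point concrete, here is a minimal sketch of a repair sequence that includes the `enable` step between `bootstrap` and `kickstart`, as the review suggests. `buildRepairCommands` and its parameters are hypothetical and are not part of the PR. The `launchctl` subcommands are real, but the domain, label, and plist path in the usage example are placeholder values.

```typescript
// Build the argv lists for a bootstrap -> enable -> kickstart repair.
// `domain` is a launchd domain target such as "gui/501"; `label` is the
// LaunchAgent label; `plistPath` is the plist on disk (all caller-supplied).
function buildRepairCommands(
  domain: string,
  label: string,
  plistPath: string,
): string[][] {
  const serviceTarget = `${domain}/${label}`;
  return [
    ["launchctl", "bootstrap", domain, plistPath],
    // The step the review reports as missing from repairLaunchAgentBootstrap.
    ["launchctl", "enable", serviceTarget],
    ["launchctl", "kickstart", "-k", serviceTarget],
  ];
}

// Example with placeholder values (assumptions, not taken from the PR).
const repairPlan = buildRepairCommands(
  "gui/501",
  "com.example.gateway",
  "/Users/example/Library/LaunchAgents/com.example.gateway.plist",
);
```

Returning the commands as data rather than executing them makes the ordering easy to assert in a unit test, including for the persisted-disabled scenario the review notes is untested.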
<sub>Last reviewed commit: 48b05f1</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #11327: fix(launchd): reload plist from disk on restartLaunchAgent (by caiop91 · 2026-02-07 · 82.7%)
- #21591: fix(update): prevent double restart when refreshing service env (by irchelper · 2026-02-20 · 81.8%)
- #15619: fix: clean up orphan LaunchAgent plist on bootstrap failure (by superlowburn · 2026-02-13 · 79.1%)
- #16845: fix(daemon): gateway auto-restart on SIGTERM + agent restart guidel... (by kiminbean · 2026-02-15 · 77.9%)
- #20272: fix: LaunchAgent KeepAlive causes restart loop (fixes #20257) (by MisterGuy420 · 2026-02-18 · 76.9%)
- #22224: fix(launchd/macos): prevent restart loop by using KeepAlive.Success... (by ashiabbott · 2026-02-20 · 76.9%)
- #6273: fix: handle EPIPE errors gracefully in daemon operations (by batumilove · 2026-02-01 · 75.8%)
- #18236: macOS daemon: bootstrap LaunchAgent on gateway start after stop (by agisilaos · 2026-02-16 · 75.3%)
- #22304: Gateway: fix launchd start after stop (by apethree · 2026-02-21 · 75.2%)
- #13084: fix(daemon): multi-layer defense against zombie gateway processes (by openperf · 2026-02-10 · 74.7%)