#20555: fix(gateway): detect launchd supervision via XPC_SERVICE_NAME
size: XS
Cluster: Gateway Restart Improvements
## Summary
- Problem: On macOS, when the gateway receives SIGUSR1 (config reload, update), it spawns a detached child process instead of letting launchd handle the restart. The duplicate gateway then fights with launchd's `KeepAlive` respawn, producing a lock-timeout error roughly every 10 seconds and thousands of errors in total.
- Why it matters: The error log fills with 8000+ failures ("gateway already running; lock timeout after 5000ms"), launchd shows 1198 runs, and the gateway restart is unreliable under launchd supervision.
- What changed: Added `XPC_SERVICE_NAME`, `OPENCLAW_LAUNCHD_LABEL`, and `OPENCLAW_SYSTEMD_UNIT` to the `SUPERVISOR_HINT_ENV_VARS` list in `process-respawn.ts`. On macOS, launchd sets `XPC_SERVICE_NAME` on every managed process (confirmed via `launchctl print`) but does **not** set `LAUNCH_JOB_LABEL` or `LAUNCH_JOB_NAME`. Without `XPC_SERVICE_NAME` in the list, `isLikelySupervisedProcess()` returns `false` under launchd, and the gateway forks a detached child via `spawn()` instead of returning `"supervised"`. `OPENCLAW_LAUNCHD_LABEL` and `OPENCLAW_SYSTEMD_UNIT` are env vars that OpenClaw itself propagates through its daemon/service env flows (`src/daemon/service-env.ts`, `src/daemon/node-service.ts`, restart helpers); they serve as fallback signals when the platform-native vars are absent in some launch paths.
- What did NOT change (scope boundary): No changes to the lock mechanism, launchd plist generation, or restart-helper scripts.
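The detection described above can be sketched as follows. This is a minimal illustration, not the actual contents of `process-respawn.ts`: the list only contains the variable names mentioned in this PR (the real list may hold more), and the `env` parameter and the "non-empty value counts as a hint" rule are assumptions made for the example.

```typescript
// Hypothetical reconstruction for illustration; the real list lives in
// src/infra/process-respawn.ts and may contain additional entries.
type EnvMap = Record<string, string | undefined>;

const SUPERVISOR_HINT_ENV_VARS: readonly string[] = [
  "LAUNCH_JOB_LABEL",       // pre-existing hint (not set by launchd per this PR)
  "LAUNCH_JOB_NAME",        // pre-existing hint (not set by launchd per this PR)
  "XPC_SERVICE_NAME",       // added: set by launchd on managed processes
  "OPENCLAW_LAUNCHD_LABEL", // added: propagated by OpenClaw service env flows
  "OPENCLAW_SYSTEMD_UNIT",  // added: propagated by OpenClaw service env flows
];

// Assumption: a hint counts only when the variable holds a non-empty string.
function isLikelySupervisedProcess(env: EnvMap): boolean {
  return SUPERVISOR_HINT_ENV_VARS.some((name) => {
    const value = env[name];
    return typeof value === "string" && value.trim().length > 0;
  });
}
```

In a Node process the function would typically be called with `process.env`. Passing the environment explicitly keeps the sketch testable without mutating global state.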
## Change Type (select all)
- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Related: #20512 (similar launchd environment issue)
## User-visible / Behavior Changes
- Gateway SIGUSR1 restarts under launchd no longer spawn a duplicate detached process. launchd cleanly restarts the single managed process.
- Eliminates the "gateway already running; lock timeout" error storm in logs.
## Security Impact (required)
- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`
## Repro + Verification
### Environment
- OS: macOS 15 (Darwin 25.2.0)
- Runtime/container: Node 25.6.1 via Homebrew
- Model/provider: N/A
- Integration/channel (if any): N/A
- Relevant config (redacted): gateway managed via `ai.openclaw.gateway` LaunchAgent with `KeepAlive=true`
### Steps
1. Install gateway as LaunchAgent (`openclaw gateway install`)
2. Trigger a config change or update that sends SIGUSR1 to the gateway
3. Observe gateway error log (`~/.openclaw/logs/gateway.err.log`)
### Expected
- Gateway restarts cleanly via launchd; no lock errors.
### Actual
- Gateway spawns a detached child (holds lock + port), then launchd also restarts the managed process → lock timeout → exit(1) → launchd restarts again → infinite loop every ~10s.
Observed error log:
```
Gateway failed to start: gateway already running (pid 85206); lock timeout after 5000ms
Port 18789 is already in use.
- pid 85206 dmitry: openclaw-gateway (127.0.0.1:18789)
Gateway service appears loaded. Stop it first.
```
## Evidence
- [x] Failing test/log before + passing after
- [x] Trace/log snippets
`launchctl print` confirms `XPC_SERVICE_NAME` is set but `LAUNCH_JOB_LABEL` is not:
```
environment = {
XPC_SERVICE_NAME => ai.openclaw.gateway
}
```
Gateway stdout log shows the spawn-based restart path was taken:
```
[gateway] restart mode: full process restart (spawned pid 67308)
```
After this fix, the restart returns `mode: "supervised"` and launchd handles it.
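The difference between the two restart paths can be sketched as below. Only the `"supervised"` mode name comes from this PR; the `"spawned"` mode name, the `decideRestart` function, and the injected `spawnDetached` callback are hypothetical names chosen for the example, and the hint check is reduced to two variables for brevity.

```typescript
// Hypothetical sketch of the restart decision; not the actual gateway code.
type ProcessEnvLike = Record<string, string | undefined>;

type RespawnResult =
  | { mode: "supervised" }             // supervisor (launchd) restarts the process
  | { mode: "spawned"; pid: number };  // hypothetical name for the detached-child path

function decideRestart(
  env: ProcessEnvLike,
  spawnDetached: () => number, // in a real gateway this would wrap a detached child_process.spawn
): RespawnResult {
  const hints = ["LAUNCH_JOB_LABEL", "XPC_SERVICE_NAME"];
  const supervised = hints.some((name) => (env[name] ?? "").trim() !== "");
  if (supervised) {
    // Do not fork: exiting lets launchd's KeepAlive bring up the single managed process.
    return { mode: "supervised" };
  }
  // No supervisor detected: start a replacement ourselves.
  return { mode: "spawned", pid: spawnDetached() };
}
```

With `XPC_SERVICE_NAME` among the hints, a launchd-managed gateway takes the first branch and the spawn callback is never invoked, which is what prevents the duplicate process that held the lock and port.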
## Human Verification (required)
- Verified scenarios:
  - All 7 process-respawn tests pass
  - All 15 restart-helper tests pass
  - Confirmed via `launchctl print` that `XPC_SERVICE_NAME` is set on the live launchd-managed gateway
- Edge cases checked:
  - `clearSupervisorHints()` test helper updated to also clear `XPC_SERVICE_NAME`, `OPENCLAW_LAUNCHD_LABEL`, and `OPENCLAW_SYSTEMD_UNIT`
  - Existing `LAUNCH_JOB_LABEL` detection path unchanged
- What you did **not** verify:
  - End-to-end SIGUSR1 restart under launchd with the fix deployed (manual verification TODO)
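A test helper along the lines of `clearSupervisorHints()` might look like the sketch below. The helper name comes from this PR, but its signature and body are assumptions; the actual helper in the test suite may operate on `process.env` directly and clear more variables than the ones named here.

```typescript
// Hypothetical sketch of the test helper; the real one may differ.
type MutableEnv = Record<string, string | undefined>;

const HINTS_TO_CLEAR: readonly string[] = [
  "LAUNCH_JOB_LABEL",
  "LAUNCH_JOB_NAME",
  "XPC_SERVICE_NAME",
  "OPENCLAW_LAUNCHD_LABEL",
  "OPENCLAW_SYSTEMD_UNIT",
];

// Remove every supervisor hint so a test starts from an "unsupervised" baseline,
// even when the test runner itself was started under launchd or systemd.
function clearSupervisorHints(env: MutableEnv): void {
  for (const name of HINTS_TO_CLEAR) {
    delete env[name];
  }
}
```

Clearing the new variables matters because a developer machine or CI agent may itself run under a supervisor, in which case a leftover hint would make the "unsupervised" test cases take the wrong branch.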
## Compatibility / Migration
- Backward compatible? `Yes`
- Config/env changes? `No`
- Migration needed? `No`
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly: Revert the two commits on this branch
- Files/config to restore: `src/infra/process-respawn.ts`
- Known bad symptoms reviewers should watch for: If a non-launchd macOS process happens to have `XPC_SERVICE_NAME` or `OPENCLAW_LAUNCHD_LABEL` set, it would incorrectly return `"supervised"` instead of spawning a child. In practice this is extremely unlikely outside of supervised contexts.
## Risks and Mitigations
- Risk: `XPC_SERVICE_NAME` could theoretically be set in non-launchd contexts (e.g. XPC services embedded in apps).
  - Mitigation: The variable name is specific to Apple's XPC/launchd infrastructure. Any process with it set is effectively supervised. The existing `LAUNCH_JOB_LABEL` check has the same theoretical concern.
- Risk: `OPENCLAW_LAUNCHD_LABEL` / `OPENCLAW_SYSTEMD_UNIT` could be set manually by a user outside of a supervised context.
  - Mitigation: These are internal OpenClaw env vars only propagated by daemon/service env flows. A user would have to explicitly set them, which would be an intentional override.
🤖 Generated with [Claude Code](https://claude.com/claude-code)