#6273: fix: handle EPIPE errors gracefully in daemon operations
gateway
Cluster: Error Handling and Memory Management
## Summary
Fixes #5345, #4632
When the gateway writes to stdout/stderr during process shutdown or restart, the pipe may already be closed, causing EPIPE errors that crash the process. This is especially problematic during:
1. **Agent-triggered restarts** (systemctl restart / launchctl kickstart) — user never receives a response
2. **LaunchAgent restarts on macOS** — causes exponential throttle, leaving the service down for hours
## Root Cause
All `stdout.write()` calls in the daemon code were unprotected. When the pipe is closed during shutdown:
- `restartSystemdService` crashes at line ~323 (issue #5345)
- `restartLaunchAgent` crashes similarly
- macOS launchd applies exponential backoff after crashes, causing 10+ hour outages (issue #4632)
## Solution
### 1. Safe Write Utilities (`src/daemon/safe-write.ts`)
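```typescript
// Returns true if the write call completed, false if the pipe was already closed.
// isBrokenPipeError (not shown in this excerpt) matches EPIPE/EIO errors.
export function safeWrite(stream: NodeJS.WritableStream, data: string): boolean {
  try {
    stream.write(data);
    return true;
  } catch (err) {
    if (isBrokenPipeError(err)) {
      return false; // Suppress EPIPE/EIO
    }
    throw err; // Anything else is a real failure and must propagate
  }
}
```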
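The snippet above relies on `isBrokenPipeError`, and the rest of this PR uses `safeWriteLine`, but neither body is shown here. A minimal sketch of how they might look follows. It assumes the check is based on the error's `code` property and that `safeWriteLine` simply appends a newline. The `WritableLike` interface is a hypothetical stand-in for `NodeJS.WritableStream` so the sketch has no dependencies.

```typescript
// Hypothetical sketch: these bodies are assumptions, not the PR's actual code.
interface WritableLike {
  write(data: string): boolean;
}

// Treat EPIPE (broken pipe) and EIO (I/O error) as "the reader has gone away".
export function isBrokenPipeError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return code === "EPIPE" || code === "EIO";
}

// Write one line; report false instead of throwing when the pipe is closed.
export function safeWriteLine(stream: WritableLike, line: string): boolean {
  try {
    stream.write(`${line}\n`);
    return true;
  } catch (err) {
    if (isBrokenPipeError(err)) return false;
    throw err; // unrelated errors still propagate
  }
}
```

Returning a boolean rather than throwing lets call sites ignore a closed pipe without their own `try/catch`.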
### 2. Update All Daemon Files
- **systemd.ts**: Use `safeWriteLine` for all `stdout.write()` calls
- **launchd.ts**: Use `safeWriteLine` for all `stdout.write()` calls
### 3. Add ThrottleInterval to LaunchAgent
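Added `<key>ThrottleInterval</key><integer>5</integer>` to the plist template. `ThrottleInterval` sets the minimum number of seconds launchd waits between launches of the job (the documented default is 10). The intent is to keep restart delays to about 5 seconds rather than hours of downtime after crashes.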
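As a rough illustration of the template change, the sketch below renders the new key as a plist fragment. `renderThrottleInterval` is a hypothetical helper name. The real template in `src/daemon/launchd-plist.ts` is not shown in this PR text and may be structured differently.

```typescript
// Hypothetical helper: illustrates the fragment added to the plist template.
export function renderThrottleInterval(seconds: number): string {
  // launchd expects a non-negative whole number of seconds here.
  if (!Number.isInteger(seconds) || seconds < 0) {
    throw new RangeError(`ThrottleInterval must be a non-negative integer, got ${seconds}`);
  }
  return [
    "    <key>ThrottleInterval</key>",
    `    <integer>${seconds}</integer>`,
  ].join("\n");
}
```

Calling `renderThrottleInterval(5)` yields the two lines described above, indented for placement inside the top-level `<dict>`.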
## Changes
| File | Changes |
|------|---------|
| `src/daemon/safe-write.ts` | New utility with `safeWrite`/`safeWriteLine` |
| `src/daemon/safe-write.test.ts` | Tests for EPIPE/EIO handling |
| `src/daemon/systemd.ts` | Use safe writes (8 call sites) |
| `src/daemon/launchd.ts` | Use safe writes (10 call sites) |
| `src/daemon/launchd-plist.ts` | Add ThrottleInterval: 5 |
## Testing
- Added unit tests for safe-write utilities
- Tests verify EPIPE/EIO are caught and suppressed
- Tests verify other errors still propagate
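The checks listed above could look roughly like the following sketch. It inlines a copy of `safeWrite` so it runs on its own. The names `throwingStream` and `runSafeWriteChecks` are hypothetical, as is `ENOSPC` as the example of an unrelated error. The actual tests in `src/daemon/safe-write.test.ts` presumably use the project's test framework.

```typescript
// Minimal stand-in for NodeJS.WritableStream (assumption, keeps the sketch dependency-free).
interface WritableLike {
  write(data: string): boolean;
}

function isBrokenPipeError(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return code === "EPIPE" || code === "EIO";
}

// Same logic as safeWrite in the Solution section.
function safeWrite(stream: WritableLike, data: string): boolean {
  try {
    stream.write(data);
    return true;
  } catch (err) {
    if (isBrokenPipeError(err)) return false;
    throw err;
  }
}

// Fake stream whose write() throws synchronously with the given error code.
function throwingStream(code: string): WritableLike {
  return {
    write(): boolean {
      throw Object.assign(new Error(`write ${code}`), { code });
    },
  };
}

// Returns a list of failure messages; an empty list means all checks passed.
export function runSafeWriteChecks(): string[] {
  const failures: string[] = [];
  for (const code of ["EPIPE", "EIO"]) {
    if (safeWrite(throwingStream(code), "x") !== false) {
      failures.push(`${code} was not suppressed`);
    }
  }
  try {
    safeWrite(throwingStream("ENOSPC"), "x");
    failures.push("ENOSPC was swallowed instead of propagating");
  } catch {
    // Expected: unrelated errors must still propagate.
  }
  return failures;
}
```

These fakes throw synchronously, so the sketch only exercises the synchronous path that `try/catch` can observe.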
## Backward Compatibility
- No breaking changes
- Existing LaunchAgents must be reinstalled to pick up `ThrottleInterval`
- Users can run `openclaw doctor --fix` to reinstall
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR introduces `safeWrite`/`safeWriteLine` utilities and replaces direct `stdout.write()` calls in the daemon’s systemd + launchd management code to avoid crashing when stdout/stderr pipes are closed during shutdown/restart. It also updates the generated macOS LaunchAgent plist template to include `ThrottleInterval=5` to prevent long launchd backoff windows.
The change fits into the daemon layer (`src/daemon/*`) by hardening CLI/daemon status output paths during lifecycle operations (install/stop/restart/uninstall) and adjusting launchd behavior via the plist template.
<h3>Confidence Score: 2/5</h3>
- This PR is directionally correct but may not actually prevent EPIPE crashes in real Node.js stream behavior.
- The main risk is that `try/catch` around `stream.write()` often won’t catch broken-pipe failures because they’re commonly delivered asynchronously via the stream `'error'` event; if so, the core bug remains. The launchd `ThrottleInterval` addition is also a behavior change that should be verified against supported macOS versions/semantics.
- Files needing attention: `src/daemon/safe-write.ts`, `src/daemon/launchd-plist.ts`
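
To illustrate the asynchronous case raised above: in Node.js, a stream write failure such as EPIPE is typically emitted as an `'error'` event, and an `'error'` event with no listener is thrown as an uncaught exception. A minimal sketch of a complementary guard follows. `guardBrokenPipe` and `ErrorEmitterLike` are hypothetical names that do not appear in this PR.

```typescript
// Hypothetical sketch: a listener-based guard, not code from this PR.
interface ErrorEmitterLike {
  on(event: "error", listener: (err: Error) => void): unknown;
}

// Swallow EPIPE/EIO delivered asynchronously; hand anything else to onOther.
export function guardBrokenPipe(
  stream: ErrorEmitterLike,
  onOther: (err: Error) => void,
): void {
  stream.on("error", (err) => {
    const code = (err as Error & { code?: string }).code;
    if (code === "EPIPE" || code === "EIO") return; // reader is gone, nothing to do
    onOther(err);
  });
}

// Possible usage at daemon startup (assumes a Node.js runtime):
//   guardBrokenPipe(process.stdout, (err) => { throw err; });
//   guardBrokenPipe(process.stderr, (err) => { throw err; });
```

A guard like this would sit alongside `safeWrite` rather than replace it, since the `try/catch` still covers errors thrown synchronously.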
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs

| PR | Author | Date | Similarity |
|----|--------|------|------------|
| #9214: Fix: EPIPE exception in systemd service operations | vishaltandale00 | 2026-02-05 | 85.3% |
| #13084: fix(daemon): multi-layer defense against zombie gateway processes | openperf | 2026-02-10 | 80.8% |
| #12804: fix(daemon): use wrapper script for pnpm global installs in service... | odinho | 2026-02-09 | 77.6% |
| #6577: fix: add null checks for stdout/stderr when using inherit-stdio fal... | ncmalan | 2026-02-01 | 77.3% |
| #22224: fix(launchd/macos): prevent restart loop by using KeepAlive.Success... | ashiabbott | 2026-02-20 | 76.9% |
| #8038: fix(exec): use spawnWithFallback to handle EBADF on macOS | FelixFoster | 2026-02-03 | 76.7% |
| #20390: fix(daemon): fall back to /tmp for launchd logs on removable volumes | lemoz | 2026-02-18 | 76.4% |
| #16845: fix(daemon): gateway auto-restart on SIGTERM + agent restart guidel... | kiminbean | 2026-02-15 | 76.3% |
| #20272: fix: LaunchAgent KeepAlive causes restart loop (fixes #20257) | MisterGuy420 | 2026-02-18 | 76.0% |
| #19857: fix(launchd): self-heal restart when service is unloaded | vibecodooor | 2026-02-18 | 75.8% |