
#6273: fix: handle EPIPE errors gracefully in daemon operations

by batumilove open 2026-02-01 13:37
gateway
## Summary

Fixes #5345, #4632

When the gateway writes to stdout/stderr during process shutdown or restart, the pipe may already be closed, causing EPIPE errors that crash the process. This is especially problematic during:

1. **Agent-triggered restarts** (systemctl restart / launchctl kickstart): the user never receives a response
2. **LaunchAgent restarts on macOS**: causes exponential throttle, leaving the service down for hours

## Root Cause

All `stdout.write()` calls in the daemon code were unprotected. When the pipe is closed during shutdown:

- `restartSystemdService` crashes at line ~323 (issue #5345)
- `restartLaunchAgent` crashes similarly
- macOS launchd applies exponential backoff after crashes, causing 10+ hour outages (issue #4632)

## Solution

### 1. Safe Write Utilities (`src/daemon/safe-write.ts`)

```typescript
export function safeWrite(stream: NodeJS.WritableStream, data: string): boolean {
  try {
    stream.write(data);
    return true;
  } catch (err) {
    if (isBrokenPipeError(err)) {
      return false; // Suppress EPIPE/EIO
    }
    throw err;
  }
}
```

### 2. Update All Daemon Files

- **systemd.ts**: Use `safeWriteLine` for all `stdout.write()` calls
- **launchd.ts**: Use `safeWriteLine` for all `stdout.write()` calls

### 3. Add ThrottleInterval to LaunchAgent

Added `<key>ThrottleInterval</key><integer>5</integer>` to the plist template. This caps restart delays at 5 seconds, preventing hours of downtime after crashes.
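The excerpt in section 1 calls `isBrokenPipeError`, and section 2 relies on `safeWriteLine`, but neither body is shown in this description. The following is a minimal sketch of how they might look, not the PR's actual code. It assumes broken-pipe failures surface as error objects carrying a string `code` of `EPIPE` or `EIO`, and it uses a small `WritableLike` interface (an assumption made here so the sketch has no dependency on Node typings) in place of `NodeJS.WritableStream`.

```typescript
// Hypothetical sketch: the helper bodies below are assumptions, not taken from the PR.

// Minimal structural stand-in for NodeJS.WritableStream (assumption).
interface WritableLike {
  write(data: string): unknown;
}

// Error codes treated as "the other end of the pipe is gone".
const BROKEN_PIPE_CODES = new Set(["EPIPE", "EIO"]);

// Returns true only for error-like objects whose `code` is EPIPE or EIO.
function isBrokenPipeError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && BROKEN_PIPE_CODES.has(code);
}

// Same shape as the excerpt in section 1: swallow broken-pipe errors, rethrow the rest.
function safeWrite(stream: WritableLike, data: string): boolean {
  try {
    stream.write(data);
    return true;
  } catch (err) {
    if (isBrokenPipeError(err)) return false;
    throw err;
  }
}

// Convenience wrapper that appends a newline before writing.
function safeWriteLine(stream: WritableLike, line: string): boolean {
  return safeWrite(stream, `${line}\n`);
}
```

A call site would then change from `stdout.write("restarted\n")` to `safeWriteLine(stdout, "restarted")`. Note that this only covers errors thrown synchronously by `write()`.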
## Changes

| File | Changes |
|------|---------|
| `src/daemon/safe-write.ts` | New utility with `safeWrite`/`safeWriteLine` |
| `src/daemon/safe-write.test.ts` | Tests for EPIPE/EIO handling |
| `src/daemon/systemd.ts` | Use safe writes (8 call sites) |
| `src/daemon/launchd.ts` | Use safe writes (10 call sites) |
| `src/daemon/launchd-plist.ts` | Add ThrottleInterval: 5 |

## Testing

- Added unit tests for safe-write utilities
- Tests verify EPIPE/EIO are caught and suppressed
- Tests verify other errors still propagate

## Backward Compatibility

- No breaking changes
- Existing LaunchAgents will need reinstall to get ThrottleInterval
- Users can run `openclaw doctor --fix` to reinstall

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR introduces `safeWrite`/`safeWriteLine` utilities and replaces direct `stdout.write()` calls in the daemon’s systemd + launchd management code to avoid crashing when stdout/stderr pipes are closed during shutdown/restart. It also updates the generated macOS LaunchAgent plist template to include `ThrottleInterval=5` to prevent long launchd backoff windows.

The change fits into the daemon layer (`src/daemon/*`) by hardening CLI/daemon status output paths during lifecycle operations (install/stop/restart/uninstall) and adjusting launchd behavior via the plist template.

<h3>Confidence Score: 2/5</h3>

- This PR is directionally correct but may not actually prevent EPIPE crashes in real Node.js stream behavior.
- The main risk is that `try/catch` around `stream.write()` often won’t catch broken-pipe failures because they’re commonly delivered asynchronously via the stream `'error'` event; if so, the core bug remains. The launchd `ThrottleInterval` addition is also a behavior change that should be verified against supported macOS versions/semantics.
- src/daemon/safe-write.ts, src/daemon/launchd-plist.ts

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
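The review's main concern is that broken-pipe failures may arrive through the stream's `'error'` event instead of as a synchronous throw. One possible way to cover that path is sketched below, as an illustration only: `ignoreBrokenPipe` and the `ErrorSource` interface are hypothetical names that do not appear in the PR, and whether this approach suits the daemon code would need checking against its actual shutdown flow.

```typescript
// Hypothetical sketch: none of these names come from the PR.

// Minimal structural type for anything that emits 'error' events (assumption),
// used so the sketch does not depend on Node typings.
interface ErrorSource {
  on(event: "error", listener: (err: unknown) => void): unknown;
}

// True for error-like objects whose `code` is EPIPE or EIO.
function hasBrokenPipeCode(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return code === "EPIPE" || code === "EIO";
}

// Attaches an 'error' listener that drops broken-pipe errors and rethrows
// everything else, so unrelated failures are still surfaced.
function ignoreBrokenPipe(stream: ErrorSource): void {
  stream.on("error", (err) => {
    if (hasBrokenPipeCode(err)) return; // reader went away; nothing left to report to
    throw err;
  });
}

// Possible usage at daemon start-up:
//   ignoreBrokenPipe(process.stdout);
//   ignoreBrokenPipe(process.stderr);
```

A listener like this would complement the synchronous `try/catch` in `safeWrite` and does not replace it: the `try/catch` handles errors thrown by `write()` itself, while the listener handles errors emitted later.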
