#19391: fix(process): destroy stdio streams on dispose and terminate children on shutdown
gateway
size: M
Cluster: Error Handling and Memory Management
## Summary
- **Problem**: Two resource-leak paths in the process supervisor: (1) `ChildProcessAdapter.dispose()` never destroys stdio streams, so file descriptors leak until a long-running gateway crashes with `EMFILE`; (2) nothing terminates active child processes on shutdown, so orphaned processes keep holding ports and temp files. Observed in practice: the FD count on a long-running gateway instance climbed steadily and hit `EMFILE` after a few days of continuous operation.
- **Root cause**: `dispose()` removes event listeners but never calls `.destroy()` on stdin/stdout/stderr streams. `server-close.ts` has no hook into the process supervisor to cancel active runs.
- **Solution**: (1) Explicitly destroy stdout, stderr, and stdin in `dispose()` with `!stream.destroyed` guards and try/catch. (2) Add `cancelAll(reason?)` to `ProcessSupervisor` and call it from `server-close.ts` during graceful shutdown.
- **Scope boundary**: `killProcessTree` internals are unchanged. Shutdown cancellation is fire-and-forget and does not delay Node.js exit.
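
The stream cleanup described in the solution can be sketched roughly as follows. This is an illustration only, not the code in this PR: `StdioHandles`, `destroyQuietly`, and `disposeStdio` are hypothetical names, and `PassThrough` streams stand in for a real child's pipes.

```typescript
import { PassThrough, Readable, Writable } from "node:stream";

// Hypothetical minimal shape of the stdio handles a child adapter holds.
interface StdioHandles {
  stdin: Writable | null;
  stdout: Readable | null;
  stderr: Readable | null;
}

// Destroy one stream if it exists and has not been destroyed yet.
// Errors are swallowed so one bad stream cannot block the others.
function destroyQuietly(stream: Readable | Writable | null): void {
  if (!stream || stream.destroyed) return;
  try {
    stream.destroy();
  } catch {
    // Ignored: dispose should never throw.
  }
}

// Destroy stdout, stderr, and stdin, in that order.
function disposeStdio(handles: StdioHandles): void {
  destroyQuietly(handles.stdout);
  destroyQuietly(handles.stderr);
  destroyQuietly(handles.stdin);
}

// In-memory streams standing in for a child's pipes.
const handles: StdioHandles = {
  stdin: new PassThrough(),
  stdout: new PassThrough(),
  stderr: new PassThrough(),
};
disposeStdio(handles);
disposeStdio(handles); // second call is a no-op because of the `destroyed` guard
```

The `destroyed` guard makes the helper idempotent, which matters when dispose can run both on normal exit and on cancellation.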
## Change Type
- [x] Bug fix
## Scope
- [x] Gateway / orchestration
- [x] Skills / tool execution
## Linked Issue
Fixes #9068
Fixes #18420
Related: #18833
## User-visible Changes
- Long-running gateways no longer accumulate leaked file descriptors from disposed child processes.
- Gateway shutdown (SIGINT/SIGTERM) now terminates all active child processes instead of leaving them as orphans.
## Security Impact
- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`
## Repro + Verification
### Environment
- OS: Ubuntu 24.04 (Linux 6.8.0)
- Runtime: Node 22.x + pnpm
- OpenClaw: v2026.2.x
### Steps (FD leak)
1. Run a gateway with agents that spawn many child processes (exec tool)
2. Monitor open FDs: `ls /proc/<pid>/fd | wc -l`
3. After many tool calls, FD count grows unbounded
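
The FD check in step 2 can also be scripted. The sketch below is a hypothetical helper, not part of this PR; it assumes a Linux procfs and returns `null` elsewhere. Listing `/proc/self/fd` briefly opens one descriptor itself, so compare counts rather than reading them as absolute values.

```typescript
import { readdirSync } from "node:fs";

// Count open file descriptors for a process by listing /proc/<pid>/fd.
// Linux-only: returns null where procfs is unavailable (e.g. macOS, Windows).
function countOpenFds(pid: number | "self" = "self"): number | null {
  try {
    return readdirSync(`/proc/${pid}/fd`).length;
  } catch {
    return null;
  }
}

// Difference between two samples, or null if either sample is missing.
function fdGrowth(before: number | null, after: number | null): number | null {
  return before === null || after === null ? null : after - before;
}

const before = countOpenFds();
// ... run a batch of exec tool calls here ...
const after = countOpenFds();
console.log("FD growth:", fdGrowth(before, after));
```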
### Steps (orphan processes)
1. Start gateway, trigger child process via exec tool
2. Send SIGTERM to gateway
3. Observe child process still running: `ps aux | grep <child>`
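
A scripted version of the orphan check in step 3 might look like the following. `isAlive` and `findOrphans` are hypothetical helpers, not code from this PR; they rely on `process.kill(pid, 0)`, which tests for the existence of a process without sending a signal.

```typescript
import { spawnSync } from "node:child_process";

// Returns true if a process with this PID currently exists.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is delivered
    return true;
  } catch (err) {
    // EPERM means the process exists but we may not signal it.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Filter a list of previously recorded child PIDs down to those still running.
function findOrphans(pids: number[]): number[] {
  return pids.filter(isAlive);
}

// Demo: a child that has already exited, checked alongside the current process.
const finished = spawnSync(process.execPath, ["-e", ""]);
console.log("still running:", findOrphans([process.pid, finished.pid]));
```

PID reuse can produce false positives over long intervals, so a check like this is best run shortly after shutdown.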
### Expected
- FD count stays bounded after child process disposal
- All child processes terminate on gateway shutdown
### Actual (before fix)
- FDs leak indefinitely → eventual `EMFILE` crash
- Child processes survive gateway shutdown as orphans
## Evidence
- [x] Failing test/log before + passing after
```
✓ src/process/child.test.ts
  ✓ destroys all three stdio streams on dispose
  ✓ dispose is safe when streams are already destroyed
  ✓ dispose is safe after child process has exited
✓ src/process/supervisor.test.ts
  ✓ cancelAll terminates all active runs
  ✓ cancelAll on empty supervisor is a no-op
  ✓ cancelAll is resilient to partial cancellation failures
  ✓ full shutdown verifies zero orphan PIDs remain after cancelAll
```
```
pnpm build # ✅ zero errors
pnpm check # ✅ zero warnings
pnpm vitest run --config vitest.unit.config.ts src/process/ # ✅ all pass
```
## Compatibility
- Backward compatible: `Yes` — `cancelAll` is additive to the interface; existing `cancel()` behavior unchanged
- Config changes: `None`
- Migration: `None`
## Failure Recovery
- Revert: `git revert <sha>` — single commit
- Symptoms to watch: child processes not being cleaned up (would indicate a revert is needed), or shutdown hanging (would indicate `cancelAll` is blocking; mitigated by the fire-and-forget design)
## Risks
- Risk: `cancelAll` during shutdown could race with in-flight tool calls.
- Mitigation: Uses existing `cancel()` per-run (which already handles races). Individual cancellation errors are caught and logged — one failure doesn't block the rest. Shutdown is fire-and-forget with `.unref()`'d escalation timers.
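
The mitigation can be illustrated with a simplified sketch. It is not the `ProcessSupervisor` implementation: `CancelFn`, the free-standing `cancelAll`, and `scheduleEscalation` are hypothetical stand-ins, and the per-run cancel is assumed to be synchronous for brevity.

```typescript
// Hypothetical stand-in for the supervisor's existing per-run cancel().
type CancelFn = (runId: string, reason: string) => void;

interface CancelAllResult {
  cancelled: string[];
  failed: string[];
}

// Cancel every run; a failure on one run is logged and does not stop the rest.
function cancelAll(
  runIds: Iterable<string>,
  cancel: CancelFn,
  reason = "shutdown",
  log: (msg: string) => void = console.warn,
): CancelAllResult {
  const result: CancelAllResult = { cancelled: [], failed: [] };
  // Snapshot the ids so cancel() may safely mutate the underlying collection.
  for (const runId of Array.from(runIds)) {
    try {
      cancel(runId, reason);
      result.cancelled.push(runId);
    } catch (err) {
      log(`cancel failed for run ${runId}: ${String(err)}`);
      result.failed.push(runId);
    }
  }
  return result;
}

// Escalation timer that does not keep the event loop (and so the exit) waiting.
function scheduleEscalation(fn: () => void, delayMs: number): NodeJS.Timeout {
  const timer = setTimeout(fn, delayMs);
  timer.unref();
  return timer;
}

// Demo: run "b" fails to cancel, the others are still cancelled.
const active = new Map<string, { pid: number }>([
  ["a", { pid: 101 }],
  ["b", { pid: 102 }],
  ["c", { pid: 103 }],
]);
const logged: string[] = [];
const outcome = cancelAll(
  active.keys(),
  (id) => {
    if (id === "b") throw new Error("simulated failure");
    active.delete(id);
  },
  "shutdown",
  (msg) => logged.push(msg),
);
scheduleEscalation(() => console.warn("escalating to SIGKILL"), 5000);
```

Because the timer is `unref()`'d, the process can exit before the escalation fires, which is the fire-and-forget behavior the mitigation describes.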
AI-assisted: Yes (Claude Code). Fully tested — all tests pass.