#19391: fix(process): destroy stdio streams on dispose and terminate children on shutdown

by nabbilkhan open 2026-02-17 19:02
gateway size: M
## Summary

- **Problem**: Two resource-leak paths in the process supervisor: (1) `ChildProcessAdapter.dispose()` never destroys stdio streams, leaking FDs until an `EMFILE` crash on long-running gateways; (2) there is no mechanism to terminate active child processes on shutdown, leaving orphaned processes holding ports and temp files. Noticed the FD count climbing steadily on a long-running gateway instance; it eventually hit `EMFILE` after a few days of continuous operation.
- **Root cause**: `dispose()` removes event listeners but never calls `.destroy()` on the stdin/stdout/stderr streams. `server-close.ts` has no hook into the process supervisor to cancel active runs.
- **Solution**: (1) Explicitly destroy stdout, stderr, and stdin in `dispose()` with `!stream.destroyed` guards and try/catch. (2) Add `cancelAll(reason?)` to `ProcessSupervisor` and call it from `server-close.ts` during graceful shutdown.
- **Scope boundary**: `killProcessTree` internals unchanged. Shutdown is fire-and-forget and does not delay Node.js exit.

## Change Type

- [x] Bug fix

## Scope

- [x] Gateway / orchestration
- [x] Skills / tool execution

## Linked Issue

- Fixes #9068
- Fixes #18420
- Related: #18833

## User-visible Changes

- Long-running gateways no longer accumulate leaked file descriptors from disposed child processes.
- Gateway shutdown (SIGINT/SIGTERM) now terminates all active child processes instead of leaving them as orphans.

## Security Impact

- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`

## Repro + Verification

### Environment

- OS: Ubuntu 24.04 (Linux 6.8.0)
- Runtime: Node 22.x + pnpm
- OpenClaw: v2026.2.x

### Steps (FD leak)

1. Run a gateway with agents that spawn many child processes (exec tool)
2. Monitor open FDs: `ls /proc/<pid>/fd | wc -l`
3. After many tool calls, FD count grows unbounded

### Steps (orphan processes)

1. Start gateway, trigger child process via exec tool
2. Send SIGTERM to gateway
3. Observe child process still running: `ps aux | grep <child>`

### Expected

- FD count stays bounded after child process disposal
- All child processes terminate on gateway shutdown

### Actual (before fix)

- FDs leak indefinitely → eventual `EMFILE` crash
- Child processes survive gateway shutdown as orphans

## Evidence

- [x] Failing test/log before + passing after

```
✓ src/process/child.test.ts
  ✓ destroys all three stdio streams on dispose
  ✓ dispose is safe when streams are already destroyed
  ✓ dispose is safe after child process has exited
✓ src/process/supervisor.test.ts
  ✓ cancelAll terminates all active runs
  ✓ cancelAll on empty supervisor is a no-op
  ✓ cancelAll is resilient to partial cancellation failures
  ✓ full shutdown verifies zero orphan PIDs remain after cancelAll
```

```
pnpm build   # ✅ zero errors
pnpm check   # ✅ zero warnings
pnpm vitest run --config vitest.unit.config.ts src/process/   # ✅ all pass
```

## Compatibility

- Backward compatible: `Yes`. `cancelAll` is additive to the interface; existing `cancel()` behavior unchanged
- Config changes: `None`
- Migration: `None`

## Failure Recovery

- Revert: `git revert <sha>` (single commit)
- Symptoms to watch: child processes not being cleaned up (would indicate revert needed), or shutdown hanging (would indicate `cancelAll` blocking; mitigated by the fire-and-forget design)

## Risks

- Risk: `cancelAll` during shutdown could race with in-flight tool calls.
- Mitigation: Uses existing `cancel()` per-run (which already handles races). Individual cancellation errors are caught and logged, so one failure doesn't block the rest. Shutdown is fire-and-forget with `.unref()`'d escalation timers.

AI-assisted: Yes (Claude Code). Fully tested; all tests pass.
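For readers who want a concrete picture, the two fixes described under Solution might look roughly like the sketch below. It is not the PR's diff. `destroyStdio`, `SketchSupervisor`, `register`, and the `Destroyable`/`StdioHolder` types are hypothetical names introduced for illustration; only `dispose()`, `cancelAll(reason?)`, and the `!stream.destroyed` guard with try/catch come from the description. The return value of `cancelAll` (a failure count) is also an assumption.

```typescript
// Minimal structural types so the sketch runs without spawning a process.
// Node's Readable/Writable streams expose `destroyed` and `destroy()`,
// so a real ChildProcess should satisfy StdioHolder.
interface Destroyable {
  destroyed: boolean;
  destroy(): unknown;
}

interface StdioHolder {
  stdout: Destroyable | null;
  stderr: Destroyable | null;
  stdin: Destroyable | null;
}

// Hypothetical helper: the kind of stream cleanup dispose() could perform.
function destroyStdio(child: StdioHolder): void {
  for (const stream of [child.stdout, child.stderr, child.stdin]) {
    // Guard: stream may be absent or already destroyed.
    if (!stream || stream.destroyed) continue;
    try {
      stream.destroy();
    } catch {
      // Best-effort cleanup: one failing stream must not block the others.
    }
  }
}

type CancelFn = (reason?: string) => void | Promise<void>;

// Hypothetical stand-in for ProcessSupervisor (assumed shape).
class SketchSupervisor {
  private readonly runs = new Map<string, CancelFn>();

  register(runId: string, cancel: CancelFn): void {
    this.runs.set(runId, cancel);
  }

  // Cancels every active run and resolves to the number of failures.
  // Each cancel is attempted even if another one throws or rejects.
  async cancelAll(reason?: string): Promise<number> {
    const cancels = Array.from(this.runs.values());
    this.runs.clear();
    const outcomes = cancels.map(async (cancel) => {
      try {
        await cancel(reason);
        return true;
      } catch (err) {
        console.warn("cancel failed:", err);
        return false;
      }
    });
    const settled = await Promise.all(outcomes);
    return settled.filter((ok) => !ok).length;
  }
}
```

In a shutdown path, a fire-and-forget call would be `void supervisor.cancelAll("gateway shutdown");`, which starts all cancellations without awaiting them, in line with the stated goal of not delaying Node.js exit.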
