#21054: fix(cli): fix memory search hang — close undici pool + destroy QMD stdio on timeout

by BinHPdev open 2026-02-19 16:05

cli size: M

Cluster: Error Handling and Memory Management

## Problem

`openclaw memory search` (and any CLI command that calls a remote embedding provider) hangs indefinitely after completing: the process never exits without `Ctrl-C`. Two independent Node.js event-loop leaks keep the process alive.

---

## Root Cause Analysis

### Leak 1: undici HTTP connection pool

Remote embedding calls (OpenAI, Gemini, Voyage, etc.) go through undici's global dispatcher, which maintains keep-alive TLS connections. After a response is received, those sockets stay open waiting for connection reuse. Nothing in the CLI shutdown path closed them, so the event loop could never drain.

### Leak 2: QMD child stdio pipes (the subtle one)

When a `qmd` subprocess hangs and is killed with `SIGKILL` on timeout, the naive fix is to call `child.unref()`. **This is insufficient.** Here is why:

`unref()` only removes the `ChildProcess` object from the event loop's ref count. But a spawned child has three *independent* event-loop handles: the `stdout`, `stderr`, and `stdin` Socket objects (the parent-side ends of the stdio pipes). `child.unref()` does not unref these.

The hang scenario:

```
Node.js (parent)           qmd (child, SIGKILL'd)      grandchild (still alive)
stdout pipe read-end  ←──  stdout pipe write-end  ←──  inherited write-end (open)
```

`qmd` is an ML tool that can spawn inference worker subprocesses. `SIGKILL` only kills the direct child, not its descendants. If a grandchild inherited the stdio file descriptors and holds the write-end open, Node.js never receives EOF on the read-end. The Socket stays ref'd, and the event loop hangs regardless of `unref()`.

`child.unref()` accidentally "worked" in the common case where `qmd` had no grandchildren or all descendants were killed together, but it failed silently in the grandchild scenario.

### Coverage gap: `tryRouteCli` early-return path

The initial fix wrapped only `program.parseAsync` in a `try/finally`.
However, `models status --probe` is handled inside `tryRouteCli` (which has an early `return`) and makes HTTP requests through undici. That path bypassed the dispatcher cleanup entirely.

---

## Fix

### Leak 1

`closeGlobalFetchDispatcher()` drains the undici connection pool. It runs in a single `try/finally` that covers **both** `tryRouteCli` and `program.parseAsync`, so no CLI exit path is missed. It loads undici through a dynamic `import()` wrapped in `try/catch`, so it is safe when undici is absent.

### Leak 2

Replace `child.unref()` with explicit pipe destruction in the timeout handler:

```ts
// Timeout handler: close the parent-side ends of all three stdio pipes
// (replaces the previous child.unref()).
child.stdout?.destroy();
child.stderr?.destroy();
child.stdin?.destroy();
```

`destroy()` closes the parent-side file descriptors immediately and unconditionally; it does not matter whether the child, its grandchildren, or any other process still holds the other end of a pipe open. Once the parent closes its end, those Socket handles are removed from the event loop.

---

## What Changed

| File | Change |
|---|---|
| `src/cli/run-main.ts` | Single `try/finally` covering both `tryRouteCli` and `parseAsync`; `closeGlobalFetchDispatcher()` exported |
| `src/memory/qmd-manager.ts` | `child.unref()` → `destroy()` on all three stdio streams in the timeout handler |
| `src/cli/run-main-dispatcher.test.ts` | New file: isolated undici mock + 2 tests for `closeGlobalFetchDispatcher` |
| `src/cli/run-main.test.ts` | Restored to pure argv-parsing tests; undici mock extracted to the new file above |
| `src/memory/qmd-manager.test.ts` | `MockStream` type with per-stream `destroyCalled` tracking replaces the previous `unref`-based mock infrastructure; 1 new test verifying all three streams are destroyed on timeout |

**Production code delta: 4 lines changed across 2 files.** No behavioral changes outside the event-loop cleanup paths.
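For reviewers who want to see the shape of both fixes outside the repo, here is a minimal standalone sketch. It is not the code from `src/cli/run-main.ts` or `src/memory/qmd-manager.ts`. The `runMain` wrapper, the `RouteFn`/`ParseFn` signatures, the injectable `cleanup` parameter, and the helpers `killAndReleaseStdio` and `runWithTimeout` are all hypothetical. The body of `closeGlobalFetchDispatcher` is likewise an assumption: it presumes the `undici` package's `getGlobalDispatcher()` returns the same dispatcher the CLI's `fetch` calls used.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";

// Leak 1 (sketch): close undici's global keep-alive pool if the undici package
// can be loaded; otherwise do nothing. ASSUMPTION: getGlobalDispatcher()
// returns the dispatcher that the CLI's fetch calls actually used.
export async function closeGlobalFetchDispatcher(): Promise<void> {
  try {
    const specifier: string = "undici"; // non-literal: type-checks without undici installed
    const undici = await import(specifier);
    await undici.getGlobalDispatcher().close();
  } catch {
    // undici absent or dispatcher unusable: nothing to drain.
  }
}

// Hypothetical signatures; the real tryRouteCli / program.parseAsync are not shown.
type RouteFn = (argv: string[]) => Promise<boolean>; // true = handled, return early
type ParseFn = (argv: string[]) => Promise<unknown>;

async function runMain(
  argv: string[],
  tryRouteCli: RouteFn,
  parseAsync: ParseFn,
  cleanup: () => Promise<void> = closeGlobalFetchDispatcher, // injectable for tests
): Promise<void> {
  try {
    // The early return sits inside the try, so the finally below still runs.
    if (await tryRouteCli(argv)) return;
    await parseAsync(argv);
  } finally {
    await cleanup();
  }
}

// Leak 2 (sketch): SIGKILL the hung child, then close the parent-side pipe ends.
// A surviving grandchild may still hold the write-ends, so EOF may never arrive;
// destroy() releases these handles regardless.
function killAndReleaseStdio(child: ChildProcess): void {
  child.kill("SIGKILL");
  child.stdout?.destroy();
  child.stderr?.destroy();
  child.stdin?.destroy();
}

// Hypothetical runner: resolves with stdout on exit code 0; rejects on a
// non-zero exit, a spawn error, or a timeout.
function runWithTimeout(cmd: string, args: string[], timeoutMs: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args); // default stdio: three pipes
    let out = "";
    child.stdout.on("data", (chunk: Buffer) => {
      out += chunk.toString();
    });
    const timer = setTimeout(() => {
      killAndReleaseStdio(child);
      reject(new Error(`timed out after ${timeoutMs} ms`));
    }, timeoutMs);
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    // "close" fires once the process has exited and its stdio streams are closed.
    child.on("close", (code) => {
      clearTimeout(timer);
      if (code === 0) resolve(out);
      else reject(new Error(`exited with code ${code}`));
    });
  });
}
```

The structural point of the sketch is that the `try` opens before the `tryRouteCli` early return, so every exit path reaches the `finally`. On the `qmd` side, destroying the streams closes the parent's own descriptors, which is why it holds up in the grandchild case where `child.unref()` did not.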
---

## Change Type

- [x] Bug fix

## Scope

- [x] Memory / storage
- [x] UI / DX

## Linked Issue

- Closes #21018

## User-visible Change

`openclaw memory search` (and any CLI command using a remote embedding provider) now exits cleanly after completing, without requiring `Ctrl-C`.

## Security Impact

- New permissions/capabilities? No
- Secrets/tokens handling changed? No
- New/changed network calls? No
- Command/tool execution surface changed? No
- Data access scope changed? No

## Verification

- `pnpm build`: clean (TypeScript + lint + format)
- 54 tests pass across all affected files
- All CI checks green
- Backward compatible, no config/env changes