
# #17910: feat(memory): QMD daemon mode — persistent process with idle lifecycle

by patrickshao · opened 2026-02-16 09:08
## Summary

- Problem: spawn-per-query QMD repeatedly pays cold-start cost and can miss interactive latency targets.
- Why it matters: users who opt into QMD want higher-quality retrieval without repeated startup penalties.
- What changed: this PR adds an **optional** warm QMD daemon mode using QMD's HTTP MCP transport (`qmd mcp --http --daemon`) with an OpenClaw-owned lifecycle (lazy start, health check, idle stop, shutdown stop).
- What changed: warm-path failures/timeouts automatically fall back to the existing spawn-per-query behavior for the same query.
- What changed: docs/config were updated for the npm install (`@tobilu/qmd@1.0.6`) and the daemon port config.
- What did NOT change: default behavior is still non-daemon unless explicitly enabled.

## Change Type (select all)

- [ ] Bug fix
- [x] Feature
- [ ] Refactor
- [x] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [x] Skills / tool execution
- [ ] Auth / tokens
- [x] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #
- Related #9048, #9581, #15579, #9605, #16047

## Related Approaches

| Approach | Strengths | Limitations | Why this PR exists |
|---|---|---|---|
| Spawn-per-query QMD baseline | Simple isolation per call | Repeated cold starts | Keep as fallback/default path |
| Minimal MCP stdio wrapper | Smaller integration surface | Less lifecycle control/hardening | Add explicit lifecycle + fallback semantics |
| Builtin backend | No QMD runtime dependency | Different retrieval profile than QMD | Keep available as separate backend |
| This PR (optional warm daemon over HTTP MCP) | Warm reuse, idle lifecycle, fallback safety, explicit port control | Native model/runtime may still time out or crash | Improve QMD reliability/latency while preserving a safe fallback |

Out of scope in this PR:

- Retrieval ranking policy changes (for example, source weighting)
- Agent-side query-budget policy
- Long-duration soak tuning beyond the current timeout + fallback controls

## User-visible / Behavior Changes

- New optional daemon mode under `memory.qmd.daemon.*` (still opt-in via `enabled`).
- New config key: `memory.qmd.daemon.port` (default `18790`).
- The warm daemon transport is HTTP MCP on loopback; the runtime accepts both loopback endpoints (`127.0.0.1` and `::1`).
- The existing spawn-per-query path remains intact and is used on warm-path failures/timeouts.
- Docs now standardize the QMD install as `npm i -g @tobilu/qmd@1.0.6`.

## Security Impact (required)

- New permissions/capabilities? (`No`)
- Secrets/tokens handling changed? (`No`)
- New/changed network calls? (`Yes`)
- Command/tool execution surface changed? (`Yes`)
- Data access scope changed? (`No`)
- If any `Yes`, explain risk + mitigation:
  - Network calls are loopback-only, to the local QMD daemon endpoint.
  - Daemon lifecycle commands are restricted to the configured local `qmd` binary and the existing memory scope.
  - On errors, the manager falls back to the existing spawn-per-query path instead of failing closed.

## Repro + Verification

### Environment

- OS: macOS (Apple Silicon)
- Runtime/container: Node 22.x, pnpm
- Model/provider: QMD local model (`@tobilu/qmd`)
- Integration/channel (if any): dev gateway profile
- Relevant config (redacted): `memory.backend=qmd`, `memory.qmd.daemon.enabled=true`

### Steps

1. Install/verify QMD: `qmd --version` (validated on `1.0.6`).
2. Start the dev gateway from this branch and trigger `memory_search` calls.
3. Observe daemon lazy start + query behavior; verify the fallback path remains functional on failures.
4. Verify `memory_get` compatibility with returned memory paths.

### Expected

- The daemon starts lazily on the first warm-path query and stays warm until idle timeout/shutdown.
- Warm failures do not break memory search; the query falls back to spawn-per-query.
- `memory_get` remains compatible with returned paths.

### Actual

- Verified locally with tests + dev gateway runs.
- Confirmed daemon startup log in a successful warm run, and confirmed preserved fallback behavior when daemon ownership conflicted.

## Evidence

Attach at least one:

- [x] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [x] Perf numbers (if relevant)

### A) Cold warm-path (initial lazy load)

Goal: show first-query daemon startup cost plus successful warm-path completion.

Command used:

```bash
node dist/entry.js --profile dev agent --to +15555550123 --message "What do you know about BillSplitPro from memory notes?" --json --timeout 120 > /tmp/qmd-postfix4.json && jq '{status,summary,duration_ms:.result.meta.durationMs}' /tmp/qmd-postfix4.json
```

Output:

```json
{
  "status": "ok",
  "summary": "completed",
  "duration_ms": 28200
}
```

Log snippet:

```text
2026-02-17T11:50:10.035Z embedded run tool start ... tool=memory_search
2026-02-17T11:50:10.567Z qmd daemon started (http://[::1]:18790/mcp)
2026-02-17T11:50:23.055Z embedded run tool end ... tool=memory_search
tool delta: ~13.0s
```

### B) Hot warm-path (daemon already loaded)

Goal: show query latency when the daemon is already warm and no startup is required.

Command used:

```bash
node dist/entry.js --profile dev agent --to +15555550123 --message "Use memory_search exactly once for 'BillSplitPro receipt OCR', then reply with one sentence summary and citation." --json --timeout 120 > /tmp/qmd-postfix7.json && jq '{status,summary,duration_ms:.result.meta.durationMs}' /tmp/qmd-postfix7.json
```

Output:

```json
{
  "status": "ok",
  "summary": "completed",
  "duration_ms": 7810
}
```

Log snippet:

```text
2026-02-17T12:19:26.219Z embedded run tool start ... tool=memory_search
2026-02-17T12:19:28.574Z embedded run tool end ... tool=memory_search
(no qmd daemon started line between start/end)
tool delta: ~2.36s
```

### C) Fallback path (intentional resilience case)

Goal: show the query still succeeds when warm-path daemon ownership is conflicted.
Injected condition: pre-existing daemon ownership conflict (stale process on the daemon port).

```text
2026-02-17T11:48:20.536Z embedded run tool start ... tool=memory_search
2026-02-17T11:48:20.787Z qmd daemon search failed, falling back to spawn-per-query: Already running (PID 13434). Run qmd mcp stop first.
2026-02-17T11:48:46.643Z embedded run tool end ... tool=memory_search
tool delta: ~26.1s
```

Interpretation:

- Cold warm-path includes one-time startup overhead.
- Hot warm-path is materially faster once the daemon is already loaded.
- Fallback is slower but preserves successful query completion under daemon failure/conflict.

Note: the full `duration_ms` includes model response generation + non-memory work. The tool-level timing above isolates memory-tool latency.

Timing decomposition (from log timestamps):

| Case | Full run `duration_ms` | Pre-memory (run start → memory_search start) | Memory tool time (memory_search start → end) | Post-memory (memory_search end → prompt end) |
|---|---:|---:|---:|---:|
| Cold warm-path (lazy start) | 28.2s | ~2.6s | ~13.0s | ~12.6s |
| Hot warm-path (already warm) | 7.8s | ~2.7s | ~2.36s | ~2.72s |
| Fallback path | 37.9s | ~3.2s | ~26.1s | ~8.6s |

Interpretation: this separates agent/model orchestration time from QMD memory-tool time; the warm-daemon benefit is visible in the memory-tool segment.

## Human Verification (required)

What you personally verified (not just CI), and how:

- Verified scenarios:
  - `pnpm check` on touched files
  - `pnpm vitest run src/memory/qmd-manager.test.ts`
  - `pnpm vitest run --config vitest.e2e.config.ts src/agents/tools/memory-tool.e2e.test.ts`
  - `pnpm build`
  - Dev gateway startup + memory query runs
- Edge cases checked:
  - QMD 1.0.6 HTTP MCP header/session compatibility
  - Loopback endpoint compatibility (`127.0.0.1` vs `::1`)
  - Daemon conflict/fallback behavior
- What you did **not** verify:
  - Multi-hour soak run

## Compatibility / Migration

- Backward compatible? (`Yes`)
- Config/env changes? (`Yes`)
- Migration needed? (`No`)
- If yes, exact upgrade steps:
  - Optional: set `memory.qmd.daemon.enabled=true`
  - Optional: set `memory.qmd.daemon.port` if `18790` conflicts
  - Ensure the local QMD version supports the HTTP daemon workflow (documented as `@tobilu/qmd@1.0.6`)

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly:
  - Set `memory.qmd.daemon.enabled=false` (returns to spawn-per-query QMD)
  - Or set `memory.backend=builtin`
- Files/config to restore:
  - `src/memory/qmd-daemon.ts`
  - `src/memory/qmd-manager.ts`
  - `src/memory/backend-config.ts`
  - `src/config/types.memory.ts`
  - `src/config/zod-schema.ts`
  - `src/config/schema.help.ts`
  - `src/config/schema.labels.ts`
- Known bad symptoms reviewers should watch for:
  - Repeated daemon-start conflict warnings
  - Persistent warm timeouts (these should still fall back and return results)

## Risks and Mitigations

- Risk: HTTP daemon contract differences across QMD versions.
  - Mitigation: docs pin to npm `@tobilu/qmd@1.0.6`; the fallback path remains active.
- Risk: a stale external daemon process can conflict with the owned daemon start.
  - Mitigation: explicit stop on idle/shutdown, configurable port, preserved fallback.
- Risk: local model/runtime instability.
  - Mitigation: health checks + timeout-based fallback to spawn-per-query.

## Notes

- Implemented and validated with Patrick Shao using OpenClaw, Claude Code Opus 4.6, and Codex.
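For reviewers, the warm-path-with-fallback semantics described in the Summary (attempt the warm daemon, bound the attempt with a timeout, fall back to spawn-per-query for the same query) can be sketched roughly as below. All names here (`qmdSearch`, `searchViaDaemon`, `searchViaSpawn`, `withTimeout`, `QmdSearchResult`) are hypothetical illustrations, not the actual `qmd-manager.ts` API.

```typescript
// Hypothetical sketch of the warm-path fallback flow; not the real OpenClaw code.
type QmdSearchResult = { path: string; snippet: string; score: number };

// Reject a promise if it does not settle within `ms` milliseconds.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`qmd warm path timed out after ${ms}ms`)),
      ms,
    );
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer!);
  }
}

async function qmdSearch(
  query: string,
  opts: {
    daemonEnabled: boolean;
    warmTimeoutMs: number;
    searchViaDaemon: (q: string) => Promise<QmdSearchResult[]>; // HTTP MCP call to warm daemon
    searchViaSpawn: (q: string) => Promise<QmdSearchResult[]>; // existing spawn-per-query path
  },
): Promise<QmdSearchResult[]> {
  if (opts.daemonEnabled) {
    try {
      // Warm path: lazily started loopback daemon, bounded by a timeout.
      return await withTimeout(opts.searchViaDaemon(query), opts.warmTimeoutMs);
    } catch (err) {
      // Any warm-path failure/timeout falls back to spawn-per-query for the SAME query.
      console.warn(
        `qmd daemon search failed, falling back to spawn-per-query: ${(err as Error).message}`,
      );
    }
  }
  return opts.searchViaSpawn(query);
}
```

The key design point this sketch captures is that the warm path never fails closed: a daemon error or timeout degrades to the pre-existing behavior rather than surfacing a failed `memory_search` to the agent.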
