
#4231: fix(memory): use sqlite-vec knn (MATCH+k) for vector search

by leonardsellem · open · 2026-01-29 22:37
## Summary

This speeds up `memory_search` vector retrieval when sqlite-vec is enabled by switching from a brute-force `ORDER BY vec_distance_cosine(...)` scan over the full vec0 table to sqlite-vec KNN preselection (`MATCH` + `k`).

## Root cause

The sqlite-vec path was still effectively doing an O(N) scan plus a global sort:

- compute `vec_distance_cosine(v.embedding, ?)` for every row
- `ORDER BY dist ASC LIMIT ?`

On larger stores this can take tens of seconds per query, even on decent CPU/RAM.

## Fix

- Use sqlite-vec KNN (`embedding MATCH ? AND k = ?`) to select a small candidate set.
- Compute cosine distance only for the candidates, then sort/limit.

JS fallback behavior is unchanged when sqlite-vec is unavailable.

## Repro environment (my setup)

Configuration (redacted, but precise):

- `memorySearch.enabled = true`
- `memorySearch.provider = openai`
- `memorySearch.model = text-embedding-3-small` (1536 dims)
- `memorySearch.sources = ["memory", "sessions"]` (session memory enabled)
- `memorySearch.extraPaths` indexes an additional large Markdown vault (path omitted)
- Hybrid search enabled (BM25 + vector): `vectorWeight=0.7`, `textWeight=0.3`, `maxResults=15`, `minScore=0.4` (candidate multiplier left at default)
- Embedding cache enabled (`maxEntries=100000`)
- sqlite-vec enabled and loads successfully

Scale:

- Indexed Markdown files: 5,198 total
  - `memory`: 5,057 files / 165,798 chunks
  - `sessions`: 141 files / 934 chunks
- Chunks: 166,732
- SQLite store size: ~8.0 GB

Hardware:

- CPU: AMD EPYC 4244P (6 cores / 12 threads)
- RAM: 30 GiB
- Storage: ~892 GB usable NVMe (RAID1)
- GPU: none for compute (the server has only a basic display controller)

Software:

- OS: Linux x86_64
- Node: v22.21.1
- sqlite-vec: 0.1.7-alpha.2

## Performance notes

- Direct SQL benchmark on the above store (top-20): ~75 s (brute-force) → ~0.35 s (KNN)
- End-to-end CLI (includes embedding API latency): `moltbot memory search --max-results 5 --json "moltbot"` went from >60 s to ~3.5 s

## Testing

- `pnpm lint`
- `pnpm test`
- `pnpm build`
- Manual spot-check against a real ~166k-chunk store

## AI-assisted disclosure

This PR was AI-assisted (OpenAI Codex CLI).

- Testing level: fully tested (lint + full unit tests + build), plus a manual runtime check.
- I reviewed and understand the change, and I'm happy to iterate if you'd prefer a different query shape.

## Greptile Overview

### Greptile Summary

This PR speeds up sqlite-vec-backed `memory_search` by switching from a full-table brute-force `ORDER BY vec_distance_cosine(...)` scan to a two-phase approach: use sqlite-vec KNN (`embedding MATCH ? AND k = ?`) to preselect candidates, then compute cosine distance and sort/limit those results. It also adds a focused unit test to assert the KNN query shape is used when the vector extension is available.

### Confidence Score: 3/5

- Generally safe to merge, but the new KNN query shape may reduce recall when source filters are active.
- The change is localized and test-covered for the SQL shape, but KNN candidate selection currently happens before applying source filters, which can exclude relevant results for the requested sources depending on data distribution.
- File touched: `src/memory/manager-search.ts`
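For reviewers, the before/after query shapes described above can be sketched roughly as follows. This is an illustrative sketch, not the actual schema: the `vec_chunks` table name and the `:query`/`:k`/`:limit` parameter names are placeholders.

```sql
-- Before (illustrative): brute-force scan — the distance function runs
-- for every row, then the whole table is sorted.
SELECT rowid, vec_distance_cosine(embedding, :query) AS dist
FROM vec_chunks
ORDER BY dist ASC
LIMIT :limit;

-- After (illustrative): sqlite-vec KNN — the vec0 index preselects the
-- k nearest rows, so only a small candidate set is materialized.
SELECT rowid, distance
FROM vec_chunks
WHERE embedding MATCH :query AND k = :k
ORDER BY distance;
```

The `MATCH` + `k` form is the KNN query pattern documented by sqlite-vec for vec0 virtual tables; the second query avoids the O(N) distance computation entirely.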

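The second phase of the fix (exact cosine rerank over the preselected candidates) can be shown in isolation. This is a hedged, self-contained sketch of the idea, not the actual `manager-search.ts` code; names like `rerank` and the toy 2-d embeddings are illustrative only.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def rerank(query, candidates, limit):
    """candidates: (chunk_id, embedding) pairs preselected by the KNN index.
    Compute exact cosine distance only for these rows, then sort and truncate —
    cost is O(k) instead of O(N) over the whole store."""
    scored = [(chunk_id, cosine_distance(query, emb)) for chunk_id, emb in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored[:limit]

# Toy 2-d vectors standing in for 1536-dim text-embedding-3-small embeddings.
query = [1.0, 0.0]
candidates = [("a", [0.9, 0.1]), ("b", [0.0, 1.0]), ("c", [1.0, 0.05])]
print(rerank(query, candidates, 2))  # "c" is closest to the query, then "a"
```

The same shape applies regardless of dimensionality; only the candidate set size (`k`) and the final `limit` matter for cost.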