#13251: fix: Windows local embeddings race condition
stale
size: S
Cluster: Memory Database Enhancements
## Problem
Local embeddings time out or hang on Windows when using node-llama-cpp: the process hangs indefinitely during model initialization and eventually times out after 300s.
## Root Cause
Race conditions in model initialization:
1. Concurrent calls to `embedBatch` triggering parallel llama model loads
2. No locking around the lazy initialization of the embeddings provider
3. No locking around manager creation
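The race in points 1-2 can be illustrated with a minimal, hypothetical sketch (names like `loadModel` and `getModelRacy` are illustrative, not the actual code): with an unlocked lazy init, two concurrent callers both observe the model as uninitialized and both start a load.

```typescript
// Hypothetical sketch of the race condition, not the actual provider code.
type Model = { ready: true };

let loadCount = 0;
let model: Model | null = null;

// Stand-in for the slow node-llama-cpp model initialization.
async function loadModel(): Promise<Model> {
  loadCount++; // counts how many real loads were started
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulate slow load
  return { ready: true };
}

async function getModelRacy(): Promise<Model> {
  if (!model) {
    // Both concurrent callers pass this null check before either one
    // assigns `model`, so the model loads twice; on Windows the
    // duplicate node-llama-cpp init could hang rather than just waste work.
    model = await loadModel();
  }
  return model;
}
```

Two concurrent callers both pass the `!model` check before either assignment happens, so `loadModel` runs twice (`loadCount` ends up at 2).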
## Fix
1. **Sequential `embedBatch`** - Process embedding requests sequentially rather than in parallel
2. **Initialization lock in `embeddings.ts`** - Mutex to prevent concurrent model loading
3. **Initialization lock in `manager.ts`** - Mutex to prevent concurrent manager creation
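The three fixes follow a common pattern that can be sketched as below. This is a minimal, hypothetical illustration (the `loadModel`/`getModel` names and the mock model are assumptions, not the repo's code): cache the in-flight init promise so concurrent callers share one load, and walk the batch with a plain loop instead of `Promise.all`.

```typescript
// Hypothetical sketch of the init-promise lock and sequential batch,
// not the actual embeddings.ts implementation.
type Model = { embed: (text: string) => number[] };

let loadCount = 0;

// Stand-in for the real node-llama-cpp initialization.
async function loadModel(): Promise<Model> {
  loadCount++; // counts how many real loads happen
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulate slow load
  return { embed: (text) => [text.length] }; // mock embedding
}

// Cache the in-flight promise: concurrent callers await the same load.
let initPromise: Promise<Model> | null = null;

function getModel(): Promise<Model> {
  if (!initPromise) {
    initPromise = loadModel().catch((err) => {
      initPromise = null; // clear the cache so a failed init can be retried
      throw err;
    });
  }
  return initPromise;
}

// Sequential embedBatch: one embedding call at a time, no Promise.all.
async function embedBatch(texts: string[]): Promise<number[][]> {
  const model = await getModel();
  const out: number[][] = [];
  for (const text of texts) {
    out.push(model.embed(text));
  }
  return out;
}
```

With this shape, two overlapping `embedBatch` calls trigger exactly one model load; the same promise-caching idea applies per cache key for manager creation in `manager.ts`.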
## Testing
- Tested on Windows 11 (x64), Node v22.22.0
- Model: `embeddinggemma-300M-Q8_0.gguf` via node-llama-cpp
- Before: 100% timeout rate on `memory_search`
- After: memory search completes in 2-3 seconds
## Related
- Addresses Windows-specific issues with local embedding provider
- May also help Linux/Mac users experiencing similar race conditions
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR addresses a Windows hang in local embeddings by serializing initialization and embedding calls:
- `src/memory/embeddings.ts` adds a per-provider init promise/error cache around `getLlama()`/model loading/context creation and changes `embedBatch()` to run sequentially to avoid concurrent `getEmbeddingFor()` calls.
- `src/memory/manager.ts` adds a per-cache-key initialization lock to prevent concurrent `MemoryIndexManager` creation (and thus concurrent provider/model initialization).
Overall the approach fits the existing lazy-init/cached-manager design: providers are created on demand and cached per agent/workspace/settings, with the new locks preventing concurrent initialization races that are especially problematic on Windows/node-llama-cpp.
<h3>Confidence Score: 4/5</h3>
- This PR is generally safe to merge once a small type-level issue in the manager init lock map is fixed.
- The concurrency/locking changes are localized and consistent with the reported race-condition root cause. The main remaining issue is an incorrect `INDEX_INIT_LOCKS` promise type that can leak `null` into the `get()` fast-path return and/or force incorrect downstream typing.
- src/memory/manager.ts
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #15639: fix(memory): serialize local embedding initialization to avoid dupl... (by SubtleSpark · 2026-02-13 · 85.4%)
- #21845: fix: use sequential embedding for local GGUF provider to prevent de... (by slegarraga · 2026-02-20 · 78.9%)
- #10550: feat(memory-lancedb): local embeddings via node-llama-cpp (by namick · 2026-02-06 · 74.2%)
- #17566: memory-lancedb: support local OpenAI-compatible embeddings (by lumenradley · 2026-02-15 · 71.8%)
- #8675: fix: Gemini batch embeddings state path, enum values, and download URL (by seasalim · 2026-02-04 · 70.8%)
- #23419: fix(memory): avoid cross-agent qmd embed serialization (by frankekn · 2026-02-22 · 69.8%)
- #20149: fix(memory): expose index concurrency as config option (by togotago · 2026-02-18 · 69.3%)
- #8309: fix: add emb_ prefix to batch embedding custom_id for OpenAI compli... (by vishaltandale00 · 2026-02-03 · 69.2%)
- #20315: fix(memory): add gemini-embedding-001 to GEMINI_MAX_INPUT_TOKENS (by Clawborn · 2026-02-18 · 69.2%)
- #20771: feat(memory-lancedb): support custom OpenAI-compatible embedding pr... (by marcodelpin · 2026-02-19 · 69.1%)