#15639: fix(memory): serialize local embedding initialization to avoid duplicate model loads
Labels: stale · size: S
Cluster: Memory Database Enhancements
## Summary
This PR fixes a race condition in local embeddings initialization by serializing `ensureContext()` in `createLocalEmbeddingProvider`.
- Add a cached `initPromise` in `src/memory/embeddings.ts` so concurrent callers share the same initialization path.
- Prevent duplicate `getLlama()` / `loadModel()` / `createEmbeddingContext()` calls under concurrent `embedBatch`.
- Add a regression test in `src/memory/embeddings.test.ts` (`local embedding ensureContext concurrency`) that verifies model/context initialization happens only once across 4 concurrent calls.
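The duplicate-load race and the shared-promise fix can be illustrated with a toy sketch (names like `naiveEnsure` and `cachedEnsure` are illustrative stand-ins, not the file's real code):

```typescript
// Toy illustration: naive per-caller init vs. a shared cached promise.
let naiveLoads = 0;
let cachedLoads = 0;

const delay = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Without memoization, every concurrent caller pays a full "model load".
async function naiveEnsure(): Promise<void> {
  naiveLoads++;
  await delay(10); // simulate slow model/context initialization
}

// With a cached initPromise, only the first caller triggers the load;
// later concurrent callers await the same in-flight promise.
let initPromise: Promise<void> | undefined;
async function load(): Promise<void> {
  cachedLoads++;
  await delay(10);
}
function cachedEnsure(): Promise<void> {
  initPromise ??= load();
  return initPromise;
}

const run = Promise.all([
  naiveEnsure(), naiveEnsure(), naiveEnsure(), naiveEnsure(),
  cachedEnsure(), cachedEnsure(), cachedEnsure(), cachedEnsure(),
]).then(() => console.log(`naive=${naiveLoads} cached=${cachedLoads}`)); // naive=4 cached=1
```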
## Why
When indexing memory files concurrently, multiple calls can enter `ensureContext()` before the first initialization completes, causing repeated local model loads and instability (VRAM pressure / hangs depending on platform and model size).
Related: #7547
## Scope / Non-goals
- No behavior change to embedding normalization.
- No manager-level lock changes in `src/memory/manager.ts`.
- No serialization change to `embedBatch` (keeps current parallel behavior).
## Testing
- Added unit regression test:
  - `src/memory/embeddings.test.ts`
  - `local embedding ensureContext concurrency`
  - Verifies `getLlama`, `loadModel`, and `createEmbeddingContext` are each called exactly once under concurrent `embedBatch`.
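The shape of that regression test can be sketched roughly as follows (plain TypeScript with hand-rolled counting stubs rather than the repo's actual mocking setup; `embedBatch` and the three stubs are stand-ins):

```typescript
// Rough sketch of the regression test's idea: count how often each
// init step runs when embedBatch is invoked concurrently.
const calls = { getLlama: 0, loadModel: 0, createEmbeddingContext: 0 };

// Counting stand-ins for the mocked node-llama-cpp entry points.
async function getLlama() { calls.getLlama++; return {}; }
async function loadModel() { calls.loadModel++; return {}; }
async function createEmbeddingContext() { calls.createEmbeddingContext++; return {}; }

let initPromise: Promise<void> | undefined;

async function initOnce(): Promise<void> {
  await getLlama();
  await loadModel();
  await createEmbeddingContext();
}

async function embedBatch(texts: string[]): Promise<number[][]> {
  initPromise ??= initOnce(); // all concurrent callers share one init
  await initPromise;
  return texts.map(() => [0, 0, 0]); // dummy embeddings
}

const run = Promise.all([
  embedBatch(["a"]),
  embedBatch(["b"]),
  embedBatch(["c"]),
  embedBatch(["d"]),
]).then(() =>
  console.log(calls.getLlama, calls.loadModel, calls.createEmbeddingContext), // 1 1 1
);
```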
## AI Assistance
AI-assisted PR (drafting + analysis). I reviewed and understood the code and test behavior before submitting.
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR fixes a concurrency race in the local embeddings provider by memoizing local model/context initialization (`ensureContext`) so concurrent `embedBatch`/`embedQuery` calls share a single `getLlama()` → `loadModel()` → `createEmbeddingContext()` path. It also adds a regression test that mocks `node-llama-cpp` and verifies initialization happens once across multiple concurrent `embedBatch` calls.
Main concern: the new `initPromise` memoization needs a retry path on failure; as written, a single initialization error can permanently poison the provider instance with a rejected promise.
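One common way to address this concern (a sketch of the general pattern, not this PR's code; names are illustrative) is to clear the cached promise when initialization rejects, so a later caller can retry:

```typescript
// Sketch: clear the cached promise on rejection so a transient init
// failure does not permanently poison the provider.
let attempts = 0;
let initPromise: Promise<string> | undefined;

async function init(): Promise<string> {
  attempts++;
  if (attempts === 1) throw new Error("transient failure"); // first try fails
  return "context-ready";
}

function ensureContext(): Promise<string> {
  if (!initPromise) {
    initPromise = init().catch((err) => {
      initPromise = undefined; // drop the rejected promise to allow retry
      throw err;
    });
  }
  return initPromise;
}

const demo = ensureContext() // first call fails with the transient error
  .catch(() => ensureContext()); // second call retries and succeeds

demo.then((ctx) => console.log(`attempts=${attempts} ctx=${ctx}`)); // attempts=2 ctx=context-ready
```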
<h3>Confidence Score: 4/5</h3>
- Mostly safe, but has one retry/robustness bug in local embeddings init memoization.
- The concurrency fix is straightforward and scoped, and the regression test covers the intended race. However, caching `initPromise` without clearing it on rejection can leave the provider stuck in a permanent failure state after a transient init error, which should be addressed before merging.
- src/memory/embeddings.ts
<sub>Last reviewed commit: 2bbe896</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #13251: fix: Windows local embeddings race condition · by fayrose · 2026-02-10 (85.4%)
- #21845: fix: use sequential embedding for local GGUF provider to prevent de... · by slegarraga · 2026-02-20 (79.0%)
- #10550: feat(memory-lancedb): local embeddings via node-llama-cpp · by namick · 2026-02-06 (77.7%)
- #17566: memory-lancedb: support local OpenAI-compatible embeddings · by lumenradley · 2026-02-15 (74.8%)
- #20149: fix(memory): expose index concurrency as config option · by togotago · 2026-02-18 (74.7%)
- #20771: feat(memory-lancedb): support custom OpenAI-compatible embedding pr... · by marcodelpin · 2026-02-19 (74.4%)
- #20882: fix(memory): add gpu config option for local embeddings and surface... · by irchelper · 2026-02-19 (74.3%)
- #19006: feat(memory-lancedb): OpenAI-compatible baseUrl + Ollama provider +... · by martinsen-assistant · 2026-02-17 (73.9%)
- #12195: fix(agents): sync config fallback for lookupContextTokens cold-star... · by mcaxtr · 2026-02-09 (73.6%)
- #11179: fix(memory): replace confusing "No API key" errors in memory tools ... · by liuxiaopai-ai · 2026-02-07 (73.5%)