#21816: Add configurable `dimensions` for embedding models (Matryoshka support)
## Summary
- Problem: No way to control embedding vector dimensions; all providers use their native defaults (e.g. 3072 for `text-embedding-3-large`), leading to larger-than-necessary vector tables
- Why it matters: Users on constrained hardware (Pi, cheap VPS) or running many agents pay a storage/search cost for dimensions they don't need. Matryoshka-trained models retain quality at lower dims
- Why it matters: Users can also run higher-quality models more efficiently; e.g. `text-embedding-3-large` at 1024 dims still outperforms `text-embedding-3-small` at its native 1536, with smaller storage and faster search (though at ~6.5x token cost), letting users make their own size/cost/performance tradeoffs
- What changed: Added optional `dimensions` config field to `memorySearch`, passed through to OpenAI/Voyage/Gemini APIs with provider-specific parameter names. Added dimension mismatch detection to trigger full reindex. Stores `configuredDims` in index meta so removing the setting also reindexes back to native
- What did NOT change: Behaviour when `dimensions` is omitted is identical to before. No changes to chunking, FTS, or query logic
First-time contributor 👋. I've been running OpenClaw daily as a power user (9 agents, 200+ memory files) and wanted to contribute back.
## Change Type (select all)
- [ ] Bug fix
- [x] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [x] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
- Related: Discussion with @vignesh07 on Discord (memory subsystem)
## User-visible / Behavior Changes
- New optional config field: `memorySearch.dimensions` (positive integer)
- When set, embedding API calls include the dimension parameter
- Changing or removing `dimensions` triggers a full memory reindex on next sync
- Config example:
```json
{
"memorySearch": {
"provider": "openai",
"model": "text-embedding-3-large",
"dimensions": 1024
}
}
```
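With this config, the value is forwarded as the `dimensions` parameter in OpenAI embeddings requests (natively supported by the `text-embedding-3-*` models). A minimal sketch of how the request body might be assembled; the function and interface names here are illustrative, not the PR's actual code:

```typescript
// Illustrative sketch: build an OpenAI /v1/embeddings request body.
// `dimensions` is only included when configured, so omitting it keeps
// the provider's native dimensionality (3072 for text-embedding-3-large).
interface EmbedOptions {
  model: string;
  dimensions?: number; // Matryoshka truncation, supported by text-embedding-3-*
}

function buildEmbeddingRequest(input: string[], opts: EmbedOptions) {
  return {
    model: opts.model,
    input,
    // Spread nothing when dimensions is unset, preserving pre-PR behaviour
    ...(opts.dimensions !== undefined ? { dimensions: opts.dimensions } : {}),
  };
}
```

Keeping the field entirely absent (rather than sending `undefined` or a default) is what makes the change backward compatible for OpenAI-compatible endpoints that don't know the parameter.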
## Security Impact (required)
- New permissions/capabilities? No
- Secrets/tokens handling changed? No
- New/changed network calls? No (same API endpoints, one additional optional parameter in request body)
- Command/tool execution surface changed? No
- Data access scope changed? No
## Repro + Verification
### Environment
- OS: macOS 14 (arm64)
- Runtime: Node 22.22.0
- Model/provider: OpenAI `text-embedding-3-large` and `text-embedding-3-small`
- Relevant config: `memorySearch.dimensions: 1024`, then `512`
### Steps
1. Set `"dimensions": 1024` in `memorySearch` config
2. Restart gateway
3. Trigger a memory search (or wait for heartbeat sync)
4. Check the vec table: `sqlite3 memory.sqlite ".schema chunks_vec"` → should show `FLOAT[1024]`
### Expected
- Vec table created at configured dimensions
- Full reindex triggered on dimension change
- Memory search returns results
### Actual
- All confirmed working across 9 agents, 200+ files
## Evidence
- Tested dimension changes: 1536 → 1024 → 512 (all clean, no errors)
- Tested model + dimension change together (`3-large@1024` → `3-small@512`)
- 216 files reindexed in ~30 seconds
- Query embeddings confirmed using configured dimensions
## Human Verification (required)
- Verified scenarios: Dimension change reindex, model change reindex, combined model+dimension change, search quality at reduced dims
- Edge cases checked: Removing `dimensions` from config (triggers reindex via `configuredDims` in meta), `dimensions` unset on fresh install (no spurious reindex), cache lookup filters by dims when set
- What I did **not** verify: Voyage and Gemini providers (tested config pass-through in code, not live API calls). Local (llama-cpp) provider (no-op by design)
## Compatibility / Migration
- Backward compatible? Yes; `dimensions` is optional, and omitting it preserves existing behaviour
- Config/env changes? One new optional field (`memorySearch.dimensions`)
- Migration needed? No; existing setups work unchanged. Setting `dimensions` triggers an automatic reindex
## Failure Recovery (if this breaks)
- How to disable/revert: Remove `dimensions` from config, restart gateway. Reindex triggers automatically back to native dimensions
- Known bad symptoms: `Dimension mismatch for inserted vector` errors during sync (would indicate the `ensureVectorReady` path isn't receiving dimensions; fixed in this PR)
## Risks and Mitigations
- Risk: User sets dimensions on a provider that doesn't support it (e.g. a custom OpenAI-compatible endpoint)
- Mitigation: Dimensions are passed as an optional API parameter; most endpoints ignore unknown fields. If they error, the provider's error message will surface in logs
- Risk: Hot-reload updates config but doesn't re-instantiate embedding provider
- Mitigation: This is existing behaviour for all `memorySearch` fields; a gateway restart is required. Not introduced by this PR
## What changed (detail)
**Config & types** (4 files)
- `zod-schema.agent-runtime.ts`: added optional `dimensions` to `MemorySearchSchema`
- `types.tools.ts`: added `dimensions` to `EmbeddingProviderOptions`
- `schema.labels.ts`: added config label
- `agents/memory-search.ts`: passes dimensions through to embedding calls
**Embedding providers** (3 files): each maps `dimensions` to the provider-specific API parameter:
- OpenAI → `dimensions`
- Voyage → `output_dimension`
- Gemini → `outputDimensionality`
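The per-provider mapping above can be sketched as a small helper (names here are hypothetical; the PR implements this inside each provider module, and the local llama-cpp provider ignores the setting by design):

```typescript
// Illustrative: translate the config's `dimensions` value into each
// provider's request-body parameter name. Returns {} when dimensions
// are unset or the provider doesn't support output truncation.
type Provider = "openai" | "voyage" | "gemini" | "local";

function dimensionParam(provider: Provider, dims?: number): Record<string, number> {
  if (dims === undefined) return {};
  switch (provider) {
    case "openai": return { dimensions: dims };            // OpenAI embeddings API
    case "voyage": return { output_dimension: dims };      // Voyage embeddings API
    case "gemini": return { outputDimensionality: dims };  // Gemini embedContent
    case "local":  return {};                              // llama-cpp: no-op by design
  }
}
```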
**Manager & sync** (3 files + 1 type)
- `embeddings.ts`: added `dimensions` to `EmbeddingProviderOptions` type
- `manager.ts`: passes configured dimensions to provider init
- `manager-embedding-ops.ts`: passes dimensions to `ensureVectorReady()` so vec tables are created at the right size; cache lookup filters by dims when configured
- `manager-sync-ops.ts`: passes dimensions to `ensureVectorReady()` in the sync path and adds dimension mismatch detection to `needsFullReindex`. Stores `configuredDims` in index meta (distinct from the existing `vectorDims`, which tracks actual vector size); `configuredDims` records the user's config value, so removing `dimensions` from config is detected and triggers a reindex back to native dimensions
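The reindex trigger can be understood as a comparison between the stored `configuredDims` and the current config value. A simplified sketch under assumed names, not the actual `needsFullReindex` implementation:

```typescript
// Illustrative: decide whether a full reindex is needed. `configuredDims`
// in the index meta records the user's config value at last index time;
// `undefined` means the provider's native dimensionality was in use.
interface IndexMeta {
  model: string;
  configuredDims?: number; // user's config value at last index
  vectorDims: number;      // actual stored vector size
}

function needsFullReindex(meta: IndexMeta, model: string, configuredDims?: number): boolean {
  if (meta.model !== model) return true;         // model change always reindexes
  return meta.configuredDims !== configuredDims; // covers setting, changing, AND removing dims
}
```

Comparing against `configuredDims` rather than `vectorDims` is what makes removal detectable: after deleting `dimensions` from config, the new value (`undefined`) differs from the stored one, so the index rebuilds at native dimensionality.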
**11 files changed, ~53 lines added**
## AI Disclosure
AI-assisted (Claude via OpenClaw). Fully tested on a live multi-agent install. I understand what every line does; the implementation was collaborative, not blind generation.