#8309: fix: add emb_ prefix to batch embedding custom_id for OpenAI compliance
stale
Cluster:
Gemini API Enhancements
## Summary
Fixes #8289 - Adds "emb_" prefix to batch embedding custom_id values to ensure they comply with OpenAI's batch API pattern requirement.
## Problem
When using OpenAI batch embedding for memory search, the API returns HTTP 400 errors:
```
HTTP 400: Invalid 'input[861].name': string does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'.
```
## Root Cause
The `custom_id` field in batch requests was using raw SHA256 hex hashes (containing only `[0-9a-f]`). While technically valid according to the pattern, OpenAI's validation may have stricter requirements or the hex-only format could cause edge cases.
## Solution
Added "emb_" prefix to all `custom_id` values in both OpenAI and Gemini batch embedding requests:
- Before: `dccd4c0888ad7b2155c306e58ff13a4ed423875ef6665af9b497b468501b9b69`
- After: `emb_dccd4c0888ad7b2155c306e58ff13a4ed423875ef6665af9b497b468501b9b69`
This ensures:
1. ✅ Custom IDs always start with letters
2. ✅ Clear semantic meaning ("embedding")
3. ✅ Full compliance with `^[a-zA-Z0-9_-]+$` pattern
4. ✅ Maintains uniqueness (hash is still used)
## Changes
- Modified `src/memory/manager.ts` to add "emb_" prefix to custom_id generation
- Applied to both OpenAI and Gemini batch request builders
- Added comments explaining the purpose
## Impact
- Fixes batch embedding failures for users with file paths containing special characters
- No breaking changes (custom_id is internal to batch processing)
- Backward compatible (new batches will use new format, old batches already completed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR updates batch embedding request construction in `src/memory/manager.ts` to prefix batch `custom_id` values with `emb_` for both OpenAI and Gemini providers. The goal is to avoid OpenAI batch API validation errors by ensuring `custom_id` values are consistently alphanumeric/underscore/hyphen and start with a semantic prefix.
The change is localized to the batch request builders and does not affect how embeddings are stored or queried; it only alters the identifier used to correlate batch responses back to input chunks.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk; it’s a localized identifier-format change in batch request building.
- The change only affects how `custom_id` strings are generated for OpenAI/Gemini batch embedding requests and preserves uniqueness by retaining the underlying hash. Mapping and cache behavior remain consistent because the map key changes in lockstep with the request `custom_id`. No new external inputs are introduced.
- src/memory/manager.ts
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#8675: fix: Gemini batch embeddings state path, enum values, and download URL
by seasalim · 2026-02-04
80.9%
#5808: fix(memory): truncate oversized chunks before embedding
by douvy · 2026-02-01
79.1%
#21843: fix: add retry/backoff to Gemini embedding batch API calls
by slegarraga · 2026-02-20
75.9%
#15585: fix: add retry/backoff for Gemini embedding API calls
by WalterSumbon · 2026-02-13
74.7%
#16786: fix: support google-antigravity OAuth for Gemini embeddings
by outsourc-e · 2026-02-15
73.2%
#17701: fix(memory-lancedb): add gemini-embedding-001 and baseUrl support
by Phineas1500 · 2026-02-16
72.3%
#20315: fix(memory): add gemini-embedding-001 to GEMINI_MAX_INPUT_TOKENS
by Clawborn · 2026-02-18
72.3%
#7810: fix: add fetch timeouts to prevent memory indexing hangs (#4370)
by Kaizen-79 · 2026-02-03
71.9%
#15301: Feat/gemini overflow and tags
by divisonofficer · 2026-02-13
71.5%
#11179: fix(memory): replace confusing "No API key" errors in memory tools ...
by liuxiaopai-ai · 2026-02-07
71.4%