← Back to PRs

#8309: fix: add emb_ prefix to batch embedding custom_id for OpenAI compliance

by vishaltandale00 open 2026-02-03 21:32 View on GitHub →
stale
## Summary Fixes #8289 - Adds "emb_" prefix to batch embedding custom_id values to ensure they comply with OpenAI's batch API pattern requirement. ## Problem When using OpenAI batch embedding for memory search, the API returns HTTP 400 errors: ``` HTTP 400: Invalid 'input[861].name': string does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'. ``` ## Root Cause The `custom_id` field in batch requests was using raw SHA256 hex hashes (containing only `[0-9a-f]`). While technically valid according to the pattern, OpenAI's validation may have stricter requirements or the hex-only format could cause edge cases. ## Solution Added "emb_" prefix to all `custom_id` values in both OpenAI and Gemini batch embedding requests: - Before: `dccd4c0888ad7b2155c306e58ff13a4ed423875ef6665af9b497b468501b9b69` - After: `emb_dccd4c0888ad7b2155c306e58ff13a4ed423875ef6665af9b497b468501b9b69` This ensures: 1. ✅ Custom IDs always start with letters 2. ✅ Clear semantic meaning ("embedding") 3. ✅ Full compliance with `^[a-zA-Z0-9_-]+$` pattern 4. ✅ Maintains uniqueness (hash is still used) ## Changes - Modified `src/memory/manager.ts` to add "emb_" prefix to custom_id generation - Applied to both OpenAI and Gemini batch request builders - Added comments explaining the purpose ## Impact - Fixes batch embedding failures for users with file paths containing special characters - No breaking changes (custom_id is internal to batch processing) - Backward compatible (new batches will use new format, old batches already completed) 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- greptile_comment --> <h2>Greptile Overview</h2> <h3>Greptile Summary</h3> This PR updates batch embedding request construction in `src/memory/manager.ts` to prefix batch `custom_id` values with `emb_` for both OpenAI and Gemini providers. The goal is to avoid OpenAI batch API validation errors by ensuring `custom_id` values are consistently alphanumeric/underscore/hyphen and start with a semantic prefix. The change is localized to the batch request builders and does not affect how embeddings are stored or queried; it only alters the identifier used to correlate batch responses back to input chunks. <h3>Confidence Score: 5/5</h3> - This PR is safe to merge with minimal risk; it’s a localized identifier-format change in batch request building. - The change only affects how `custom_id` strings are generated for OpenAI/Gemini batch embedding requests and preserves uniqueness by retaining the underlying hash. Mapping and cache behavior remain consistent because the map key changes in lockstep with the request `custom_id`. No new external inputs are introduced. - src/memory/manager.ts <!-- greptile_other_comments_section --> <!-- /greptile_comment -->

Most Similar PRs