
#18219: fix: validate base64 image data before sending to LLM APIs

by Grynn · open · 2026-02-16 16:26
agents size: S
## Summary

Adds strict base64 validation in `sanitizeContentBlocksImages()` to prevent invalid base64 data from crashing sessions when sent to LLM APIs.

## Problem

When an image content block contains invalid base64 data, the Anthropic API rejects the request:

```
LLM request rejected: messages.116.content.1.image.source.base64: invalid base64 data
```

The session becomes **permanently broken** because the corrupted content is persisted in the session JSONL and replayed on every subsequent API call.

Node.js `Buffer.from(s, 'base64')` silently ignores invalid characters, so the existing sanitization pipeline doesn't catch the issue before it hits the API.

## Changes

**`src/agents/tool-images.ts`:**

- Add `isStrictBase64()`: RFC 4648 §4 compliant validator (correct charset + padding)
- Add `stripDataUrlPrefix()`: strips `data:image/...;base64,` prefixes that some code paths may leave in the data field
- Validate base64 strictly in `sanitizeContentBlocksImages()` and log+omit invalid blocks gracefully (replaced with a text placeholder) instead of passing them to the API
- Use detected MIME type from data URL prefix when available

**`src/agents/tool-images.e2e.test.ts`:**

- Test: invalid base64 data is rejected gracefully (replaced with text block)
- Test: data URL prefixes are stripped and image is processed normally
- Test: empty/whitespace-only data is handled

## Why this matters

This is defense-in-depth. The upstream `@mariozechner/pi-ai` Anthropic provider has a catch-all else clause in `convertContentBlocks()` and `convertMessages()` that treats any non-text block as an image without validating the data field. Once bad data slips through, the session is stuck in a permanent 400 error loop.
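For readers without the diff at hand, the validation steps listed under Changes could look roughly like the sketch below. It is not the code from this PR. The function names `isStrictBase64()` and `stripDataUrlPrefix()` come from the description, but their signatures, the `ContentBlock` shape, the placeholder text, and the hypothetical `sanitizeImageBlocks()` wrapper are all assumptions made for illustration. Trimming surrounding whitespace before validation is also an assumption, since the description does not say how whitespace is handled.

```typescript
// Illustrative sketch only: shapes and signatures are assumed, not taken from the PR.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextBlock | ImageBlock;

// RFC 4648 §4 alphabet, full 4-character groups, optional final padded group.
// This checks charset and padding only; it does not check that unused
// trailing bits in the last group are zero.
const STRICT_BASE64 =
  /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

function isStrictBase64(data: string): boolean {
  // Empty input is treated as invalid here (an assumption).
  return data.length > 0 && STRICT_BASE64.test(data);
}

// Matches a leading "data:image/<subtype>;base64," prefix and captures the MIME type.
const DATA_URL_PREFIX = /^data:(image\/[a-z0-9.+-]+);base64,/i;

function stripDataUrlPrefix(data: string): { data: string; mimeType?: string } {
  const match = DATA_URL_PREFIX.exec(data);
  if (!match) return { data };
  return {
    data: data.slice(match[0].length),
    mimeType: match[1].toLowerCase(),
  };
}

// Hypothetical stand-in for the validation step inside sanitizeContentBlocksImages().
function sanitizeImageBlocks(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map((block): ContentBlock => {
    if (block.type !== "image") return block;
    const { data, mimeType } = stripDataUrlPrefix(block.data.trim());
    if (!isStrictBase64(data)) {
      // Replace the bad block with a text placeholder instead of sending it on.
      return { type: "text", text: "[image omitted: invalid base64 data]" };
    }
    // Prefer the MIME type detected from the data URL prefix when present.
    return { type: "image", data, mimeType: mimeType ?? block.mimeType };
  });
}
```

Replacing the block rather than throwing keeps the rest of the message intact, which matches the stated goal of keeping a session usable after one corrupted image.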
Fixes #18212

Related: #11475 (session stuck in permanent 400 error loop)

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

Added strict RFC 4648 base64 validation to prevent invalid image data from crashing sessions when sent to LLM APIs.

- Implemented `isStrictBase64()` validator to catch malformed base64 before API submission
- Added `stripDataUrlPrefix()` to handle data URL prefixes that may be present in image blocks
- Invalid base64 blocks are now gracefully replaced with text placeholders instead of causing permanent session failures
- Comprehensive test coverage for invalid base64, data URL stripping, and empty data edge cases

<h3>Confidence Score: 4/5</h3>

- Safe to merge with one minor consideration about whitespace handling
- The implementation correctly solves the stated problem of preventing invalid base64 from reaching the API. The validation logic is sound, test coverage is comprehensive, and the defensive approach (replacing bad data with text placeholders) prevents session corruption. One style suggestion about RFC 4648 whitespace handling prevents the score from being a 5, but this is a minor enhancement rather than a blocking issue.
- No files require special attention: the implementation is straightforward and well-tested

<sub>Last reviewed commit: 298c909</sub>

<!-- /greptile_comment -->
