#18219: fix: validate base64 image data before sending to LLM APIs
agents
size: S
## Summary
Adds strict base64 validation in `sanitizeContentBlocksImages()` to prevent invalid base64 data from crashing sessions when sent to LLM APIs.
## Problem
When an image content block contains invalid base64 data, the Anthropic API rejects the request:
```
LLM request rejected: messages.116.content.1.image.source.base64: invalid base64 data
```
The session becomes **permanently broken** because the corrupted content is persisted in the session JSONL and replayed on every subsequent API call. Node.js `Buffer.from(s, 'base64')` silently ignores invalid characters, so the existing sanitization pipeline doesn't catch the issue before it hits the API.
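The leniency described above can be reproduced in a few lines. This is a minimal sketch that assumes a Node.js runtime. The `roundTripsCleanly` helper is hypothetical and is not the check this PR adds: it is one heuristic for detecting input that the lenient decoder silently altered. It also rejects unpadded or whitespace-wrapped input, which may be stricter than wanted.

```typescript
import { Buffer } from "node:buffer";

// Node's base64 decoder skips characters outside the base64 alphabet
// instead of throwing, so corrupted input still "decodes" to some bytes.
const corrupted = "iVBORw0KGgo!!!AAAA";
const decoded = Buffer.from(corrupted, "base64");

// Hypothetical heuristic: re-encode the decoded bytes and compare them
// with the input. Any skipped character makes the round trip differ.
function roundTripsCleanly(s: string): boolean {
  return Buffer.from(s, "base64").toString("base64") === s;
}

console.log(decoded.length > 0);            // true: no error was thrown
console.log(roundTripsCleanly(corrupted));  // false
console.log(roundTripsCleanly("aGVsbG8=")); // true ("hello")
```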
## Changes
**`src/agents/tool-images.ts`:**
- Add `isStrictBase64()` — RFC 4648 §4 compliant validator (correct charset + padding)
- Add `stripDataUrlPrefix()` — strips `data:image/...;base64,` prefixes that some code paths may leave in the data field
- Validate base64 strictly in `sanitizeContentBlocksImages()`, then log invalid blocks and replace them with a text placeholder instead of passing them to the API
- Use detected MIME type from data URL prefix when available
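The validation flow listed above can be sketched as follows. This is a minimal sketch, not the code in `src/agents/tool-images.ts`. The block shapes, the `sanitizeImageBlock` helper, and the placeholder text are all hypothetical. The `isStrictBase64` check uses a length-multiple-of-4 test plus a charset regex, which is one equivalent way to express RFC 4648 §4 (alphabet `A-Z a-z 0-9 + /`, at most two trailing `=`). Trimming outer whitespace is also an assumption.

```typescript
// Hypothetical content-block shapes; the real types may differ.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextBlock | ImageBlock;

// RFC 4648 §4 alphabet, with up to two "=" padding characters only at the end.
// Combined with the length check, this accepts exactly the padded forms.
const BASE64_CHARSET = /^[A-Za-z0-9+\/]*={0,2}$/;

function isStrictBase64(s: string): boolean {
  // Empty input is technically valid base64 but useless as image data.
  return s.length > 0 && s.length % 4 === 0 && BASE64_CHARSET.test(s);
}

// Matches prefixes such as "data:image/png;base64," and captures the MIME type.
const DATA_URL_PREFIX = /^data:(image\/[A-Za-z0-9.+-]+);base64,/;

function stripDataUrlPrefix(s: string): { data: string; mimeType?: string } {
  const match = DATA_URL_PREFIX.exec(s);
  return match ? { data: s.slice(match[0].length), mimeType: match[1] } : { data: s };
}

// Hypothetical per-block helper: returns a cleaned image block, or a text
// placeholder that is safe to persist and replay in later requests.
function sanitizeImageBlock(block: ContentBlock): ContentBlock {
  if (block.type !== "image") return block;
  // Assumption: leading/trailing whitespace is trimmed before validation,
  // so whitespace-only data becomes "" and is rejected below.
  const { data, mimeType } = stripDataUrlPrefix(block.data.trim());
  if (!isStrictBase64(data)) {
    console.warn(`omitting image block with invalid base64 data (${data.length} chars)`);
    return { type: "text", text: "[image omitted: invalid base64 data]" };
  }
  // Prefer the MIME type from the data URL prefix when one was present.
  return { type: "image", data, mimeType: mimeType ?? block.mimeType };
}
```

Replacing the block with text, rather than dropping it, keeps the message's content array non-empty, so a persisted session can still be replayed without a 400 error.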
**`src/agents/tool-images.e2e.test.ts`:**
- Test: invalid base64 data is rejected gracefully (replaced with text block)
- Test: data URL prefixes are stripped and image is processed normally
- Test: empty/whitespace-only data is handled
## Why this matters
This is defense-in-depth. The upstream `@mariozechner/pi-ai` Anthropic provider has a catch-all `else` branch in `convertContentBlocks()` and `convertMessages()` that treats any non-text block as an image without validating its data field. Once bad data slips through, the session is stuck in a permanent 400 error loop.
Fixes #18212
Related: #11475 (session stuck in permanent 400 error loop)
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Added strict RFC 4648 base64 validation to prevent invalid image data from crashing sessions when sent to LLM APIs.
- Implemented `isStrictBase64()` validator to catch malformed base64 before API submission
- Added `stripDataUrlPrefix()` to handle data URL prefixes that may be present in image blocks
- Invalid base64 blocks are now gracefully replaced with text placeholders instead of causing permanent session failures
- Comprehensive test coverage for invalid base64, data URL stripping, and empty data edge cases
<h3>Confidence Score: 4/5</h3>
- Safe to merge with one minor consideration about whitespace handling
- The implementation correctly solves the stated problem of preventing invalid base64 from reaching the API. The validation logic is sound, test coverage is comprehensive, and the defensive approach (replacing bad data with text placeholders) prevents session corruption. One style suggestion about RFC 4648 whitespace handling prevents the score from being a 5, but this is a minor enhancement rather than a blocking issue.
- No files require special attention - the implementation is straightforward and well-tested
<sub>Last reviewed commit: 298c909</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #9598: fix(agents): check base64 string length against 5MB API limit (by BlockBB · 2026-02-05 · 82.1%)
- #8172: fix(sessions_list): strip base64 image data to prevent context over... (by Flamrru · 2026-02-03 · 76.1%)
- #23639: fix(agents): stop re-resizing session history images on every turn ... (by yinghaosang · 2026-02-22 · 75.8%)
- #2958: fix(media): wire tools.media.image.maxBytes config to image processin… (by shamsulalam1114 · 2026-01-27 · 75.5%)
- #5817: fix: strip old images during compaction to prevent 413 session bloat (by jduartedj · 2026-02-01 · 75.2%)
- #20913: fix: intercept Discord embed images to enforce mediaMaxMb (by MumuTW · 2026-02-19 · 74.8%)
- #23662: fix: cache sanitized images to avoid redundant re-processing per turn (by davidemanuelDEV · 2026-02-22 · 74.7%)
- #23706: perf: cache image resize results to avoid redundant processing (#23... (by echoVic · 2026-02-22 · 74.5%)
- #14328: fix: strip incomplete tool_use blocks from errored/aborted messages... (by Kropiunig · 2026-02-12 · 74.5%)
- #8076: fix(web): handle data URLs in loadWebMedia to prevent ENAMETOOLONG (by batumilove · 2026-02-03 · 74.4%)