#10257: fix(security): anchor MIME sanitization regex and block fullwidth bypass (#9791, #9795)
stale
Cluster: Media Handling Improvements
## Summary
- Anchor the MIME type sanitization regex with `$` to reject trailing content after a valid type/subtype pair
- Add NFKC Unicode normalization before validation to prevent fullwidth character bypasses (e.g., `ａｕｄｉｏ/ｍｐｅｇ`, which NFKC normalizes to `audio/mpeg`)
- Apply the same normalization to `normalizeMimeType()` in `src/media/input-files.ts`
Fixes #9791, #9795
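
For reviewers, the two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from `src/media-understanding/apply.ts`: the function name `sanitizeMimeTypeSketch`, the `MIME_PATTERN` constant, and the exact character class are assumptions made for the example.

```typescript
// Hypothetical pattern: one type/subtype pair, anchored at both ends.
// The trailing `$` is what rejects anything after the subtype.
const MIME_PATTERN = /^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)$/;

// Illustrative stand-in for sanitizeMimeType(); the real implementation may differ.
function sanitizeMimeTypeSketch(value?: string): string | undefined {
  if (!value) return undefined;
  // NFKC folds compatibility forms (such as fullwidth letters) to ASCII
  // before validation, so the check sees the same text a consumer would.
  const normalized = value.normalize("NFKC").trim().toLowerCase();
  if (!normalized) return undefined;
  const match = MIME_PATTERN.exec(normalized);
  return match ? match[1] : undefined;
}
```

Under these assumptions, `sanitizeMimeTypeSketch("ａｕｄｉｏ/ｍｐｅｇ")` returns `"audio/mpeg"`, while a value with trailing content such as `'text/plain" onload="x'` returns `undefined` instead of a truncated match.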
## Test plan
- [x] New tests for `sanitizeMimeType()` covering: standard types, fullwidth Unicode, trailing content, invalid values
- [x] New tests for `normalizeMimeType()` covering: charset stripping, fullwidth Unicode normalization
- [x] All 11 new tests pass
- [x] `pnpm check` passes (0 warnings, 0 errors)
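
The `normalizeMimeType()` cases in the plan (charset stripping and fullwidth normalization) correspond to behavior along these lines. The sketch below is an assumption about the shape of the function, not a copy of `src/media/input-files.ts`; `normalizeMimeTypeSketch` is a hypothetical name.

```typescript
// Illustrative stand-in for normalizeMimeType(); the real implementation may differ.
function normalizeMimeTypeSketch(value?: string | null): string | undefined {
  if (!value) return undefined;
  // Normalize first so that compatibility forms (fullwidth letters, and a
  // fullwidth semicolon) are folded to ASCII before parameters are split off.
  const base = value.normalize("NFKC").split(";")[0] ?? "";
  const normalized = base.trim().toLowerCase();
  return normalized || undefined;
}
```

With this sketch, `normalizeMimeTypeSketch("text/plain; charset=utf-8")` yields `"text/plain"`, and a fullwidth input such as `"ｔｅｘｔ/ｐｌａｉｎ"` yields the same ASCII value.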
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
- Updates MIME sanitization in `src/media-understanding/apply.ts` to NFKC-normalize before validation and anchors the regex to reject trailing content.
- Exports `sanitizeMimeType()` and adds targeted Vitest coverage for standard types, whitespace/lowercasing, fullwidth Unicode normalization, and trailing-content rejection.
- Updates `normalizeMimeType()` in `src/media/input-files.ts` to also apply NFKC normalization, with tests covering parameter/charset stripping and fullwidth normalization.
- Change fits into the media ingestion pipeline by hardening MIME handling before allowing/denying extraction and before embedding MIME values into generated `<file ...>` blocks.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk.
- Changes are tightly scoped to MIME normalization/validation, include explicit regression tests for the intended security fixes (trailing-content rejection and fullwidth bypass prevention), and do not alter unrelated control flow in the media pipeline.
- No files require special attention
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #7454: fix: skip UTF-16 heuristic for audio/video/image MIME types (#7444) · by gavinbmoore · 2026-02-02 · 77.6%
- #11443: LINE: fix buffer guards in detectContentType + add tests · by MdRahmatUllah · 2026-02-07 · 76.5%
- #19675: fix(security): prevent zero-width Unicode chars from bypassing boun... · by williamzujkowski · 2026-02-18 · 76.1%
- #19868: fix: prevent media token regex from matching markdown bold text · by sanketgautam · 2026-02-18 · 75.7%
- #22088: fix(web): sanitize media errors to prevent PII leak · by ashiabbott · 2026-02-20 · 75.4%
- #11160: Media: add missing audio MIME-to-extension mappings (aac, flac, opu... · by lailoo · 2026-02-07 · 75.1%
- #18811: fix(media): require file extension for ambiguous MEDIA: path detection · by aldoeliacim · 2026-02-17 · 74.9%
- #17286: fix(media): PDF attachments embedded as raw binary instead of extra... · by yinghaosang · 2026-02-15 · 74.8%
- #16990: fix(media): strip auth headers on cross-origin redirect in download... · by AI-Reviewer-QS · 2026-02-15 · 73.5%
- #9817: fix(media): resolve relative paths before reading local files (#8759) · by lailoo · 2026-02-05 · 73.4%