
#9026: fix(session-memory): sanitize content to prevent binary data in memory files

by Flamrru open 2026-02-04 19:33
stale
## Problem

When session content contains embedded file attachments (audio, images) via `<file>` tags, the binary data was being written directly to memory files, causing:

1. **Massive file sizes** (500KB+ for a simple session)
2. **Context overflow** when the files were later read
3. **Corrupted memory files** with invalid UTF-8

### Root Cause

The `session-memory` hook extracts conversation content from session JSONL files and writes it to `memory/*.md` files. When voice messages or images were processed, their binary data was embedded in `<file>` tags and passed through unfiltered.

## Solution

Added `sanitizeForMemory()` function that strips:

- `<file>...</file>` tags (embedded audio/image binary data)
- Base64 image data URIs (`data:image/...`)
- Long base64-like sequences (>500 chars)
- Control characters (except newline/tab)

The sanitization runs **before** writing to memory files, preserving readable conversation text while removing binary blobs.

## Testing

- Added 8 new unit tests for `sanitizeForMemory()`
- All 17 tests in `handler.test.ts` pass

## Related

This is related to #3160 (`sessions_list` image stripping) - fixes the same class of bug in a different code path:

- **#3160**: `sessions_list` tool output → fixed with `stripImageData()`
- **This PR**: `session-memory` hook → fixed with `sanitizeForMemory()`

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds a `sanitizeForMemory()` step to the `session-memory` hook to strip embedded `<file>...</file>` blobs, base64 image data URIs, long base64-like sequences, and control characters before writing session content into `memory/*.md`. It also adds unit tests covering these sanitization behaviors.

This fits into the existing `session-memory` flow by sanitizing the extracted recent user/assistant message text right after `getRecentSessionContent()` and before slug generation + memory file write, preventing large/corrupted memory files when sessions contain attachment payloads.

<h3>Confidence Score: 4/5</h3>

- Mostly safe to merge, but needs a small type/contract fix in sanitizeForMemory.
- Core sanitization logic is localized and covered by new tests, but the current implementation/test suite codifies returning nullish values despite a `string` return type, which can introduce type-unsound behavior for future callers.
- src/hooks/bundled/session-memory/handler.ts; src/hooks/bundled/session-memory/handler.test.ts

<!-- greptile_other_comments_section -->

**Context used:**

- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
- Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13))

<!-- /greptile_comment -->
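
For readers without the diff at hand, the four stripping rules listed under Solution could be sketched roughly as follows. This is an illustration only, not the code from this PR: the regular expressions, their exact thresholds, and the choice to return an empty string for nullish input (one possible way to address the type/contract concern raised in the review) are all assumptions.

```typescript
// Hypothetical sketch of a sanitizer like the one described in this PR.
// The patterns below are assumptions, not the PR's actual implementation.

// Embedded attachments: <file ...>...</file>, including multi-line payloads.
const FILE_TAG = /<file\b[^>]*>[\s\S]*?<\/file>/gi;

// Base64 image data URIs such as data:image/png;base64,....
const DATA_IMAGE_URI = /data:image\/[a-z0-9.+-]+;base64,[A-Za-z0-9+\/=]+/gi;

// Runs of more than 500 base64-alphabet characters.
const LONG_BASE64 = /[A-Za-z0-9+\/=]{501,}/g;

// C0 control characters and DEL, keeping tab (0x09) and newline (0x0A).
const CONTROL_CHARS = /[\u0000-\u0008\u000B-\u001F\u007F]/g;

// Accepts nullish input but always returns a string, so the declared
// return type stays sound for callers (an assumed contract).
function sanitizeForMemory(input: string | null | undefined): string {
  if (!input) {
    return "";
  }
  return input
    .replace(FILE_TAG, "")
    .replace(DATA_IMAGE_URI, "")
    .replace(LONG_BASE64, "")
    .replace(CONTROL_CHARS, "");
}
```

In this sketch the `<file>` pass runs first so that any control characters or base64 inside an attachment are dropped together with the tag, and the later passes only clean up whatever remains in the conversation text.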
