#8048: Media: add regression test for audio text blocks (#7970)

by Abhishek-B-R open 2026-02-03 14:22
stale
#### fixes https://github.com/openclaw/openclaw/issues/7970

### Problem

Audio files (e.g., OGG voice messages) can be incorrectly included as `<file mime="text/plain">` blocks in the message body when `looksLikeUtf8Text()` returns true for compressed audio files. Some OGG files have enough bytes in the printable ASCII range (32-126) to pass the >85% threshold, causing binary content to be sent to the model as text, wasting tokens and causing confusion.

The root cause was in `extractFileBlocks`: the check `if (!forcedTextMimeResolved && kind === "audio" && !textLike)` allowed audio files that "looked like" text to fall through to file extraction, even though audio transcription handles audio separately.

### Solution

Added a regression test that ensures audio files are never treated as text file blocks, regardless of their binary content. The test verifies that an OGG file with CSV-like bytes (which would pass `looksLikeUtf8Text()`) is correctly skipped and does not produce a `<file>` block.

Note: The production code in `apply.ts` already implements the correct behavior (skipping audio/image/video unless explicitly forced to text by filename/path). This PR adds test coverage to prevent regressions.

### File Changes

- `src/media-understanding/apply.test.ts`: Added test case "never treats text-like audio as a file block when audio understanding is disabled" that exercises the exact scenario from issue #7970 and asserts audio files are never included as text blocks.

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

Adds a regression test in `src/media-understanding/apply.test.ts` to ensure audio attachments (e.g., OGG) are never emitted as `<file mime="text/plain">` blocks even if their bytes “look like” UTF-8 text, specifically when audio understanding is disabled (issue #7970).

This strengthens the media-understanding pipeline by locking in the intended behavior: binary media (audio/image/video) should be skipped from file-block extraction unless explicitly forced to text by name/path heuristics.

<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge; it only adds a targeted regression test.
- Change is isolated to a new test case and matches existing behavior expectations; the only notable issue is ongoing temp directory cleanup in tests, which can create artifacts over time but doesn’t affect production code.
- src/media-understanding/apply.test.ts (temp dir cleanup pattern)

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
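The failure mode described under Problem can be reproduced in isolation. The following is a minimal sketch, not the code in `apply.ts`: every name ending in `Sketch`, the `asciiBytes` helper, and the attachment shape are hypothetical, and counting tab/LF/CR as text is an assumption. Only the 32-126 range, the >85% threshold, and the quoted condition come from the PR description.

```typescript
// Hypothetical stand-ins for illustration; these are not the real apply.ts functions.
type MediaKindSketch = "audio" | "image" | "video" | "document";

interface AttachmentSketch {
  kind: MediaKindSketch;
  bytes: Uint8Array;
  forcedText: boolean; // true when the filename/path explicitly forces text handling
}

// Printable-ratio heuristic: bytes in 32-126 must exceed 85% of the buffer.
// Treating tab, LF, and CR as text is an assumption of this sketch.
function looksLikeUtf8TextSketch(bytes: Uint8Array): boolean {
  if (bytes.length === 0) return false;
  let printable = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if ((b >= 32 && b <= 126) || b === 9 || b === 10 || b === 13) printable++;
  }
  return printable / bytes.length > 0.85;
}

// The condition quoted in the PR: audio is skipped only when it does NOT look like text.
function skipsExtractionOldSketch(a: AttachmentSketch): boolean {
  const textLike = looksLikeUtf8TextSketch(a.bytes);
  return !a.forcedText && a.kind === "audio" && !textLike;
}

// Intended behavior: audio/image/video are always skipped unless forced to text.
function skipsExtractionIntendedSketch(a: AttachmentSketch): boolean {
  const binaryMedia = a.kind === "audio" || a.kind === "image" || a.kind === "video";
  return binaryMedia && !a.forcedText;
}

// Converts an ASCII string to its byte values.
function asciiBytes(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) out.push(text.charCodeAt(i));
  return out;
}

// An "OggS" capture pattern, two binary bytes, then CSV-like content:
// 28 of 30 bytes are text-like, which is above the 85% threshold.
const oggWithCsvBytes = new Uint8Array(
  asciiBytes("OggS").concat([0x00, 0x02], asciiBytes("a,b,c\n1,2,3\nx,y,z\n4,5,6\n"))
);

const voiceNote: AttachmentSketch = { kind: "audio", bytes: oggWithCsvBytes, forcedText: false };
const forcedTextAudio: AttachmentSketch = { kind: "audio", bytes: oggWithCsvBytes, forcedText: true };

const passesHeuristic = looksLikeUtf8TextSketch(oggWithCsvBytes); // true
const skippedByOldCheck = skipsExtractionOldSketch(voiceNote); // false: falls through to file extraction
const skippedByIntendedCheck = skipsExtractionIntendedSketch(voiceNote); // true: never becomes a <file> block
const forcedTextSkipped = skipsExtractionIntendedSketch(forcedTextAudio); // false: explicit override still applies
```

In this sketch the decision to skip extraction no longer depends on byte content for media kinds, which is the property the regression test is meant to lock in.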
