#20301: Security: scrub untrusted metadata from user-facing replies

by ashishc2503 open 2026-02-18 19:14

channel: whatsapp-web gateway agents size: S

Cluster: Metadata Sanitization and Security Fixes

## Summary

- Add a shared scrubber to remove untrusted metadata JSON blocks from user-facing text
- Apply the scrubber in `sanitizeUserFacingText` and in gateway history sanitization for user messages
- Harden the WhatsApp media-failure fallback copy to avoid exposing raw internal error messages

## Changes

- Added `src/shared/untrusted-metadata.ts`
- Updated `src/agents/pi-embedded-helpers/errors.ts`
- Updated `src/gateway/chat-sanitize.ts`
- Updated `src/web/auto-reply/deliver-reply.ts`
- Added/updated regression tests for all affected paths

## Verification

- `pnpm test -- src/gateway/chat-sanitize.test.ts`
- `pnpm test -- src/auto-reply/reply/reply-utils.test.ts`
- `pnpm test -- src/web/auto-reply/deliver-reply.test.ts`
- `pnpm vitest run --config vitest.e2e.config.ts src/agents/pi-embedded-helpers.sanitizeuserfacingtext.e2e.test.ts`

<h3>Greptile Summary</h3>

Adds a shared `stripUntrustedMetadataBlocks` scrubber that removes internal metadata JSON blocks (conversation info, sender info, thread context, etc.) from user-facing text, preventing untrusted envelope metadata from leaking into replies. The scrubber is integrated into both `sanitizeUserFacingText` (for outbound reply normalization) and the gateway history sanitization path (for user messages sent to the LLM). Additionally hardens the WhatsApp media-failure fallback to use a static message instead of exposing raw `err.message`.

- New `src/shared/untrusted-metadata.ts` with `stripUntrustedMetadataBlocks` that matches all 6 metadata header types generated by `buildInboundUserContextPrefix` in `inbound-meta.ts`
- Integrated into `sanitizeUserFacingText` in `errors.ts` to strip metadata from outbound replies
- Integrated into `chat-sanitize.ts` to strip metadata from user messages before they reach the LLM context
- WhatsApp media-failure fallback now uses a static `"⚠️ Media failed. Sending text only."` instead of interpolating `err.message`
- Regression tests added across all affected paths

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge: it adds defensive stripping of internal metadata and hardens error messages with no behavioral regressions.
- The changes are focused and well-scoped: a new shared utility with clear, testable logic; straightforward integration at three call sites; and a simple hardening fix for the WhatsApp fallback. The metadata header list exactly matches the generation source in `inbound-meta.ts`. All edge cases (missing closing fence, multi-line JSON, residual blank lines) are handled correctly. Comprehensive regression tests cover all affected paths. No logic errors, security issues, or regressions identified.
- No files require special attention.

<sub>Last reviewed commit: fb14d8a</sub>
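For readers without the diff at hand, the following is a minimal sketch of how a line-based metadata scrubber of this kind could work. It is not the code in `src/shared/untrusted-metadata.ts`: the function name `stripMetadataBlocksSketch`, the header strings in `ASSUMED_HEADERS`, and the assumption that each block is a header line followed by a fenced JSON block are all hypothetical, chosen only to illustrate the edge cases named in the review (missing closing fence, multi-line JSON, residual blank lines).

```typescript
// Hypothetical header labels for illustration only. The real list is generated by
// buildInboundUserContextPrefix and is not reproduced in this PR description.
const ASSUMED_HEADERS: readonly string[] = [
  "Conversation info (untrusted metadata):",
  "Sender (untrusted metadata):",
  "Thread context (untrusted metadata):",
];

// Built at runtime so this example does not embed a literal code fence.
const FENCE = "`".repeat(3);

function isMetadataHeader(line: string): boolean {
  const trimmed = line.trim();
  return ASSUMED_HEADERS.some((header) => trimmed === header);
}

// Removes each "header + fenced JSON" block, then collapses leftover blank lines.
function stripMetadataBlocksSketch(text: string): string {
  const lines = text.split("\n");
  const kept: string[] = [];
  let i = 0;

  while (i < lines.length) {
    const next = lines[i + 1];
    const startsBlock =
      isMetadataHeader(lines[i]) &&
      next !== undefined &&
      next.trim().startsWith(FENCE);

    if (startsBlock) {
      i += 2; // skip the header line and the opening fence
      // Consume the (possibly multi-line) JSON body. If the closing fence is
      // missing, this runs to the end of the text and drops the remainder,
      // which is an assumed fail-closed choice for this sketch.
      while (i < lines.length && lines[i].trim() !== FENCE) {
        i++;
      }
      i++; // skip the closing fence (harmless if we are already past the end)
      continue;
    }

    kept.push(lines[i]);
    i++;
  }

  // Collapse runs of blank lines left behind by removed blocks.
  return kept.join("\n").replace(/\n{3,}/g, "\n\n").trim();
}
```

Matching whole header lines rather than arbitrary JSON keeps ordinary user text that merely contains JSON untouched; whether the actual scrubber makes the same trade-off is not stated here.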