← Back to PRs

#23312: fix(gateway): strip inbound metadata in chat history sanitization

by SidQin-cyber open 2026-02-22 06:07 View on GitHub →
app: web-ui gateway size: XS
## Summary - **Problem:** Webchat displays internal AI prompt metadata ("Conversation info (untrusted metadata): { message_id, sender }") to users instead of hiding it. - **Why:** The chat history sanitizer (\`sanitizeChatHistoryContentBlock\` / \`sanitizeChatHistoryMessage\`) only stripped directive tags (\`[[reply_to_current]]\` etc.) but not inbound metadata blocks. While \`stripEnvelopeFromMessages\` handles metadata upstream in the \`chat.history\` handler, messages reaching the sanitizer through other paths (e.g. broadcast, inject) could leak metadata. - **What changed:** Added \`stripInboundMetadata\` calls in \`sanitizeChatHistoryContentBlock\` and \`sanitizeChatHistoryMessage\` for all text-bearing fields (\`text\`, \`content\`). This provides defense-in-depth: metadata is now stripped at both the envelope layer and the per-block sanitization layer. - **What did NOT change:** The upstream \`stripEnvelopeFromMessages\` call in \`chat.history\` is unchanged. The \`stripInboundMetadata\` function itself is unchanged. ## Change Type (select all) - [x] Bug fix - [ ] Feature - [ ] Refactor - [ ] Docs - [x] Security hardening - [ ] Chore/infra ## Scope (select all touched areas) - [x] Gateway / orchestration - [ ] Skills / tool execution - [ ] Auth / tokens - [ ] Memory / storage - [ ] Integrations - [ ] API / contracts - [x] UI / DX - [ ] CI/CD / infra ## Linked Issue/PR - Closes #23273 - Related: #22686 (same root cause) ## User-visible / Behavior Changes - "Conversation info (untrusted metadata)" blocks no longer appear in webchat messages ## Security Impact (required) - New permissions/capabilities? \`No\` - Secrets/tokens handling changed? \`No\` - New/changed network calls? \`No\` - Command/tool execution surface changed? \`No\` - Data access scope changed? \`No\` — This prevents internal metadata from leaking to users, which is a positive security improvement. ## Repro + Verification ### Steps 1. Open Control UI / Webchat 2. Send a message 3. Check chat history for metadata blocks ### Expected - No "Conversation info (untrusted metadata):" visible in chat ### Actual (before fix) - Metadata block appears in rendered messages ## Evidence \`buildInboundUserContextPrefix\` in \`inbound-meta.ts\` generates these blocks for the LLM. \`stripInboundMetadata\` in \`strip-inbound-meta.ts\` matches and removes them using sentinel strings. Previously only called at the \`stripEnvelopeFromMessages\` layer, now also called in per-block sanitization. ## Human Verification (required) - Verified scenarios: \`stripInboundMetadata\` correctly matches and removes all sentinel-prefixed blocks - Edge cases checked: Messages without metadata (fast-path returns unchanged), mixed content arrays - What I did **not** verify: Live webchat testing ## Compatibility / Migration - Backward compatible? \`Yes\` - Config/env changes? \`No\` - Migration needed? \`No\` ## Failure Recovery (if this breaks) \`stripInboundMetadata\` has a fast-path regex check — if no sentinels are found, it returns the original string unchanged (zero allocation). If it incorrectly strips user content, the upstream \`stripEnvelopeFromMessages\` call can be relied upon as the primary sanitizer. ## Risks and Mitigations Low risk. \`stripInboundMetadata\` is an existing, battle-tested function already used in the envelope stripping path. This change simply invokes it at an additional layer. <!-- greptile_comment --> <h3>Greptile Summary</h3> This PR adds defense-in-depth metadata stripping by calling `stripInboundMetadata` in the chat history sanitization functions (`sanitizeChatHistoryContentBlock` and `sanitizeChatHistoryMessage`). This prevents internal AI prompt metadata from leaking to webchat when messages reach the sanitizer through paths other than `chat.history` (e.g. broadcast, inject). **Critical Issue Found:** - The change detection logic on lines 112, 169, and 183 compares `metaStripped !== entry.text` **after** `entry.text` has already been reassigned to the final processed value. This will cause the function to incorrectly return the original unmodified object when only metadata stripping occurs (without directive tags or truncation), defeating the purpose of this security fix. - The bug means metadata could still leak in edge cases where `stripInboundMetadata` changes the text but `stripInlineDirectiveTagsForDisplay` and `truncateChatHistoryText` do not. **What's Good:** - The approach is sound: adding `stripInboundMetadata` as an additional layer is the right defense-in-depth strategy - `stripInboundMetadata` itself is well-tested and has fast-path optimization <h3>Confidence Score: 1/5</h3> - This PR has a critical logic bug that defeats the security fix it's trying to implement - The change detection logic compares the metadata-stripped text against the already-mutated entry field (after it's been reassigned to the final value), which means when only metadata stripping occurs, the function will incorrectly return the original unmodified object. This defeats the purpose of this security hardening PR - metadata could still leak to users in cases where stripInboundMetadata removes content but stripInlineDirectiveTagsForDisplay and truncateChatHistoryText don't modify anything. - src/gateway/server-methods/chat.ts requires immediate attention - the change detection logic must be fixed before merge <sub>Last reviewed commit: 19c5f77</sub> <!-- greptile_other_comments_section --> <!-- /greptile_comment -->

Most Similar PRs