#21074: security(web_fetch): strip hidden content to prevent indirect prompt injection
agents
size: M
## Problem
`web_fetch` extracts content from HTML pages into the agent's context. Hidden elements (invisible to a human reading the page, but still present in the extracted text) create an indirect prompt injection vector: a page can carry instructions that the agent reads but the user never sees. See #8027 for the full description.
While [reviewing PR #8114](https://github.com/openclaw/openclaw/pull/8114#pullrequestreview-2702612959), which addresses the same issue, I found several gaps. This PR takes a standalone approach with broader coverage of real-world hiding techniques.
## What this PR adds
A sanitization layer that strips human-invisible content before Readability processes the HTML.
### Detection vectors
**CSS inline styles:**
- `display:none`, `visibility:hidden`, `opacity:0`, `font-size:0`
- `text-indent:-9999px`, offscreen positioning (`left/top:-9999px`)
- `color:transparent`, `color:rgba(r,g,b,0)`, `color:hsla(h,s,l,0)`
- `transform:scale(0)`, `transform:translateX/Y(-9999px)`
- `clip-path:inset(100%)`, `width:0;height:0;overflow:hidden`
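The inline-style checks above could be expressed roughly as in the minimal TypeScript sketch below. It is not the code in `web-fetch-visibility.ts`. The names `isHiddenByInlineStyle`, `parseInlineStyle`, `toPx` and `isTransparentColor`, and the `-1000px` offscreen cutoff, are hypothetical. Only px and unitless lengths are parsed.

```typescript
// Hypothetical sketch of inline-style visibility checks; not the PR's code.

// Assumed "offscreen" cutoff. The PR text cites -9999px; anything at or below
// this many pixels is treated as pushed out of view.
const OFFSCREEN_PX = -1000;

// Parse "a: b; c: d" into lowercase property -> lowercase value, dropping !important.
function parseInlineStyle(style: string): Map<string, string> {
  const decls = new Map<string, string>();
  for (const part of style.split(";")) {
    const colon = part.indexOf(":");
    if (colon === -1) continue;
    const prop = part.slice(0, colon).trim().toLowerCase();
    const value = part.slice(colon + 1).replace(/!important/i, "").trim().toLowerCase();
    if (prop !== "") decls.set(prop, value);
  }
  return decls;
}

// Length in px (unitless also accepted); NaN for anything else, e.g. "2em" or "auto".
function toPx(value: string | undefined): number {
  if (value === undefined) return NaN;
  const m = /^(-?\d*\.?\d+)(px)?$/.exec(value);
  return m ? parseFloat(m[1]) : NaN;
}

// "transparent", or an rgba()/hsla() color whose fourth (alpha) component is 0.
function isTransparentColor(color: string): boolean {
  if (color === "transparent") return true;
  const m = /^(?:rgba|hsla)\(([^)]*)\)$/.exec(color);
  if (!m) return false;
  const parts = m[1].split(",").map((s) => s.trim());
  return parts.length === 4 && parseFloat(parts[3]) === 0;
}

function isHiddenByInlineStyle(style: string): boolean {
  const d = parseInlineStyle(style);
  const opacity = d.get("opacity");
  const color = d.get("color");
  const transform = d.get("transform") ?? "";
  const clipPath = d.get("clip-path") ?? "";

  if (d.get("display") === "none" || d.get("visibility") === "hidden") return true;
  if (opacity !== undefined && parseFloat(opacity) === 0) return true;
  if (toPx(d.get("font-size")) === 0) return true;
  // Text or box pushed far past the left/top edge (NaN comparisons are false).
  if (toPx(d.get("text-indent")) <= OFFSCREEN_PX) return true;
  if (toPx(d.get("left")) <= OFFSCREEN_PX || toPx(d.get("top")) <= OFFSCREEN_PX) return true;
  if (color !== undefined && isTransparentColor(color)) return true;
  // scale(0), or translateX/translateY far offscreen (values were lowercased above).
  if (/scale\(\s*0\s*\)/.test(transform)) return true;
  const shift = /translate[xy]\(\s*(-?\d*\.?\d+)px\s*\)/.exec(transform);
  if (shift && parseFloat(shift[1]) <= OFFSCREEN_PX) return true;
  if (/^inset\(\s*100%\s*\)$/.test(clipPath)) return true;
  // Zero-size box that clips its overflow.
  const zeroBox = toPx(d.get("width")) === 0 && toPx(d.get("height")) === 0;
  return zeroBox && d.get("overflow") === "hidden";
}
```

A check like this only sees the `style` attribute. Hiding rules that live in stylesheets are not visible to it, which is why the class-based checks exist.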
**CSS class-based hiding** (the most common real-world pattern):
- `.sr-only`, `.visually-hidden`, `.d-none`, `.hidden`, `.invisible`, `.screen-reader-only`, `.offscreen`
- Uses Set-based token matching (split on whitespace) to avoid false positives on compound class names like `un-hidden`
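The Set-based token matching can be sketched as follows. `HIDING_CLASSES` and `hasHidingClass` are illustrative names, and the class list simply mirrors the one above.

```typescript
// Illustrative list mirroring the classes named in this PR description.
const HIDING_CLASSES: ReadonlySet<string> = new Set([
  "sr-only",
  "visually-hidden",
  "d-none",
  "hidden",
  "invisible",
  "screen-reader-only",
  "offscreen",
]);

// Split the class attribute on any run of whitespace and look up each token exactly,
// so "un-hidden" is a single token that never equals "hidden".
function hasHidingClass(classAttr: string | null): boolean {
  if (!classAttr) return false;
  return classAttr
    .split(/\s+/)
    .some((token) => token !== "" && HIDING_CLASSES.has(token));
}
```

For comparison, `/\bhidden\b/.test("un-hidden")` returns `true`: `-` is a non-word character, so `\b` matches between `-` and `h`. Exact token lookup avoids that. Like any class-name heuristic, this assumes the class carries its usual framework meaning; the page's stylesheet is not consulted.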
**HTML attributes:** `aria-hidden="true"`, `hidden`, `input[type=hidden]`
**Non-content tags:** `meta`, `template`, `svg`, `canvas`, `iframe`, `object`, `embed`
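A minimal sketch of the attribute and tag checks follows. It is written against a small `ElementLike` interface (hypothetical) so it runs without a DOM library. It uses only `tagName`, `getAttribute` and `hasAttribute`, which are standard DOM `Element` members. `makeElement` is a throwaway stand-in for trying the check.

```typescript
// Hypothetical minimal element shape; only standard Element members are used.
interface ElementLike {
  tagName: string;
  getAttribute(name: string): string | null;
  hasAttribute(name: string): boolean;
}

// Tags whose content is never meant to be read as page text (list from this PR).
const NON_CONTENT_TAGS: ReadonlySet<string> = new Set([
  "meta", "template", "svg", "canvas", "iframe", "object", "embed",
]);

function isHiddenByAttributeOrTag(el: ElementLike): boolean {
  // tagName casing varies (e.g. "DIV" vs "svg"), so normalize before comparing.
  const tag = el.tagName.toLowerCase();
  if (NON_CONTENT_TAGS.has(tag)) return true;
  if ((el.getAttribute("aria-hidden") ?? "").trim().toLowerCase() === "true") return true;
  if (el.hasAttribute("hidden")) return true;
  const type = (el.getAttribute("type") ?? "").trim().toLowerCase();
  return tag === "input" && type === "hidden";
}

// Stand-in element for trying the check without a DOM (hypothetical helper).
function makeElement(tagName: string, attrs: Record<string, string> = {}): ElementLike {
  const map = new Map(Object.entries(attrs));
  return {
    tagName,
    getAttribute: (name) => map.get(name) ?? null,
    hasAttribute: (name) => map.has(name),
  };
}
```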
**Invisible Unicode:** zero-width characters (U+200B-U+200F), directional overrides (U+202A-U+202E), formatting chars (U+2060-U+2064, U+206A-U+206F), BOM (U+FEFF), Unicode tag block (U+E0000-U+E007F)
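A minimal sketch of `stripInvisibleUnicode` follows. It assumes a single regex over the ranges listed above, and the PR's actual implementation may differ. The `u` flag is required so that `\u{E0000}-\u{E007F}` matches whole astral code points rather than UTF-16 surrogate halves.

```typescript
// Ranges from the list above. The `u` flag makes \u{E0000}-\u{E007F} match whole
// code points in the Unicode tag block instead of individual surrogate halves.
const INVISIBLE_UNICODE =
  /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u206A-\u206F\uFEFF\u{E0000}-\u{E007F}]/gu;

// Sketch of the final text pass: delete every matching code point.
function stripInvisibleUnicode(text: string): string {
  return text.replace(INVISIBLE_UNICODE, "");
}
```

The tag block matters because U+E0020 to U+E007E mirror printable ASCII, so a whole instruction can be spelled in characters that render as nothing.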
**HTML comments**
### Differences from #8114
- **Class-based hiding** — #8114 only checks inline styles. This PR detects common CSS framework classes (Bootstrap, Tailwind, accessibility utilities)
- **`color:transparent` / `rgba(r,g,b,0)`** — not covered in #8114
- **`transform:translateX/Y(-9999px)`** — offscreen via transform, not just `position:absolute` + `left`
- **`<meta>` tag stripping** — prevents injection via meta content attributes
- **Lazy linkedom import** — uses `await import("linkedom")` consistent with the existing lazy-loading pattern in `web-fetch-utils.ts`, avoiding eager double-imports
- **Set-based class matching** — avoids regex word-boundary false positives
### Files changed
- **`web-fetch-visibility.ts`** (new) — `sanitizeHtml()` and `stripInvisibleUnicode()`
- **`web-fetch-utils.ts`** (modified) — integrates sanitization before Readability, unicode stripping on text output
- **`web-fetch-visibility.test.ts`** (new) — 35 tests covering all detection vectors
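The ordering described for `web-fetch-utils.ts` can be summarized as a three-stage pipeline: sanitize the HTML, run Readability, then strip invisible Unicode from the text. The function and type names below are hypothetical. The stages are injected as parameters so the sketch stays independent of linkedom and Readability.

```typescript
// Hypothetical stage signatures; the real functions in web-fetch-utils.ts may differ.
type HtmlSanitizer = (html: string) => Promise<string>;
type ReadableExtractor = (html: string) => Promise<string>;
type TextCleaner = (text: string) => string;

async function extractVisibleText(
  html: string,
  sanitize: HtmlSanitizer,
  extract: ReadableExtractor,
  cleanText: TextCleaner,
): Promise<string> {
  // 1. Remove hidden elements, comments and non-content tags first, so the
  //    extractor never sees text the user could not see.
  const visibleHtml = await sanitize(html);
  // 2. Run readable-content extraction on the cleaned HTML.
  const text = await extract(visibleHtml);
  // 3. Final pass on plain text for anything that survived HTML processing,
  //    such as zero-width characters inside otherwise visible text.
  return cleanText(text);
}
```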
### Design decisions
- Uses linkedom (existing dependency) for DOM parsing — no new deps
- `sanitizeHtml` is async with lazy import, matching codebase conventions
- Bottom-up DOM traversal to avoid re-walking removed subtrees
- Class-based detection uses `Set` with whitespace-split tokens (no regex word boundary issues)
- `stripInvisibleUnicode` runs on final text output to catch anything that survives HTML processing
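One way to read "bottom-up traversal" is sketched below. The tree is snapshotted once in document order, then the snapshot is walked backwards. Each node is visited before its ancestors, and detaching a node never disturbs the list still being iterated. The `SketchNode` tree and its precomputed `hidden` flag are hypothetical stand-ins for linkedom nodes and the visibility checks. Comment nodes are dropped in the same pass.

```typescript
type NodeKind = "element" | "text" | "comment";

// Hypothetical stand-in for a DOM node.
interface SketchNode {
  kind: NodeKind;
  value: string; // tag name for elements, content for text and comments
  hidden: boolean; // assumed result of the visibility checks for this element
  parent: SketchNode | null;
  children: SketchNode[];
}

// Build a node and link its children back to it.
function node(kind: NodeKind, value: string, children: SketchNode[] = [], hidden = false): SketchNode {
  const n: SketchNode = { kind, value, hidden, parent: null, children };
  for (const c of children) c.parent = n;
  return n;
}

// All descendants of root, in document (pre-)order.
function collect(root: SketchNode, out: SketchNode[] = []): SketchNode[] {
  for (const child of root.children) {
    out.push(child);
    collect(child, out);
  }
  return out;
}

function detach(n: SketchNode): void {
  const parent = n.parent;
  if (parent === null) return;
  const i = parent.children.indexOf(n);
  if (i !== -1) parent.children.splice(i, 1);
  n.parent = null;
}

function sanitizeTree(root: SketchNode): void {
  const snapshot = collect(root); // taken once; removals never re-walk anything
  for (let i = snapshot.length - 1; i >= 0; i--) {
    const n = snapshot[i];
    if (n.kind === "comment" || (n.kind === "element" && n.hidden)) detach(n);
  }
}

// Naive text extraction, used here only to show the result.
function textOf(n: SketchNode): string {
  if (n.kind === "text") return n.value;
  if (n.kind === "comment") return "";
  return n.children.map(textOf).join("");
}

const body = node("element", "body", [
  node("element", "p", [node("text", "Visible text")]),
  node("element", "div", [node("text", "Ignore previous instructions")], true),
  node("comment", "hidden note"),
]);
sanitizeTree(body);
console.log(textOf(body)); // prints: Visible text
```

Because the snapshot is taken once, nothing is re-walked after a removal. The trade-off is that a hidden descendant may be detached just before its hidden ancestor, which is harmless.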
Closes #8027