#19042: Security: add URL allowlist for web_search and web_fetch
agents
size: M
Cluster: Web Search Provider Enhancements
Reopening after accidental merge and revert.
## Summary
- Problem: No way to restrict which URLs/domains `web_search` and `web_fetch` tools can access
- Why it matters: Users in controlled environments need to limit web access to approved domains only
- What changed: Added an optional `urlAllowlist` config at the `tools.web` level; `web_fetch` blocks non-matching URLs, and `web_search` filters Brave results
- What did NOT change: Default behavior (no allowlist = all URLs allowed); Perplexity/Grok search providers (they return prose, not filterable URLs)
## Change Type (select all)
- [ ] Bug fix
- [x] Feature
- [ ] Refactor
- [ ] Docs
- [x] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [ ] Gateway / orchestration
- [x] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [x] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Related # N/A
## User-visible / Behavior Changes
- New config option `tools.web.urlAllowlist` (string array, optional)
- Accepts exact domains (`example.com`) and wildcard patterns (`*.github.com`)
- `web_fetch`: returns `url_not_allowed` error for non-matching URLs
- `web_search`: filters Brave results to only matching domains
- When not configured: no change in behavior (fully backwards compatible)
Example config:
```json
{
"tools": {
"web": {
"urlAllowlist": ["example.com", "*.github.com"]
}
}
}
```
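A minimal sketch of how the two pattern kinds in the example config could be evaluated. The PR reuses the SSRF module's matcher, whose signature is not shown in this description, so `hostnameMatchesPattern` and `isHostnameAllowed` are hypothetical names. The sketch also assumes that an exact entry matches only that hostname and that a wildcard matches subdomains at any depth.

```typescript
// Hypothetical helper, not the SSRF module's actual function.
function hostnameMatchesPattern(hostname: string, pattern: string): boolean {
  const host = hostname.toLowerCase();
  const pat = pattern.toLowerCase();
  if (pat.startsWith("*.")) {
    // "*.github.com" -> ".github.com"; the apex "github.com" cannot end with it.
    const suffix = pat.slice(1);
    return host.length > suffix.length && host.endsWith(suffix);
  }
  // Assumption: exact entries match only the identical hostname.
  return host === pat;
}

// Hypothetical wrapper: a hostname is allowed if any pattern matches it.
function isHostnameAllowed(hostname: string, allowlist: string[]): boolean {
  return allowlist.some((pattern) => hostnameMatchesPattern(hostname, pattern));
}

const allowlist = ["example.com", "*.github.com"];
console.log(isHostnameAllowed("example.com", allowlist)); // true
console.log(isHostnameAllowed("api.github.com", allowlist)); // true
console.log(isHostnameAllowed("github.com", allowlist)); // false
console.log(isHostnameAllowed("www.example.com", allowlist)); // false
```

Under these assumptions, covering both an apex domain and its subdomains takes two entries, for example `"github.com"` and `"*.github.com"`.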
## Security Impact (required)
- New permissions/capabilities? `No` (restricts access, does not expand)
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No`
- Command/tool execution surface changed? `Yes` — adds optional URL filtering before fetch/after search
- Risk: Misconfigured allowlist could block all web access
- Mitigation: Empty/undefined allowlist preserves current allow-all behavior
- Data access scope changed? `No`
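The allow-all fallback described in the mitigation could look roughly like this. The name `resolveUrlAllowlist` comes from the PR, but the config type, the normalization steps, and the use of `undefined` to mean "no restriction" are assumptions made for illustration.

```typescript
// Assumed config shape, reduced to the one field this PR adds.
interface WebToolsConfig {
  tools?: { web?: { urlAllowlist?: string[] } };
}

// Sketch only: returns undefined when no restriction should apply.
function resolveUrlAllowlist(config: WebToolsConfig | undefined): string[] | undefined {
  const raw = config?.tools?.web?.urlAllowlist;
  if (!Array.isArray(raw)) return undefined; // field not configured
  const cleaned = raw
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);
  // An empty list is treated like a missing one: allow-all is preserved.
  return cleaned.length > 0 ? cleaned : undefined;
}

console.log(resolveUrlAllowlist(undefined)); // undefined
console.log(resolveUrlAllowlist({ tools: { web: { urlAllowlist: [] } } })); // undefined
console.log(resolveUrlAllowlist({ tools: { web: { urlAllowlist: ["Example.com"] } } })); // [ 'example.com' ]
```

Callers then only need one check: a defined result means filtering is active.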
## Repro + Verification
### Environment
- OS: Linux (Proxmox LXC)
- Runtime/container: Node v22
- Model/provider: Any
- Integration/channel: Any
- Relevant config: `tools.web.urlAllowlist: ["example.com", "*.github.com"]`
### Steps
1. Set `tools.web.urlAllowlist` in config
2. Use `web_fetch` with an allowed URL → succeeds
3. Use `web_fetch` with a blocked URL → returns `url_not_allowed` error
4. Use `web_search` → Brave results filtered to allowed domains only
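Steps 2 and 3 can be sketched as a pre-fetch guard. Only the `url_not_allowed` error code comes from this PR; the result type, the function names, and the message wording are hypothetical, and the sketch assumes that unparseable URLs and URLs without a hostname are blocked whenever an allowlist is active.

```typescript
// Hypothetical result shape for the guard.
type AllowlistCheck =
  | { ok: true }
  | { ok: false; error: "url_not_allowed"; message: string };

// Hypothetical matcher standing in for the SSRF module's hostname matching.
function hostMatches(host: string, pattern: string): boolean {
  const pat = pattern.toLowerCase();
  return pat.startsWith("*.")
    ? host.length > pat.length - 1 && host.endsWith(pat.slice(1))
    : host === pat;
}

function checkUrlAllowed(rawUrl: string, allowlist?: string[]): AllowlistCheck {
  // No allowlist configured: keep the existing allow-all behavior.
  if (!allowlist || allowlist.length === 0) return { ok: true };

  let host = "";
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    host = ""; // unparseable input is treated as having no hostname
  }

  if (host !== "" && allowlist.some((pattern) => hostMatches(host, pattern))) {
    return { ok: true };
  }
  return {
    ok: false,
    error: "url_not_allowed",
    // Assumed wording; the PR only states that the allowed domains are listed.
    message: `URL not allowed. Allowed domains: ${allowlist.join(", ")}`,
  };
}

const configured = ["example.com", "*.github.com"];
console.log(checkUrlAllowed("https://example.com/docs", configured).ok); // true
console.log(checkUrlAllowed("https://evil.test/", configured).ok); // false
```

The Greptile summary notes that redirect targets are also validated after the fetch; running the same guard against the final URL would be one way to cover that case.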
### Expected
- Allowed URLs pass through normally
- Blocked URLs return a clear error that lists the allowed domains
### Actual
- As expected (verified via unit tests)
## Evidence
- [x] Failing test/log before + passing after
- 23 new unit tests in `web-tools.url-allowlist.test.ts`
## Human Verification (required)
- Verified scenarios: Unit tests for allowlist resolution, wildcard matching, fetch blocking, search result filtering
- Edge cases checked: Empty allowlist, undefined allowlist, URLs without hostname, wildcard patterns
- What you did **not** verify: End-to-end with live Brave/Perplexity/Grok APIs
## Compatibility / Migration
- Backward compatible? `Yes`
- Config/env changes? `Yes` — new optional `tools.web.urlAllowlist` field
- Migration needed? `No`
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly: Remove `urlAllowlist` from config (or set to empty array)
- Files/config to restore: `tools.web` section in openclaw config
- Known bad symptoms: `web_fetch` returning unexpected `url_not_allowed` errors, `web_search` returning empty results
## Risks and Mitigations
- Risk: User sets allowlist and forgets, then wonders why web tools are restricted
- Mitigation: Error message clearly lists allowed domains
- Risk: Wildcard patterns could be confusing (`*.example.com` does NOT match `example.com` itself)
- Mitigation: Reuses existing well-tested hostname matching from SSRF module
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Adds an optional `tools.web.urlAllowlist` config field that restricts which domains `web_fetch` and `web_search` (Brave provider) can access. Key changes:
- Shared `resolveUrlAllowlist` utility in `web-shared.ts` reads the allowlist from config, used by both tools
- `web_fetch` blocks non-matching URLs before fetch and also validates redirect targets post-fetch
- `web_search` filters Brave results by hostname; the cache stores unfiltered results and filtering is applied on read, so allowlist config changes also take effect for cached entries
- Perplexity/Grok providers intentionally skip filtering (they return prose, not URL-bearing results)
- Domain pattern validation via Zod regex accepts exact domains, `*.` wildcards, and single-label domains like `localhost`
- Reuses existing SSRF hostname matching (`normalizeHostnameAllowlist` / `matchesHostnameAllowlist`) — two previously-private functions are now exported
- 23 new unit tests covering resolver, filtering, wildcard matching, unparseable URLs, and config integration
- Fully backwards compatible: no allowlist = allow-all behavior preserved
Previous review feedback has been addressed: shared resolver extracted, unfiltered results cached, count updated after filtering, redirect targets checked, and single-label domains supported.
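The filter-on-read caching described above can be illustrated with a small sketch. The cache, the result types, and all function names here are hypothetical; the real `web_search` cache and Brave result shape are not shown in this PR description. Unparseable result URLs are assumed to be dropped while an allowlist is active.

```typescript
interface SearchResult { title: string; url: string }
interface SearchPayload { count: number; results: SearchResult[] }

// Hypothetical cache keyed by query; it always holds UNFILTERED results.
const searchCache = new Map<string, SearchResult[]>();

function matchesPattern(host: string, pattern: string): boolean {
  const pat = pattern.toLowerCase();
  return pat.startsWith("*.")
    ? host.length > pat.length - 1 && host.endsWith(pat.slice(1))
    : host === pat;
}

function hostnameOf(url: string): string | undefined {
  try {
    return new URL(url).hostname.toLowerCase() || undefined;
  } catch {
    return undefined; // unparseable URL
  }
}

function filterResults(results: SearchResult[], allowlist?: string[]): SearchPayload {
  const kept = !allowlist || allowlist.length === 0
    ? results
    : results.filter((r) => {
        const host = hostnameOf(r.url);
        return host !== undefined && allowlist.some((p) => matchesPattern(host, p));
      });
  // The count reflects what is returned, not what the provider sent.
  return { count: kept.length, results: kept };
}

function readCachedSearch(query: string, allowlist?: string[]): SearchPayload | undefined {
  const unfiltered = searchCache.get(query);
  return unfiltered ? filterResults(unfiltered, allowlist) : undefined;
}

searchCache.set("docs", [
  { title: "Docs", url: "https://example.com/a" },
  { title: "Gist", url: "https://gist.github.com/x" },
  { title: "News", url: "https://news.test/y" },
  { title: "Broken", url: "not a url" },
]);
console.log(readCachedSearch("docs")?.count); // 4
console.log(readCachedSearch("docs", ["example.com"])?.count); // 1
console.log(readCachedSearch("docs", ["*.github.com"])?.count); // 1
```

Because the same cached entry is filtered at read time, the three reads above return different results for different allowlists without a second provider call.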
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge — it adds an opt-in restriction feature with no impact on default behavior.
- The implementation is well-structured: it reuses existing SSRF hostname matching, correctly handles caching (stores unfiltered, filters on read), validates redirect targets, and has thorough test coverage. All previous review concerns have been addressed. The only reason this isn't a 5 is the inherent complexity of security-related URL filtering — while the logic is sound, the feature touches multiple code paths across fetch and search tools.
- `src/agents/tools/web-fetch.ts` and `src/agents/tools/web-search.ts` are the primary files where allowlist enforcement logic lives and deserve careful attention during final review.
<sub>Last reviewed commit: bfd27d5</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->