
#15251: feat(web-fetch): send Accept: text/markdown header for Cloudflare Markdown for Agents

by wujieli0207 open 2026-02-13 06:40
agents stale size: S
## Summary

Implements [#14999](https://github.com/openclaw/openclaw/issues/14999): send `Accept: text/markdown` in `web_fetch` requests to leverage Cloudflare's [Markdown for Agents](https://blog.cloudflare.com/markdown-for-agents/) feature.

## Changes

**`src/agents/tools/web-fetch.ts`** (3 changes):

1. **Accept header**: Changed from `Accept: */*` to `Accept: text/markdown, text/html;q=0.9, */*;q=0.8`. This is standard HTTP content negotiation; sites that don't support it ignore the preference and return HTML as usual.
2. **`markdown-native` extractor**: When the server returns `Content-Type: text/markdown`, the body is used directly (or converted to plain text when `extractMode="text"`), completely skipping Readability/HTML parsing. This is a new branch added before the existing `text/html` check.
3. **`x-markdown-tokens` logging**: When the response includes the `x-markdown-tokens` header (Cloudflare's estimated token count), it's logged to console for future token budget estimation.
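
For reviewers, here is a minimal sketch of the negotiation and branch ordering described in changes 1 and 2. It is not the code in `web-fetch.ts`: the names `ACCEPT_HEADER`, `Extractor`, `mediaType`, and `pickExtractor`, and the `"raw"` fallback, are hypothetical and only illustrate the idea that the `text/markdown` check runs before the `text/html` check.

```typescript
// Illustrative only: hypothetical names, not the actual web-fetch.ts implementation.

// Accept value taken from the PR description: markdown preferred, HTML next, anything else last.
const ACCEPT_HEADER = "text/markdown, text/html;q=0.9, */*;q=0.8";

// Request headers a fetch call could be given (plain object, no runtime dependency).
const requestHeaders: Record<string, string> = { Accept: ACCEPT_HEADER };

// "raw" is an assumed fallback label for any other content type.
type Extractor = "markdown-native" | "html" | "raw";

// Strip parameters such as "; charset=utf-8" and lowercase the media type.
function mediaType(contentType: string | null): string {
  const [first = ""] = (contentType ?? "").split(";");
  return first.trim().toLowerCase();
}

// The markdown branch is checked before the HTML branch, mirroring change 2.
function pickExtractor(contentType: string | null): Extractor {
  const type = mediaType(contentType);
  if (type === "text/markdown") return "markdown-native";
  if (type === "text/html") return "html";
  return "raw";
}
```

A server that ignores the `Accept` preference still answers with `text/html`, so `pickExtractor` falls through to the existing HTML path and behavior is unchanged for those sites.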
**`src/agents/tools/web-tools.fetch.test.ts`** (4 new tests):

- `uses markdown-native extractor when server returns text/markdown`
- `converts markdown to plain text when extractMode is text and server returns text/markdown`
- `logs x-markdown-tokens header when present`
- `sends Accept header preferring text/markdown`

## Why

- **~80% token reduction** on supported sites (Cloudflare blog: 16,180 HTML tokens → 3,150 markdown)
- **Better quality**: site owner controls the conversion, not a client-side heuristic
- **Zero risk**: fully backward-compatible content negotiation
- **Growing ecosystem**: Claude Code and OpenCode already send this header

## References

- [Cloudflare announcement](https://blog.cloudflare.com/markdown-for-agents/)
- [CF developer docs](https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/)
- [Content Signals framework](https://contentsignals.org/)

Closes #14999

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR updates the `web_fetch` tool to prefer server-provided markdown by sending an `Accept: text/markdown` header, adds a fast-path extractor for `Content-Type: text/markdown`, and introduces test coverage for the new negotiation/extraction behavior (including `x-markdown-tokens` handling).

The change fits into the existing fetch pipeline by keeping the SSRF-guarded fetch path intact, adding a new extraction branch ahead of the HTML/Readability logic, and continuing to wrap returned content with the external-content markers and maxChars enforcement.

<h3>Confidence Score: 4/5</h3>

- Generally safe to merge, with one logging issue to fix.
- Core behavior change (Accept header + markdown-native extractor) is straightforward and covered by tests; the main concern is the newly introduced unconditional `console.log` which bypasses the repo’s logging controls and may produce noisy/unexpected output in normal runs.
- src/agents/tools/web-fetch.ts (logging line for x-markdown-tokens)

<sub>Last reviewed commit: 1c061fa</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
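
One possible way to address the logging concern raised in the review is to parse `x-markdown-tokens` and report it only through a logger the caller passes in, instead of calling `console.log` unconditionally. This is a sketch under assumptions: `LogFn` and `reportMarkdownTokens` are hypothetical names, and the repo's actual logging helper is not shown in this PR, so it is represented here by an optional callback.

```typescript
// Illustrative only: hypothetical helper, not the repo's logging API.

// Stand-in for whatever debug logger the project exposes.
type LogFn = (message: string) => void;

// Parse the x-markdown-tokens header value and report it only when a logger is supplied.
// Returns the parsed count, or undefined when the header is missing or not a plain non-negative integer.
function reportMarkdownTokens(
  headerValue: string | null,
  debug?: LogFn,
): number | undefined {
  if (headerValue === null) return undefined;
  const trimmed = headerValue.trim();
  if (!/^\d+$/.test(trimmed)) return undefined;
  const tokens = Number(trimmed);
  debug?.(`web_fetch: x-markdown-tokens=${tokens}`);
  return tokens;
}
```

Returning the parsed value keeps it available for later token budget estimation even when no logger is attached, and callers that omit `debug` produce no output in normal runs.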
