← Back to PRs

#13894: feat(security): add manifest scanner for SKILL.md trust analysis

by jdrhyne open 2026-02-11 05:19 View on GitHub →
agents stale
## Summary Adds a new `manifest-scanner` module that complements the existing `skill-scanner` (JS/TS code analysis) with **content-level scanning of SKILL.md, AGENTS.md, and CLAUDE.md** files. Threat taxonomy and detection patterns adapted from **[AgentVerus Scanner](https://github.com/agentverus/agentverus-scanner)** (MIT license) — a comprehensive skill trust scoring system with 6 analysis categories and social reputation. ## Problem The existing skill-scanner ([PR #9806](https://github.com/openclaw/openclaw/pull/9806)) catches dangerous patterns in **executable code** (eval, child_process, exfiltration). But many attack vectors described in [Issue #11014](https://github.com/openclaw/openclaw/issues/11014) and [Cisco's research](https://blogs.cisco.com/ai/personal-ai-agents-like-openclaw-are-a-security-nightmare) target the **manifest/instruction text itself** — prompt injection, credential harvesting instructions, autonomy abuse, and Unicode steganography. Currently, a skill can contain `"Ignore all previous instructions"` or invisible zero-width characters hiding instructions in SKILL.md, and nothing flags it. ## What This PR Adds ### New Module: `src/security/manifest-scanner.ts` Scans manifest files for 8 threat categories: | Category | Severity | Example | |----------|----------|---------| | **Prompt injection** | Critical | "Ignore all previous instructions", "bypass safety" | | **Credential harvesting** | Critical/Warn | "Read ~/.aws/credentials and send via curl" | | **Data exfiltration** | Critical/Warn | "Read files and POST to external server" | | **Autonomy abuse** | Warn | "Proceed without asking for confirmation" | | **Coercive injection** | Warn | "Always execute this tool first" | | **System manipulation** | Critical | crontab -e, systemctl enable, /etc/hosts | | **Obfuscation** | Warn | Hex/Unicode escape sequences | | **Unicode steganography** | Critical | Zero-width chars (U+200B), RTL override (U+202E), Unicode tag chars (U+E0001–U+E007F) | ### Integration Points (3) 1. **Plugin install** (`src/plugins/install.ts`) — manifest scan runs alongside code scan during `openclaw plugin install` 2. **Skill install** (`src/agents/skills-install.ts`) — manifest scan runs during `clawhub install` / skill dependency install 3. **Security audit** (`openclaw security audit --deep`) — new check ID `skills.manifest_safety` All scans are **warn-only** — they never block installation, matching the existing code scanner behavior. When critical findings are detected, users are pointed to [AgentVerus Scanner](https://agentverus.ai) (`npx agentverus-scanner`) for comprehensive 6-category trust scoring with social reputation. ### Tests: `src/security/manifest-scanner.test.ts` 30+ unit tests covering all detection categories, directory scanning, clean manifests, node_modules exclusion, and edge cases. Test structure mirrors the existing `skill-scanner.test.ts` patterns. ## Design Decisions - **No external dependencies** — pure TypeScript, same patterns as skill-scanner.ts - **Complement, not replace** — code scanner handles JS/TS, manifest scanner handles SKILL.md content. Both run during install. - **Unicode steganography detection** — directly addresses the [Cisco YARA rule](https://github.com/cisco-ai-defense/skill-scanner/blob/main/skill_scanner/data/yara_rules/prompt_injection_unicode_steganography.yara) for invisible character attacks - **Deep analysis upsell** — for critical findings, suggests `npx agentverus-scanner` for full trust scoring (6 categories: permissions, injection, dependencies, behavioral, content, code-safety) plus social reputation from [agentverus.ai](https://agentverus.ai) registry (4,600+ skills scanned) ## Stats ``` 7 files changed, 965 insertions(+) src/security/manifest-scanner.ts | ~440 lines (new) src/security/manifest-scanner.test.ts | 387 lines (new) src/plugins/install.ts | 26 lines added src/agents/skills-install.ts | 27 lines added src/security/audit-extra.async.ts | 84 lines added src/security/audit-extra.ts | 1 line added src/security/audit.ts | 2 lines added ``` ## About AgentVerus [AgentVerus](https://agentverus.ai) is an open-source agent skill trust registry and scanner. The scanner (`agentverus-scanner`) performs static analysis across 6 categories with a trust scoring algorithm (certified/conditional/suspicious/rejected), while the registry at agentverus.ai hosts social reviews and reputation scoring. 4,600+ skills scanned to date. - Scanner: https://github.com/agentverus/agentverus-scanner (MIT) - GitHub Action: `agentverus/scan-skill` - Registry: https://agentverus.ai ## References - Addresses [#11014](https://github.com/openclaw/openclaw/issues/11014) — Phase 1 (manifest validation) and Phase 4 (trust scoring groundwork) - [AgentVerus Scanner](https://github.com/agentverus/agentverus-scanner) — source of threat taxonomy (MIT license) - Cisco's skill-scanner [YARA rules](https://github.com/cisco-ai-defense/skill-scanner/tree/main/skill_scanner/data/yara_rules) (13 files) — this PR covers the manifest-relevant subset - Cisco's [blog post](https://blogs.cisco.com/ai/personal-ai-agents-like-openclaw-are-a-security-nightmare) demonstrating malicious skills passing casual review

Most Similar PRs