#17273: feat: add security-guard extension — agentic safety guardrails

by miloudbelarebia open 2026-02-15 16:03 View on GitHub →

size: L

Cluster: Security Enhancements and Guardrails

## Summary - **New extension**: `extensions/security-guard/` — real-time security guardrails for OpenClaw - Hooks into the plugin lifecycle to detect and block prompt injection attacks, audit configuration security, and provide live threat monitoring - Addresses **#17255** (Implementing Agentic Safety Guardrails) ### What it does | Hook | Action | |------|--------| | `message_received` | Scans inbound messages against 50+ injection patterns (9 categories) | | `before_tool_call` | Inspects tool parameters for injection; blocks critical threats | | `gateway_start` | Runs a full config security audit on startup | | `registerCommand` | Adds `/security-status` for live threat summary | | `registerService` | Background periodic scanner with configurable interval | ### Threat categories detected 1. Instruction Override (critical) 2. Role Hijacking (high) 3. Data Exfiltration (critical) 4. Privilege Escalation (critical) 5. System Prompt Extraction (medium) 6. Jailbreak Attempts (high) 7. Delimiter Manipulation (critical) 8. Encoding Bypass (medium) 9. Tool Manipulation (high) ### Config audit checks - Gateway binding (loopback vs public) - Gateway auth token - Sandbox mode enforcement - Channel DM policies (WhatsApp, Telegram, Discord) - Elevated mode status - Rate limiting ### Configuration ```json { "plugins": { "security-guard": { "sensitivity": "medium", "blockOnCritical": true, "scanIntervalMinutes": 5, "auditOnGatewayStart": true } } } ``` ## Background This extension is ported from [openclaw-security-guard](https://github.com/miloudbelarebia/openclaw-security-guard), a standalone CLI security tool I built for the OpenClaw ecosystem. The standalone tool includes additional features (live dashboard, secrets scanning, dependency auditing, auto-hardening) — this extension brings the core runtime guardrails directly into OpenClaw's plugin system. ## Test plan - [ ] Verify extension loads without errors via `pnpm build` - [ ] Test `message_received` hook with known injection patterns - [ ] Test `before_tool_call` hook blocks when injection is in params - [ ] Test `/security-status` command returns correct output - [ ] Verify config audit produces accurate findings - [ ] Test background scanner starts and runs on interval ## Local Validation - `pnpm build`: ✅ extension compiles without errors - `pnpm check` (tsgo): ✅ passes - Formatting: verified with `oxfmt --check` ## Scope New extension at `extensions/security-guard/` — 6 files, self-contained, no modifications to core codebase. ## AI Assistance AI-assisted (Claude Code) for codebase exploration, pattern discovery, and drafting. The architecture decisions, threat pattern design, and hook integration are my own work. Testing level: Locally validated — confirmed build passes and extension loads correctly. ## Author **Miloud Belarebia** — [2pidata.com](https://2pidata.com) — [@miloudbelarebia](https://github.com/miloudbelarebia)  <h3>Greptile Summary</h3> This PR adds a new security-guard extension that provides runtime security guardrails for OpenClaw through prompt injection detection, configuration auditing, and threat monitoring. **Key Changes:** - Hooks into `message_received` to scan for 50+ injection patterns across 9 threat categories (instruction override, role hijacking, data exfiltration, etc.) - Uses `before_agent_start` hook to inject security warnings when threats are detected (clever workaround since `message_received` cannot block) - Implements `before_tool_call` to block tool invocations with injected parameters - Runs config security audit on gateway startup checking bind mode, auth tokens, sandbox settings, DM policies, and rate limiting - Provides `/security-status` command and background periodic scanner **Architecture:** The extension is well-organized into separate modules (`injection-patterns.ts`, `config-auditor.ts`) with clear separation of concerns. The approach of using `before_agent_start` to inject warnings after `message_received` detects threats is a reasonable workaround for the hook limitation. **Previous issues identified:** - Config schema mismatch in `index.ts` (using inline schema vs the pattern in `openclaw.plugin.json`) - Invalid config path check (`config.security.rateLimiting` doesn't exist, should be `config.gateway.auth.rateLimit`) - Timer storage approach fixed (now uses module-level Map instead of casting `ctx`) - `message_received` cannot actually block delivery (addressed via `before_agent_start` warning injection) <h3>Confidence Score: 3/5</h3> - This PR has several issues that should be addressed before merging, primarily around config validation and schema consistency. - The extension provides valuable security functionality, but the config auditor checks non-existent paths which will cause incorrect audit results. The inline `configSchema` in `index.ts` duplicates the schema in `openclaw.plugin.json` which could lead to drift. The `message_received` hook limitation is correctly addressed but the description could be clearer. - `src/config-auditor.ts` needs attention for the invalid config path check at line 119 <sub>Last reviewed commit: 6c4ec5f</sub>