#8821: Security: Holistic capability-based sandbox (replaces pattern-matching whack-a-mole)
Labels: docs, stale · Cluster: Security Enhancements and Fixes
> **Disclaimer**: This is a proof-of-concept contribution from outside the core team. We are not OpenClaw/Clawbot experts and have not tested this integration against a running Clawbot instance. The code demonstrates that capability-based security _can_ work with OpenClaw architecture, but will likely need adaptation and review from maintainers who understand the codebase deeply. We are showing this works in principle, not shipping production-ready code.
## Summary
This PR adds a defense-in-depth security module for OpenClaw, powered by [ajs-clawbot](https://www.npmjs.com/package/ajs-clawbot). It provides **Runtime-Layer Permission** security, which makes dangerous operations impossible rather than merely discouraged.
## Performance
The sandbox overhead is negligible:
| Metric | Value |
| ---------------------------------- | ------- |
| **Sandbox overhead per execution** | 0.174ms |
| As % of typical API call (100ms) | 0.17% |
| As % of typical LLM call (1000ms) | 0.017% |
See [ajs-clawbot BENCHMARK.md](https://github.com/tonioloewald/ajs-clawbot/blob/main/BENCHMARK.md) for methodology.
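The percentages in the table are the measured per-execution overhead divided by an assumed baseline latency. The 100 ms and 1000 ms baselines are the table's own illustrative figures, not measurements. A quick check of that arithmetic:

```typescript
// Reproduces the table's percentages: overhead / baseline latency * 100.
const SANDBOX_OVERHEAD_MS = 0.174;

function overheadPercent(overheadMs: number, callLatencyMs: number): number {
  return (overheadMs / callLatencyMs) * 100;
}

const apiPct = overheadPercent(SANDBOX_OVERHEAD_MS, 100);  // "typical" API call
const llmPct = overheadPercent(SANDBOX_OVERHEAD_MS, 1000); // "typical" LLM call

console.log(`API: ${apiPct.toFixed(2)}%  LLM: ${llmPct.toFixed(3)}%`); // prints: API: 0.17%  LLM: 0.017%
```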
## The Problem
When you expose your OpenClaw bot to external users (Discord servers, Telegram groups, etc.), they can craft messages that exploit prompt injection to:
- Read sensitive files (.env, SSH keys, credentials)
- Execute arbitrary commands
- Exfiltrate data via network requests
- Cause denial of service through flooding or infinite loops
Current "fixes" (regex filters, prompt engineering) use **Application-Layer Permission**: the capability exists and a boolean decides whether to use it. This is easily bypassed via prompt injection, because the attacker only has to get past the check; the dangerous function itself is always there.
## The Solution: Runtime-Layer Permission
This module uses **ajs-clawbot capability-based security** where dangerous capabilities literally do not exist until explicitly granted. There is nothing to bypass.
```
APPLICATION-LAYER (Current)          RUNTIME-LAYER (This PR)
===========================          =======================

+------------------+                 +------------------+
| if (allowed) {   | <-- bypass!     | fs.read()?       |
|   fs.read()      |                 +--------+---------+
| }                |                          |
+--------+---------+                          v
         |                           +------------------+
         v                           | CAPABILITY NOT   |
+------------------+                 | BOUND TO VM      |
| fs.read() runs   |                 |                  |
| (always exists)  |                 | Function doesn't |
+------------------+                 | exist to call!   |
                                     +------------------+
```
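The right-hand column of the diagram can be sketched in a few lines. This is a hypothetical illustration of capability binding. `bindCapabilities`, `invoke`, and `HOST_IMPLEMENTATIONS` are made-up names, **not** the ajs-clawbot API. The point is that the skill's scope is built only from granted names, so there is no flag to flip.

```typescript
// Hypothetical sketch of capability binding. These names are illustrative
// and are NOT the ajs-clawbot API.
type Capability = (arg: string) => string;
type CapabilityName = "readFile" | "fetchUrl" | "shell";
type Scope = Readonly<Partial<Record<CapabilityName, Capability>>>;

// Host-side implementations (stubs here; real ones would touch fs/network/processes).
const HOST_IMPLEMENTATIONS: Record<CapabilityName, Capability> = {
  readFile: (path) => `<contents of ${path}>`,
  fetchUrl: (url) => `<response from ${url}>`,
  shell: (cmd) => `<output of ${cmd}>`,
};

// Build the scope a skill runs in. Only granted names are copied in, so an
// ungranted capability has no reference inside the scope at all.
function bindCapabilities(granted: readonly CapabilityName[]): Scope {
  const scope: Partial<Record<CapabilityName, Capability>> = {};
  for (const name of granted) {
    scope[name] = HOST_IMPLEMENTATIONS[name];
  }
  return Object.freeze(scope);
}

// Stand-in for the VM resolving a name the skill calls. Nothing decides
// "allowed or not": an unbound name simply has nothing behind it.
function invoke(scope: Scope, name: CapabilityName, arg: string): string {
  const fn = scope[name];
  if (fn === undefined) {
    throw new ReferenceError(`${name} is not defined`);
  }
  return fn(arg);
}
```

For example, a scope built with `bindCapabilities(["fetchUrl"])` can fetch, but `invoke(scope, "readFile", ".env")` throws a `ReferenceError` because `readFile` was never placed in the scope. In plain JavaScript this is only an illustration, since code that can see `HOST_IMPLEMENTATIONS` could still reach it. The isolation comes from running skill code inside a VM whose global scope contains only the bound capabilities.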
## What This PR Does (and Does Not Do)
### What it does:
- Adds an integration layer that maps OpenClaw message sources to trust levels
- Provides rate limiting and flood protection infrastructure
- Demonstrates the capability-based security model
- Passes 24 integration tests
### What it does not do:
- Replace existing OpenClaw skill execution (this is additive, not a replacement)
- Route skills through the AJS VM (that would require converting skills to AJS)
- Guarantee production readiness (needs testing by maintainers who know the codebase)
This is a "foot in the door": it shows the architecture works so the team can evaluate whether to adopt it.
## Features
### 1. Zero Capabilities by Default
Skills start with nothing. They cannot read files, fetch URLs, or execute commands unless explicitly granted.
### 2. Trust Levels by Message Source
- CLI user -> full trust
- Owner flag -> full trust
- Trusted users -> shell trust
- DMs -> write trust
- Group chats -> llm trust
- Public channels -> network trust
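The PR's `createOpenClawExecutor` performs this mapping. The sketch below is not its code. It shows one way to express the table, assuming a hypothetical `MessageSource` shape and assuming that identity-based rules (CLI, owner, trusted user) take precedence over channel type, checked in the order listed above.

```typescript
// Trust level names are from the list above; everything else is hypothetical.
type TrustLevel = "full" | "shell" | "write" | "llm" | "network";

// Hypothetical shape of an incoming message's origin.
interface MessageSource {
  isCli: boolean;
  isOwner: boolean;
  isTrustedUser: boolean;
  channelType: "dm" | "group" | "public";
}

// Identity rules are checked first (assumed precedence), then channel type.
function trustLevelFor(src: MessageSource): TrustLevel {
  if (src.isCli) return "full";
  if (src.isOwner) return "full";
  if (src.isTrustedUser) return "shell";
  switch (src.channelType) {
    case "dm":
      return "write";
    case "group":
      return "llm";
    case "public":
      return "network";
    default: {
      // Compile-time exhaustiveness check: adding a channel type without a
      // mapping becomes a type error instead of a silent fallthrough.
      const unreachable: never = src.channelType;
      throw new Error(`unknown channel type: ${String(unreachable)}`);
    }
  }
}
```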
### 3. Always-Blocked Patterns
Sensitive files blocked regardless of trust level:
- Environment: `.env`, `.env.*`
- SSH: `id_rsa`, `id_ed25519`, `.ssh/*`
- Credentials: `credentials.*`, `secrets.*`
- Certificates: `*.pem`, `*.key`
- Cloud: `.aws/*`, `.gcloud/*`, `.kube/*`
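A minimal sketch of checking a path against this list. The patterns are copied from above. The matching rules are assumptions and not necessarily how ajs-clawbot matches: `dir/*` blocks anything beneath a path segment with that name, and other patterns match the final path component with `*` as a wildcard. A real check should also resolve symlinks (for example with Node's `fs.realpathSync`) before matching, since a harmless-looking link can point at `.env`.

```typescript
// Patterns from the list above; the matcher is a hypothetical sketch.
const ALWAYS_BLOCKED = [
  ".env", ".env.*",
  "id_rsa", "id_ed25519", ".ssh/*",
  "credentials.*", "secrets.*",
  "*.pem", "*.key",
  ".aws/*", ".gcloud/*", ".kube/*",
];

// Convert a simple glob ("*" = any run of non-"/" characters) to an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, "[^/]*");
  return new RegExp(`^${escaped}$`);
}

function isAlwaysBlocked(filePath: string): boolean {
  // Normalize Windows separators, then split into non-empty segments.
  const segments = filePath.replace(/\\/g, "/").split("/").filter((s) => s.length > 0);
  const basename = segments[segments.length - 1] ?? "";
  return ALWAYS_BLOCKED.some((pattern) => {
    if (pattern.endsWith("/*")) {
      // Directory pattern: block anything located beneath that directory name.
      const dir = pattern.slice(0, -2);
      return segments.slice(0, -1).includes(dir);
    }
    // File pattern: match against the final path component only.
    return globToRegExp(pattern).test(basename);
  });
}
```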
### 4. SSRF Protection
- Private and loopback IPs: `10.x`, `192.168.x`, `127.x`, etc.
- Cloud metadata: `169.254.169.254`
- Blocked hostnames: `localhost`, `*.local`, `metadata.google.internal`
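A hypothetical host check covering the cases listed above. Beyond the listed ranges, it also blocks `172.16.0.0/12` and `0.0.0.0/8`. That is my assumption about what "etc." covers, not something the PR states. The sketch only inspects the hostname string and IPv4 literals. It does not handle IPv6 (e.g. `[::1]`). A hostname that *resolves* to a private address (DNS rebinding) also gets past a check like this, so production code should validate the resolved address at connect time as well.

```typescript
// Hypothetical SSRF host check; not the ajs-clawbot implementation.
type IPv4 = [number, number, number, number];

const BLOCKED_HOSTNAMES: ReadonlySet<string> = new Set(["localhost", "metadata.google.internal"]);

// Parse a dotted-quad literal; returns null for anything else.
function parseIPv4(host: string): IPv4 | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (m === null) return null;
  const octets: IPv4 = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])];
  return octets.every((o) => o <= 255) ? octets : null;
}

function isBlockedIPv4([a, b]: IPv4): boolean {
  return (
    a === 10 ||                          // 10.0.0.0/8 private
    a === 127 ||                         // 127.0.0.0/8 loopback
    a === 0 ||                           // 0.0.0.0/8 "this network"
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168) ||          // 192.168.0.0/16 private
    (a === 169 && b === 254)             // 169.254.0.0/16 link-local, incl. cloud metadata
  );
}

function isBlockedHost(hostname: string): boolean {
  // Case-insensitive; a trailing dot ("localhost.") names the same host.
  const host = hostname.toLowerCase().replace(/\.$/, "");
  if (BLOCKED_HOSTNAMES.has(host) || host.endsWith(".local")) return true;
  const ip = parseIPv4(host);
  return ip !== null && isBlockedIPv4(ip);
}
```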
### 5. Rate Limiting and Flood Protection
- Self-message rejection (prevents recursion attacks)
- Per-requester and global rate limits
- Automatic cooldown
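The three mechanisms above can live in a single guard. Below is a minimal sliding-window sketch. `FloodGuard`, its options, and the verdict strings are hypothetical names, not from ajs-clawbot. "Cooldown" is interpreted here as a temporary block on a requester after they exceed their per-requester limit, which is an assumption. The clock is passed in as `now` so the logic is deterministic and testable.

```typescript
// Hypothetical flood guard; names and limits are illustrative assumptions.
interface FloodGuardOptions {
  selfId: string;            // the bot's own user id
  perRequesterLimit: number; // max accepted messages per requester per window
  globalLimit: number;       // max accepted messages across all requesters per window
  windowMs: number;          // sliding window length
  cooldownMs: number;        // block duration after a requester trips its limit
}

type Verdict = "allow" | "self" | "cooldown" | "requester-limit" | "global-limit";

class FloodGuard {
  private readonly opts: FloodGuardOptions;
  private readonly perRequester = new Map<string, number[]>();
  private readonly cooldownUntil = new Map<string, number>();
  private global: number[] = [];

  constructor(opts: FloodGuardOptions) {
    this.opts = opts;
  }

  check(requesterId: string, now: number): Verdict {
    // Never react to the bot's own messages: breaks reply-to-self recursion.
    if (requesterId === this.opts.selfId) return "self";

    const until = this.cooldownUntil.get(requesterId);
    if (until !== undefined && now < until) return "cooldown";

    // Drop timestamps that have slid out of the window.
    const cutoff = now - this.opts.windowMs;
    this.global = this.global.filter((t) => t > cutoff);
    const mine = (this.perRequester.get(requesterId) ?? []).filter((t) => t > cutoff);
    this.perRequester.set(requesterId, mine);

    if (mine.length >= this.opts.perRequesterLimit) {
      this.cooldownUntil.set(requesterId, now + this.opts.cooldownMs);
      return "requester-limit";
    }
    if (this.global.length >= this.opts.globalLimit) return "global-limit";

    // Accepted: record it against both the requester and the global budget.
    mine.push(now);
    this.global.push(now);
    return "allow";
  }
}
```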
### 6. Capability-Gated Shell
Maintains parity with OpenClaw's existing process-tree killing, but wraps the shell in a strict **Allowlist Capability**. Commands are validated against a policy _before_ the process is spawned, so an unauthorized command never runs even if a prompt injection succeeds.
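A hypothetical pre-spawn validator illustrating the idea. The allowlist contents and rules are assumptions, not the PR's policy. It returns an argv meant to be spawned *without* a shell, e.g. via Node's `child_process.spawn(command, args)`, whose `shell` option defaults to `false`. Note that allowlisting a binary does not constrain its arguments. A command like `git` accepts options that can run other programs, so a real policy may need per-command argument rules.

```typescript
// Hypothetical pre-spawn validator; the allowlist and rules are illustrative.
const ALLOWED_COMMANDS: ReadonlySet<string> = new Set(["ls", "cat", "git", "grep"]);

// Characters that only mean something to a shell. The validated argv is meant
// to be spawned without a shell, so rejecting these is defense in depth.
const SHELL_METACHARS = /[;&|`$<>(){}\\\n]/;

type Validation =
  | { ok: true; command: string; args: string[] }
  | { ok: false; reason: string };

function validateCommand(commandLine: string): Validation {
  if (SHELL_METACHARS.test(commandLine)) {
    return { ok: false, reason: "shell metacharacters are not allowed" };
  }
  // Deliberately naive whitespace split: no quoting, globbing, or expansion.
  const args = commandLine.trim().split(/\s+/).filter((s) => s.length > 0);
  const command = args.shift();
  if (command === undefined) {
    return { ok: false, reason: "empty command" };
  }
  // Require a bare name so "/tmp/ls" cannot impersonate an allowlisted "ls".
  if (command.includes("/")) {
    return { ok: false, reason: "explicit executable paths are not allowed" };
  }
  if (!ALLOWED_COMMANDS.has(command)) {
    return { ok: false, reason: `"${command}" is not in the allowlist` };
  }
  return { ok: true, command, args };
}
```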
## Files Changed
- src/safe-executor/index.ts - Module exports
- src/safe-executor/openclaw-executor.ts - OpenClaw-specific integration
- src/safe-executor/config.ts - Configuration loading
- src/safe-executor/safe-executor.test.ts - 24 integration tests
## Dependencies
- ajs-clawbot@^0.2.7 - Runtime-layer capability-based security
## Testing
24 integration tests covering trust levels, security utilities, and process utilities.
The underlying ajs-clawbot package has 254 tests.
## Backwards Compatibility
This module is opt-in and does not change existing behavior.
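For the module to be opt-in in practice, the integration point has to consult `enabled` before routing anything through the sandbox. The review below notes that the current integration ignores it. Here is a minimal sketch of config resolution in which a missing `~/.openclaw/safe-executor.json` and a missing `enabled` field both mean "off". The field names `enabled`, `workdir`, `allowedCommands`, and `additionalBlockedPatterns` appear in this PR's config. Their types and defaults here are assumptions. File reading is left to the caller so the sketch stays self-contained.

```typescript
// Hypothetical config resolution; field types and defaults are assumptions.
interface SafeExecutorConfig {
  enabled: boolean;
  workdir?: string;
  allowedCommands: string[];
  additionalBlockedPatterns: string[];
}

function defaultConfig(): SafeExecutorConfig {
  // Off unless a config file explicitly turns it on: that is what "opt-in" means.
  return { enabled: false, allowedCommands: [], additionalBlockedPatterns: [] };
}

// `raw` is the config file's text, or undefined if the file does not exist.
function resolveConfig(raw: string | undefined): SafeExecutorConfig {
  if (raw === undefined) return defaultConfig();
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("safe-executor config must be a JSON object");
  }
  // Fields present in the file override defaults; absent ones keep them.
  const merged: SafeExecutorConfig = { ...defaultConfig(), ...(parsed as Partial<SafeExecutorConfig>) };
  if (typeof merged.enabled !== "boolean") {
    throw new Error('"enabled" must be true or false');
  }
  return merged;
}
```

Failing loudly on a malformed `enabled` value (rather than treating, say, `"yes"` as truthy) keeps a typo from silently switching the sandbox on or off.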
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adds a new `src/safe-executor/` module that exposes `ajs-clawbot`’s capability-based “safe execution” primitives and an OpenClaw-specific bridge (`createOpenClawExecutor`) that maps message sources (CLI/owner/trusted/DM/group/public) into `ajs-clawbot` trust levels and builds an `ExecutionContext` for running skills under a `SafeExecutor`. It also adds a small JSON config loader (`~/.openclaw/safe-executor.json`), integration tests for the mapping + ajs-clawbot utilities, and wires in the `ajs-clawbot@^0.2.7` dependency.
Main issues to address before this feels consistent with the PR’s “opt-in / configurable” framing: the integration currently ignores `config.enabled`, does not apply `config.workdir` or `config.rateLimiting`, and defines several config fields (e.g., `trustLevelOverrides`, `allowedCommands`, `additionalBlockedPatterns`) that are unused in the integration layer.
<h3>Confidence Score: 3/5</h3>
- Reasonably safe to merge, but several config/opt-in controls appear non-functional in the current integration.
- The added code is isolated and mostly additive, but multiple config fields are currently dead/ignored (including `enabled`, `workdir`, and `rateLimiting`), which could surprise deployers and undermine the advertised opt-in/configurable behavior.
- Files that need the most attention: src/safe-executor/openclaw-executor.ts and src/safe-executor/config.ts
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs

| PR | Title | Author | Date | Similarity |
| --- | --- | --- | --- | --- |
| #8086 | feat(security): Add prompt injection guard rail | bobbythelobster | 2026-02-03 | 82.2% |
| #10514 | Security: harden AGENTS.md with gateway, prompt injection, and supp... | catpilothq | 2026-02-06 | 80.3% |
| #17273 | feat: add security-guard extension — agentic safety guardrails | miloudbelarebia | 2026-02-15 | 79.4% |
| #7983 | feat(security): add secure coding guidelines to system prompt | TGambit65 | 2026-02-03 | 78.9% |
| #21308 | feat(skills): add ClawTrust — reputation engine & gig marketplace f... | clawtrustmolts | 2026-02-19 | 78.7% |
| #15757 | feat(security): add hardening gap audit checks | saurabhsh5 | 2026-02-13 | 78.6% |
| #6095 | feat(gateway): support modular guardrails extensions for securing a... | Reapor-Yurnero | 2026-02-01 | 78.4% |
| #7346 | Security: add hardening module and secure-bot extension | AlphonseC | 2026-02-02 | 78.1% |
| #8137 | feat: openclaw-env hardened sandbox generator (MVP) | krahimov | 2026-02-03 | 78.0% |
| #23574 | security: P0 critical remediation — plugin sandbox, password hashin... | lumeleopard001 | 2026-02-22 | 77.9% |