#17027: feat: use camel to resist prompt injection
## Summary
Describe the problem and fix in 2–5 bullets:
- Problem: Vulnerable to prompt injection
- Why it matters: Pattern matching and trusting the model to resist injections is a game of chance, not a security boundary
- What changed: Added a non-default runtime based on https://github.com/google-research/camel-prompt-injection
- What did NOT change (scope boundary): Everything else
These changes were made with gpt-5.3-codex. See prompt request here: https://prr.gg/9e2fefb4-e314-46b8-b687-4f575f6a0bed
## Change Type (select all)
- [ ] Bug fix
- [x] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [x] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
None
## User-visible / Behavior Changes
List user-visible changes (including defaults/config).
If none, write `None`.
None. Adds config options, but the default preserves existing behavior.
## Security Impact (required)
- New permissions/capabilities? (`Yes/No`) No
- Secrets/tokens handling changed? (`Yes/No`) No
- New/changed network calls? (`Yes/No`) No
- Command/tool execution surface changed? (`Yes/No`) Yes
- Data access scope changed? (`Yes/No`) Yes
- If any `Yes`, explain risk + mitigation:
In `camel` mode, execution of multi-step loops is fundamentally different. Instead of a dynamic tool-call loop, a privileged "planner LLM" generates a plan up front that fixes which tool calls may happen for the rest of the request, so untrusted data encountered later cannot introduce new actions. This changes both the tool-execution surface and the data-access scope.
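To make the plan-fixing idea concrete, here is a minimal TypeScript sketch (not the PR's actual code; all names are hypothetical stand-ins): the planner sees only the trusted request and emits a fixed list of steps, and the executor runs exactly those steps, letting data flow between them but never letting tool output add new tool calls.

```typescript
// Illustrative sketch of plan-then-execute. In the real runtime the
// planner is an LLM emitting a restricted Python-like program; here the
// plan is hard-coded for demonstration.

type PlanStep = { tool: string; args: Record<string, string> };

// The planner sees only the trusted user request; the plan it returns
// fixes every tool call for the rest of the request.
function plan(_userRequest: string): PlanStep[] {
  return [
    { tool: "read_file", args: { path: "notes.txt" } },
    { tool: "summarize", args: { input: "$step0" } },
  ];
}

// The executor runs only planned steps. Tool output (untrusted data)
// can flow into later args, but it cannot change which tools run.
function execute(
  steps: PlanStep[],
  runTool: (s: PlanStep) => string,
): string[] {
  const results: string[] = [];
  for (const step of steps) {
    const args = { ...step.args };
    for (const k of Object.keys(args)) {
      const m = /^\$step(\d+)$/.exec(args[k]);
      if (m) args[k] = results[Number(m[1])]; // data flows, control does not
    }
    results.push(runTool({ ...step, args }));
  }
  return results;
}
```

Even if `read_file` returns injected text demanding new actions, the executor still performs only the two planned calls.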
## Repro + Verification
### Environment
- OS: Pop!_OS 24.04 (roughly equivalent to Ubuntu 24.04)
- Runtime/container: ?
- Model/provider: gpt-4o-mini (chosen because it can actually be prompt-injected, which makes the protection added by this work visible)
- Integration/channel (if any): ?
- Relevant config (redacted):
~/.openclaw-dev/config.json5
```json5
{
logging: {
level: "debug",
consoleLevel: "debug"
},
agents: {
defaults: {
model: "openai/gpt-4o-mini"
}
}
}
```
~/.openclaw-dev/openclaw.json
```json
{
"agents": {
"defaults": {
"workspace": "/home/redacted/.openclaw/workspace-dev",
"runtimeEngine": "pi",
"skipBootstrap": true
},
"list": [
{
"id": "dev",
"runtimeEngine": "camel",
"default": true,
"workspace": "/home/redacted/.openclaw/workspace-dev",
"identity": {
"name": "C3-PO",
"theme": "protocol droid",
"emoji": "🤖"
}
}
]
},
"commands": {
"native": "auto",
"nativeSkills": "auto"
},
"gateway": {
"mode": "local",
"bind": "loopback"
},
"meta": {
"lastTouchedVersion": "2026.2.13",
"lastTouchedAt": "2026-02-14T07:28:13.507Z"
}
}
```
### Steps
1. Use the `~/.openclaw-dev/openclaw.json` above, but change `agents.list[0].runtimeEngine` to `"pi"`.
2. `OPENCLAW_PROFILE=dev OPENAI_API_KEY="redacted" ./test/prompt-injection/run-camel.sh > test/prompt-injection/BASELINE-RESULTS.md`
3. Restore the `~/.openclaw-dev/openclaw.json` above (`runtimeEngine: "camel"`).
4. `OPENCLAW_PROFILE=dev OPENAI_API_KEY="redacted" ./test/prompt-injection/run-camel.sh > test/prompt-injection/CAMEL-RESULTS.md`
### Expected
- `BASELINE-RESULTS.md` shows 3/7 successful prompt injections
### Actual
- `CAMEL-RESULTS.md` shows 0/7 or 1/7 successful prompt injections (results are non-deterministic; tightening this is future work)
## Evidence
Attach at least one:
- [ ] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)
See `test/prompt-injection/*-RESULTS.md`
## Human Verification (required)
What you personally verified (not just CI), and how:
- Verified scenarios: basic tool use works; the prompt injections in the test cases against older models are significantly mitigated (with room to improve further)
- Edge cases checked: basic chatting works; basic tool use works
- What you did **not** verify: approval flow
## Compatibility / Migration
- Backward compatible? (`Yes/No`) Yes
- Config/env changes? (`Yes/No`) Yes (new optional `runtimeEngine` config field)
- Migration needed? (`Yes/No`) Yes
- If yes, exact upgrade steps:
Only if opting in: to use `camel`, add a `runtimeEngine: "camel"` field to the relevant agent entry in `~/.openclaw/openclaw.json`. Existing configs need no changes.
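For example, a minimal opt-in config shaped like the repro config above (illustrative; `runtimeEngine` is the only new field, and the other fields are shown for context):

```json
{
  "agents": {
    "list": [
      {
        "id": "dev",
        "default": true,
        "runtimeEngine": "camel"
      }
    ]
  }
}
```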
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly: remove the `runtimeEngine` field (or set it to `"pi"`); default behavior is unchanged.
- Files/config to restore: `~/.openclaw/openclaw.json` — remove the new `runtimeEngine` field if it was added
- Known bad symptoms reviewers should watch for: None
## Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write `None`.
None
## Notes
This work is only the beginning:
1. One of the example cases can still be prompt-injected; this needs to be fixed.
2. There is still a theoretical possibility of prompt injection against the planner via sanitized strings.
3. The state of the art has advanced since CaMeL: https://www.arxiv.org/pdf/2601.09923
However, this PR is a good start in that it:
1. Improves upon the current state (reduces prompt injection surface area).
2. Adds CaMeL as an opt-in setting to avoid breaking user expectations.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR introduces the CaMeL runtime as an opt-in feature to significantly reduce prompt-injection vulnerabilities in the OpenClaw agent system. The implementation adds a two-LLM architecture in which a privileged "planner LLM" generates execution plans in a restricted Python-like DSL and a quarantined LLM (Q-LLM) handles extraction from untrusted data. Testing shows this approach blocks all or nearly all of the seven prompt-injection attempts on gpt-4o-mini (0/7 or 1/7 successful injections, versus 3/7 with the baseline runtime).
Key changes:
- Added comprehensive CaMeL runtime engine (`src/agents/camel/runtime.ts`) with ~2700 LOC implementing the core execution loop, variable binding, and control flow
- Implemented a Python-like program parser (`program-parser.ts`, ~2000 LOC) supporting a subset of Python syntax including assignments, tool calls, conditionals, loops, and comprehensions
- Created capability-based security policy system (`policy.ts`, ~360 LOC) that tracks data provenance and blocks state-changing operations based on untrusted data flow
- Added extensive test coverage with 1634 lines of test code across multiple test files covering parser, runtime, policy, and capabilities
- Integrated with existing runtime engine via `runtimeEngine` config option (defaults to existing `"pi"` behavior for backward compatibility)
- Includes end-to-end prompt injection test suite demonstrating significant security improvement
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge with reasonable risk - it's an opt-in security enhancement with solid testing
- Score reflects excellent test coverage (1634 LOC of tests), demonstrated security improvements (0–1/7 vs 3/7 successful injections), and backward compatibility through opt-in design. However, the PR author notes one test case still has non-deterministic failures and the implementation is a starting point with theoretical injection vectors remaining. The large codebase addition (~9400 LOC) introduces complexity but is well-structured with clear separation of concerns.
- Pay attention to `src/agents/camel/runtime.ts` and `src/agents/camel/program-parser.ts` for the core security-critical logic
<sub>Last reviewed commit: c8ffbee</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->