
#8197: [AI-Assisted] feat: Add "Hardball" Security Framework (MFA-protected Agent Integrity)

by rodgui open 2026-02-03 18:16
docs stale
## 🛡️ Proposal: The "Hardball" Framework for Agent Integrity

This PR introduces a comprehensive security architecture called **"Hardball"** to the OpenClaw documentation.

### Why it matters

Autonomous agents are vulnerable to prompt injection and identity hijacking. While model-level resistance is important, system-level governance is crucial. The Hardball Framework implements **Human-in-the-loop MFA** for critical system modifications.

### Key Features:

- **Dual-Layer Defense:** Separates security "Instincts" (`SOUL.md`) from operational "Playbooks" (`SECURITY.md`).
- **Hardened MFA Standards:** Defines ephemeral, RAM-only OTP protocols to prevent persistent credential leaks.
- **Camouflage Policy:** Best practices for brief, non-descriptive refusals to mitigate logic exfiltration.
- **Native Integration:** Shows how to implement this using OpenClaw's existing environment variables and Markdown-based persona architecture.

This framework has been implemented and tested in a production-like environment, demonstrating how OpenClaw can reach enterprise-grade safety without core code modifications.

---

**AI-Assisted:** This proposal was developed collaboratively with an OpenClaw agent.

**Testing:** Fully tested implementation using Gmail SMTP for MFA delivery and strict SOUL/SECURITY playbooks.

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

Adds documentation for a proposed “Hardball” agent-integrity framework and links it from `SECURITY.md`.

- `SECURITY.md` gains a new Operational Guidance link pointing to the Hardball doc.
- `docs/concepts/hardball-security.md` introduces a two-tier “SOUL.md + SECURITY.md” governance model and an MFA/OTP flow example (Gmail SMTP) for “vital file” changes.

Overall this fits into the existing security docs, but a few doc-style and accuracy issues (internal link format, branding consistency, and questionable CVE references) should be addressed so the guidance is reliable and renders correctly on Mintlify.

<h3>Confidence Score: 3/5</h3>

- Safe to merge from a runtime perspective, but documentation correctness/style issues should be fixed first.
- Changes are documentation-only, so they won’t affect runtime behavior; however, the new guidance includes style-guide violations (Mintlify link format, branding) and potentially incorrect CVE references that could mislead users.
- SECURITY.md, docs/concepts/hardball-security.md

<!-- greptile_other_comments_section -->

**Context used:**

- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
- Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13))

<!-- /greptile_comment -->
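For readers who want a concrete picture of the "ephemeral, RAM-only OTP" and "brief, non-descriptive refusal" ideas in the PR description, here is a minimal Python sketch. It is not taken from the PR or from OpenClaw: `EphemeralOtpGate`, `guarded_write`, `REFUSAL`, and the `writer` callback are hypothetical names. It assumes the code is delivered out of band (the PR mentions Gmail SMTP, which is omitted here), and the TTL value is illustrative.

```python
import hmac
import secrets
import time


class EphemeralOtpGate:
    """Hypothetical one-time-code gate that keeps state only in process memory."""

    def __init__(self, ttl_seconds=120.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending = None  # (code, expires_at) or None; never written to disk

    def issue(self):
        """Create a fresh 6-digit code; the caller delivers it out of band."""
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending = (code, self._clock() + self._ttl)
        return code

    def verify(self, candidate):
        """Check a candidate code. Any attempt consumes the pending code."""
        pending, self._pending = self._pending, None  # single use
        if pending is None:
            return False
        code, expires_at = pending
        if self._clock() > expires_at:
            return False
        # Constant-time comparison to avoid leaking digits through timing.
        return hmac.compare_digest(code.encode(), candidate.encode())


# Assumption: one fixed, uninformative message for every failure mode.
REFUSAL = "Request denied."


def guarded_write(gate, candidate, path, content, writer):
    """Apply a change to a protected file only after a valid code is presented."""
    if not gate.verify(candidate):
        return REFUSAL
    writer(path, content)
    return "ok"
```

In this sketch a failed or expired attempt also discards the pending code, so a new one has to be issued before retrying; whether the Hardball doc prescribes that behavior is not stated in the PR description.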
