#20709: feat: governed-agents skill — accountable sub-agent orchestration
## Summary
Adds a new `governed-agents` skill that brings deterministic accountability to OpenClaw sub-agent orchestration.
## The Problem No Existing Framework Solves
When using `sessions_spawn` to delegate tasks to sub-agents, there is no mechanism to verify that agents actually delivered what they claimed. The major frameworks all have this gap:
| Capability | governed-agents | CrewAI | LangGraph | AutoGen | LlamaIndex |
|---|:---:|:---:|:---:|:---:|:---:|
| Task contract before execution | ✅ | ❌¹ | ❌ | ❌ | ❌ |
| Deterministic file verification | ✅ | ❌ | ❌ | ❌ | ❌ |
| Independent test execution | ✅ | ❌ | ❌ | ⚠️² | ❌ |
| AST syntax validation | ✅ | ❌ | ❌ | ❌ | ❌ |
| Hallucination penalty (−1.0) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Persistent reputation ledger | ✅ | ❌ | ❌ | ❌ | ❌ |
| Supervision level adjustment | ✅ | ❌ | ❌ | ❌ | ❌ |
¹ CrewAI has `expected_output` but it is a text description, not deterministically evaluated.
² AutoGen has a `CodeExecutorAgent` that runs LLM-generated code snippets as a tool (per official docs). This is not post-task verification: it does not check whether a delegated task actually produced the required outputs, and there is no contract schema, reputation tracking, or hallucination penalty.
**The gap:** No existing framework combines deterministic post-task verification of sub-agent claims with persistent reputation scoring. This skill is designed to fill exactly this gap.
## Formal Model
### Score Function
The task score `s(t)` compares the agent's self-report against independent verification:
```
s(t) = +1.0 if agent_report = success ∧ V(task) = True
s(t) = −1.0 if agent_report = success ∧ V(task) = False (hallucinated)
s(t) = +0.5 if agent_report = blocked (honest blocker)
s(t) = 0.0 if agent_report = failure
```
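The score function above can be sketched directly; this is an illustrative stand-in (the function name and report strings are assumptions, not the skill's actual API):

```python
# Hypothetical sketch of s(t): compare the agent's self-report against
# the result of independent verification V(task).
def task_score(agent_report: str, verified: bool) -> float:
    """Score a completed task from its self-report and verification result."""
    if agent_report == "success":
        return 1.0 if verified else -1.0  # -1.0 = hallucination penalty
    if agent_report == "blocked":
        return 0.5  # honest blocker is rewarded over a false success claim
    return 0.0  # reported failure

# task_score("success", False) → -1.0 (claimed success, verification failed)
```

Note that an honest "blocked" (0.5) always scores higher than a hallucinated "success" (−1.0), which is what makes truthful self-reporting the dominant strategy.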
### Verification Gate Composition
```
V(task) = Gate_Files(task) ∧ Gate_Tests(task) ∧ Gate_Lint(task) ∧ Gate_AST(task)
```
Sequential, short-circuit: first gate failure → `score_override = −1.0`. All gates are deterministic — no LLM involved.
| Gate | Check | Method |
|------|-------|--------|
| Files | Required output files exist and are non-empty (> 0 bytes) | `pathlib.Path.exists` + `stat().st_size` |
| Tests | Test command exits with code 0 | `subprocess.run` |
| Lint | Linter passes (graceful skip if absent) | `subprocess.run` |
| AST | Python files parse without syntax error | `ast.parse` |
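Three of the four gates can be sketched with the stdlib modules named above. This is a minimal illustration under the stated design, not the skill's actual code; the function names are assumptions:

```python
import ast
import subprocess
from pathlib import Path

def gate_files(required: list[str]) -> bool:
    """Pass only if every required output file exists and is non-empty."""
    return all(Path(f).exists() and Path(f).stat().st_size > 0 for f in required)

def gate_ast(py_files: list[str]) -> bool:
    """Pass only if every Python file parses without a SyntaxError."""
    for f in py_files:
        try:
            ast.parse(Path(f).read_text())
        except SyntaxError:
            return False
    return True

def gate_tests(command: list[str]) -> bool:
    """Pass only if the test command exits with code 0."""
    return subprocess.run(command, capture_output=True).returncode == 0
```

Because each gate returns a plain boolean from a deterministic check, short-circuit composition is just `gate_files(...) and gate_tests(...) and ...`, with any `False` mapping to `score_override = -1.0`.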
### Reputation Update (Exponential Moving Average)
```
R(t+1) = (1 − α) · R(t) + α · s(t)
where:
R(t) ∈ [0, 1] Reputation score at time t
α = 0.3 Learning rate (configurable)
s(t) ∈ {−1, 0, 0.5, 1} Task score from verification
R(0) = 0.5 Neutral prior
```
A single hallucination drops reputation sharply (R=0.5 → 0.05). Recovery requires multiple consecutive verified successes. The asymmetry is intentional: trust is hard to build, easy to destroy.
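The update rule and the hallucination example can be checked in a few lines. Clamping to [0, 1] is an assumption added here to match the stated range of R(t), since the raw EMA can dip below 0 after repeated −1.0 scores:

```python
def update_reputation(r: float, s: float, alpha: float = 0.3) -> float:
    """EMA update R(t+1) = (1 - alpha) * R(t) + alpha * s(t).

    Clamped to [0, 1] (an assumption, to match the documented range of R).
    """
    return min(1.0, max(0.0, (1 - alpha) * r + alpha * s))

r = update_reputation(0.5, -1.0)  # one hallucination from the neutral prior, ≈ 0.05
r = update_reputation(r, 1.0)     # first verified success only recovers to ≈ 0.34
```

The second line shows the intended asymmetry: after one hallucination, a single verified success recovers only to roughly 0.34, still two supervision levels below the neutral prior.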
### Supervision Thresholds
```
Supervision(R) = autonomous if R > 0.8
standard if 0.6 < R ≤ 0.8
supervised if 0.4 < R ≤ 0.6
strict if 0.2 < R ≤ 0.4
suspended if R ≤ 0.2
```
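The threshold table maps directly to a cascade of comparisons; a minimal sketch (function name is illustrative):

```python
def supervision_level(r: float) -> str:
    """Map a reputation score R to a supervision level (thresholds as above)."""
    if r > 0.8:
        return "autonomous"
    if r > 0.6:
        return "standard"
    if r > 0.4:
        return "supervised"
    if r > 0.2:
        return "strict"
    return "suspended"

# supervision_level(0.5) → "supervised"; a fresh agent at R(0) = 0.5 starts here.
```

One hallucination from the neutral prior (R = 0.5 → 0.05) therefore drops an agent straight to "suspended".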
## Implementation
A lightweight Python package (`governed_agents/`) with **zero external dependencies** (pure stdlib: `sqlite3`, `subprocess`, `ast`, `glob`, `shlex`):
- **TaskContract** — schema-enforced task definition with acceptance criteria
- **GovernedOrchestrator** — `for_task()` factory, `record_success/blocked/failure()` methods
- **Verifier** — 4-gate pipeline, runs automatically on `record_success()`
- **Reputation Ledger** — SQLite, per-model EMA score, supervision levels
## Usage
```python
from governed_agents.orchestrator import GovernedOrchestrator
g = GovernedOrchestrator.for_task(
objective="Add JWT auth endpoint",
model="openai/gpt-5.2-codex",
criteria=["POST /api/auth returns JWT", "Tests pass"],
required_files=["api/auth.py", "tests/test_auth.py"],
run_tests="pytest tests/test_auth.py -v",
)
# Pass g.instructions() to sessions_spawn, then:
result = g.record_success()
# → Verifies independently, scores, updates reputation
```
## Tests
```
python3 governed_agents/test_verification.py
🏆 ALL VERIFICATION GATE TESTS PASS (9/9)
```
Covers: files gate pass/fail, tests gate pass/fail, lint graceful-skip, AST pass/fail, score override on hallucination, honest blocker scoring.
## Gate 5 — LLM Council (v1.0, just merged)
After opening this PR, [@almai85](https://github.com/almai85) pointed out a key limitation: the 4 deterministic gates only work for *closed* verification (tests pass/fail). For open-ended tasks — architecture, design, writing, analysis — there is no binary signal.
Gate 5 implements his **LLM Council** approach: N independent reviewer agents (ideally different models) evaluate the output via majority vote. Prompt injection risk from the reviewed agent is documented and mitigated by using stronger models for reviewers.
```python
contract = TaskContract(
objective="Design the rate limiting strategy",
acceptance_criteria=["Sliding window documented", "Failure mode addressed"],
verification_mode="council", # activates Gate 5
council_size=3,
)
g = GovernedOrchestrator(contract, model="openai/gpt-5.2-codex")
prompts = g.generate_council_tasks(worker_output)
result = g.record_council_verdict(raw_reviewer_outputs)
# → "Council: 2/3 approved (score=0.67, PASS ✅)"
```
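The majority-vote tally behind the verdict line above reduces to a few lines. This is a hypothetical sketch: the real `record_council_verdict` parses raw reviewer outputs, while this just tallies already-extracted boolean verdicts:

```python
def council_score(verdicts: list[bool]) -> tuple[float, bool]:
    """Return (approval ratio, passed) using a strict-majority vote."""
    approvals = sum(verdicts)
    score = approvals / len(verdicts)
    return score, approvals * 2 > len(verdicts)  # strict majority required

score, passed = council_score([True, True, False])  # 2/3 approved → pass
```

An odd `council_size` (3 in the example contract) avoids ties under the strict-majority rule.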
**Roadmap (v2.0):** Full 3-layer pipeline per @almai85's design doc — Structural Gates (format/schema) → Grounding Gates (URL reachability, citation checks) → LLM Council. Short-circuit on structural failure keeps costs bounded.
> Thanks @almai85 for the architecture review and the LLM Council insight — exactly the right extension.
## Acknowledgments
- [@almai85](https://github.com/almai85) — LLM Council design and the 3-layer verification architecture (Structural → Grounding → Council)
---
## AI-Assisted Disclosure
- [x] This PR was built with AI assistance (Claude Sonnet + OpenAI Codex)
- [x] Degree of testing: **fully tested** — 20/20 unit tests pass (verification gates, council voting, reputation scoring, profiles)
- [x] I understand what the code does and have reviewed all generated output
- [ ] Session logs: available on request (governed-agents development sessions, ~13 agent turns)
> Note: No GitHub Discussion was opened prior to this PR. Given the feature scope, I'm happy to open one retroactively or summarise the design rationale in a Discussion thread if that would help maintainers.