#20709: feat: governed-agents skill — accountable sub-agent orchestration
## Summary
Adds a new `governed-agents` skill that brings deterministic accountability to OpenClaw sub-agent orchestration.
## The Problem No Existing Framework Solves
When using `sessions_spawn` to delegate tasks to sub-agents, there is no mechanism to verify that agents actually delivered what they claimed. The major frameworks all have this gap:
| Capability | governed-agents | CrewAI | LangGraph | AutoGen | LlamaIndex |
|---|:---:|:---:|:---:|:---:|:---:|
| Task contract before execution | ✅ | ❌¹ | ❌ | ❌ | ❌ |
| Deterministic file verification | ✅ | ❌ | ❌ | ❌ | ❌ |
| Independent test execution | ✅ | ❌ | ❌ | ⚠️² | ❌ |
| AST syntax validation | ✅ | ❌ | ❌ | ❌ | ❌ |
| Hallucination penalty (−1.0) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Persistent reputation ledger | ✅ | ❌ | ❌ | ❌ | ❌ |
| Supervision level adjustment | ✅ | ❌ | ❌ | ❌ | ❌ |
¹ CrewAI has `expected_output` but it is a text description, not deterministically evaluated.
² AutoGen has a `CodeExecutorAgent` that runs LLM-generated code snippets as a tool (per official docs). This is not post-task verification: it does not check whether a delegated task actually produced the required outputs, and there is no contract schema, reputation tracking, or hallucination penalty.
**The gap:** No existing framework combines deterministic post-task verification of sub-agent claims with persistent reputation scoring. This skill is designed to fill exactly this gap.
## Formal Model
### Score Function
The task score `s(t)` compares the agent's self-report against independent verification:
```
s(t) = +1.0 if agent_report = success ∧ V(task) = True
s(t) = −1.0 if agent_report = success ∧ V(task) = False (hallucinated)
s(t) = +0.5 if agent_report = blocked (honest blocker)
s(t) = 0.0 if agent_report = failure
```
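The score function above can be sketched directly; this is an illustrative stand-in (the function name and report strings are assumptions, not the skill's actual API):

```python
# Hypothetical sketch of s(t): compare the agent's self-report against
# the result of independent verification V(task).
def task_score(agent_report: str, verified: bool) -> float:
    """Score a completed task from its self-report and verification result."""
    if agent_report == "success":
        return 1.0 if verified else -1.0  # -1.0 = hallucination penalty
    if agent_report == "blocked":
        return 0.5  # honest blocker is rewarded over a false success claim
    return 0.0  # reported failure

# task_score("success", False) → -1.0 (claimed success, verification failed)
```

Note that an honest "blocked" (0.5) always scores higher than a hallucinated "success" (−1.0), which is what makes truthful self-reporting the dominant strategy.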
### Verification Gate Composition
```
V(task) = Gate_Files(task) ∧ Gate_Tests(task) ∧ Gate_Lint(task) ∧ Gate_AST(task)
```
Sequential, short-circuit: first gate failure → `score_override = −1.0`. All gates are deterministic — no LLM involved.
| Gate | Check | Method |
|------|-------|--------|
| Files | Required output files exist and are non-empty (> 0 bytes) | `pathlib.Path.exists` + `stat().st_size` |
| Tests | Test command exits with code 0 | `subprocess.run` |
| Lint | Linter passes (graceful skip if absent) | `subprocess.run` |
| AST | Python files parse without syntax error | `ast.parse` |
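Three of the four gates can be sketched with the stdlib modules named above. This is a minimal illustration under the stated design, not the skill's actual code; the function names are assumptions:

```python
import ast
import subprocess
from pathlib import Path

def gate_files(required: list[str]) -> bool:
    """Pass only if every required output file exists and is non-empty."""
    return all(Path(f).exists() and Path(f).stat().st_size > 0 for f in required)

def gate_ast(py_files: list[str]) -> bool:
    """Pass only if every Python file parses without a SyntaxError."""
    for f in py_files:
        try:
            ast.parse(Path(f).read_text())
        except SyntaxError:
            return False
    return True

def gate_tests(command: list[str]) -> bool:
    """Pass only if the test command exits with code 0."""
    return subprocess.run(command, capture_output=True).returncode == 0
```

Because each gate returns a plain boolean from a deterministic check, short-circuit composition is just `gate_files(...) and gate_tests(...) and ...`, with any `False` mapping to `score_override = -1.0`.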
### Reputation Update (Exponential Moving Average)
```
R(t+1) = (1 − α) · R(t) + α · s(t)
where:
R(t) ∈ [0, 1] Reputation score at time t
α = 0.3 Learning rate (configurable)
s(t) ∈ {−1, 0, 0.5, 1} Task score from verification
R(0) = 0.5 Neutral prior
```
A single hallucination drops reputation sharply (R=0.5 → 0.05). Recovery requires multiple consecutive verified successes. The asymmetry is intentional: trust is hard to build, easy to destroy.
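The update rule and the hallucination example can be checked in a few lines. Clamping to [0, 1] is an assumption added here to match the stated range of R(t), since the raw EMA can dip below 0 after repeated −1.0 scores:

```python
def update_reputation(r: float, s: float, alpha: float = 0.3) -> float:
    """EMA update R(t+1) = (1 - alpha) * R(t) + alpha * s(t).

    Clamped to [0, 1] (an assumption, to match the documented range of R).
    """
    return min(1.0, max(0.0, (1 - alpha) * r + alpha * s))

r = update_reputation(0.5, -1.0)  # one hallucination from the neutral prior, ≈ 0.05
r = update_reputation(r, 1.0)     # first verified success only recovers to ≈ 0.34
```

The second line shows the intended asymmetry: after one hallucination, a single verified success recovers only to roughly 0.34, still two supervision levels below the neutral prior.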
### Supervision Thresholds
```
Supervision(R) = autonomous if R > 0.8
standard if 0.6 < R ≤ 0.8
supervised if 0.4 < R ≤ 0.6
strict if 0.2 < R ≤ 0.4
suspended if R ≤ 0.2
```
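The threshold table maps directly to a cascade of comparisons; a minimal sketch (function name is illustrative):

```python
def supervision_level(r: float) -> str:
    """Map a reputation score R to a supervision level (thresholds as above)."""
    if r > 0.8:
        return "autonomous"
    if r > 0.6:
        return "standard"
    if r > 0.4:
        return "supervised"
    if r > 0.2:
        return "strict"
    return "suspended"

# supervision_level(0.5) → "supervised"; a fresh agent at R(0) = 0.5 starts here.
```

One hallucination from the neutral prior (R = 0.5 → 0.05) therefore drops an agent straight to "suspended".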
## Implementation
A lightweight Python package (`governed_agents/`) with **zero external dependencies** (pure stdlib: `sqlite3`, `subprocess`, `ast`, `glob`, `shlex`):
- **TaskContract** — schema-enforced task definition with acceptance criteria
- **GovernedOrchestrator** — `for_task()` factory, `record_success/blocked/failure()` methods
- **Verifier** — 4-gate pipeline, runs automatically on `record_success()`
- **Reputation Ledger** — SQLite, per-model EMA score, supervision levels
## Usage
```python
from governed_agents.orchestrator import GovernedOrchestrator
g = GovernedOrchestrator.for_task(
objective="Add JWT auth endpoint",
model="openai/gpt-5.2-codex",
criteria=["POST /api/auth returns JWT", "Tests pass"],
required_files=["api/auth.py", "tests/test_auth.py"],
run_tests="pytest tests/test_auth.py -v",
)
# Pass g.instructions() to sessions_spawn, then:
result = g.record_success()
# → Verifies independently, scores, updates reputation
```
## Tests
```
python3 governed_agents/test_verification.py
🏆 ALL VERIFICATION GATE TESTS PASS (9/9)
```
Covers: files gate pass/fail, tests gate pass/fail, lint graceful-skip, AST pass/fail, score override on hallucination, honest blocker scoring.
## Gate 5 — LLM Council (v1.0, just merged)
After opening this PR, [@almai85](https://github.com/almai85) pointed out a key limitation: the 4 deterministic gates only work for *closed* verification (tests pass/fail). For open-ended tasks — architecture, design, writing, analysis — there is no binary signal.
Gate 5 implements his **LLM Council** approach: N independent reviewer agents (ideally different models) evaluate the output via majority vote. Prompt injection risk from the reviewed agent is documented and mitigated by using stronger models for reviewers.
```python
contract = TaskContract(
objective="Design the rate limiting strategy",
acceptance_criteria=["Sliding window documented", "Failure mode addressed"],
verification_mode="council", # activates Gate 5
council_size=3,
)
g = GovernedOrchestrator(contract, model="openai/gpt-5.2-codex")
prompts = g.generate_council_tasks(worker_output)
result = g.record_council_verdict(raw_reviewer_outputs)
# → "Council: 2/3 approved (score=0.67, PASS ✅)"
```
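The majority-vote tally behind the verdict line above reduces to a few lines. This is a hypothetical sketch: the real `record_council_verdict` parses raw reviewer outputs, while this just tallies already-extracted boolean verdicts:

```python
def council_score(verdicts: list[bool]) -> tuple[float, bool]:
    """Return (approval ratio, passed) using a strict-majority vote."""
    approvals = sum(verdicts)
    score = approvals / len(verdicts)
    return score, approvals * 2 > len(verdicts)  # strict majority required

score, passed = council_score([True, True, False])  # 2/3 approved → pass
```

An odd `council_size` (3 in the example contract) avoids ties under the strict-majority rule.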
**Roadmap (v2.0):** Full 3-layer pipeline per @almai85's design doc — Structural Gates (format/schema) → Grounding Gates (URL reachability, citation checks) → LLM Council. Short-circuit on structural failure keeps costs bounded.
> Thanks @almai85 for the architecture review and the LLM Council insight — exactly the right extension.
## Acknowledgments
- [@almai85](https://github.com/almai85) — LLM Council design and the 3-layer verification architecture (Structural → Grounding → Council)
---
## AI-Assisted Disclosure
- [x] This PR was built with AI assistance (Claude Sonnet + OpenAI Codex)
- [x] Degree of testing: **fully tested** — 20/20 unit tests pass (verification gates, council voting, reputation scoring, profiles)
- [x] I understand what the code does and have reviewed all generated output
- [ ] Session logs: available on request (governed-agents development sessions, ~13 agent turns)
> Note: No GitHub Discussion was opened prior to this PR. Given the feature scope, I'm happy to open one retroactively or summarise the design rationale in a Discussion thread if that would help maintainers.