#21832: feat(agent): add self-verification loop with full-context evaluation
## Summary
Adds an opt-in **self-verification loop** to the agent runner. After an agent completes a task, a separate verifier model evaluates whether the response actually addresses the user's request — using the **full conversation context**: user rules (AGENTS.md, SOUL.md), conversation history, and execution metadata. If verification fails, structured feedback is injected back into the conversation and the agent retries, up to a configurable maximum number of attempts.
This is an **"LLM-as-a-judge"** pattern integrated directly into `runReplyAgent()`, designed to improve response quality without manual re-prompting.
## How it works
### Core loop
1. Agent produces a response via the normal `runAgentTurnWithFallback()` flow
2. **Deterministic pre-checks** (`shouldSkipVerification()`) decide if verification applies — skips tool-call turns, already-streamed content, messaging tool sends, and empty responses
3. **Keyword trigger**, `verifyAll` mode, or `verifyHeartbeat` mode decides whether to invoke the verifier
4. `assembleVerifierContext()` builds a token-budgeted evaluation prompt (~32K char budget) with all available context
5. `verifyAgentResponse()` makes a standalone LLM call to a verifier model
6. The verifier uses **chain-of-thought** reasoning across 3 dimensions (Goal Achievement, Completeness, Rule Compliance)
7. Returns `PASS` or `FAIL [category]: <feedback>` with a structured fail category
8. On failure, categorized feedback is injected as a new user message and the agent retries
9. Loop continues until PASS or max attempts exhausted
10. **Fail-open**: if the verifier errors, times out, or returns malformed output, the original response is delivered — verification never blocks delivery
### Deterministic pre-checks (no LLM call needed)
These conditions skip verification entirely before any LLM call:
- `stopReason === "tool_calls"` or `pendingToolCalls` — intermediate turn, not final
- `didSendViaMessagingTool` — content already delivered (WhatsApp, Telegram, etc.)
- Block streaming already sent (`directlySentBlockKeys.size > 0`) — can't un-send
- Empty response text
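A minimal sketch of these pre-checks, assuming a flattened `TurnResult` shape (the real fields live on the runner's internal state, and the interface here is an assumption from this description):

```typescript
// Illustrative-only pre-check logic; field names are taken from the PR text.
interface TurnResult {
  stopReason: string;
  pendingToolCalls: boolean;
  didSendViaMessagingTool: boolean;
  directlySentBlockKeys: Set<string>;
  text: string;
}

function shouldSkipVerification(turn: TurnResult): boolean {
  if (turn.stopReason === "tool_calls" || turn.pendingToolCalls) return true; // intermediate turn
  if (turn.didSendViaMessagingTool) return true;           // content already delivered
  if (turn.directlySentBlockKeys.size > 0) return true;    // streamed blocks can't be un-sent
  if (turn.text.trim().length === 0) return true;          // nothing to evaluate
  return false;
}
```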
### Full-context evaluation
The verifier assembles context from multiple sources, ordered to mitigate the **"Lost in the Middle"** effect (critical context at START and END):
```
[SYSTEM PROMPT] ← evaluation instructions + 3-dimension rubric
[USER MESSAGE] ← what was asked (near start, never trimmed)
[USER RULES] ← AGENTS.md, SOUL.md excerpts (capped ~8K chars)
[CONVERSATION HISTORY] ← last 5 turns from InboundHistory (zero I/O)
[EXECUTION METADATA] ← stop_reason, duration
[PREVIOUS FEEDBACK] ← prior verification feedback (on retries)
[AGENT RESPONSE] ← what we're evaluating (at END for recency)
```
Token budget priority (trim from bottom up):
1. System prompt (~1K chars) — fixed
2. User message — never trimmed
3. Agent response — truncate tail if >12K chars
4. Execution metadata — tiny, always include
5. User rules — cap at ~8K chars
6. Conversation history — trim oldest first
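One way to sketch this bottom-up budgeting under a character budget — the section labels follow the priority list above, but the trimming strategy shown (truncate the tail of the lowest-priority trimmable section first) is a simplifying assumption, not the exact algorithm:

```typescript
// Hypothetical character-budget trimmer; sections are ordered by priority,
// so trimming walks from the end (lowest priority) toward the front.
interface Section {
  label: string;      // e.g. "user_rules", "history"
  text: string;
  trimmable: boolean; // user message and metadata would be marked false
}

function fitToBudget(sections: Section[], budgetChars: number): Section[] {
  const total = () => sections.reduce((n, s) => n + s.text.length, 0);
  for (let i = sections.length - 1; i >= 0 && total() > budgetChars; i--) {
    const s = sections[i];
    if (!s.trimmable) continue;
    const overflow = total() - budgetChars;
    s.text = s.text.slice(0, Math.max(0, s.text.length - overflow));
  }
  return sections;
}
```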
### Heartbeat verification
When `verifyHeartbeat: true`, the verifier also evaluates **heartbeat responses**. This catches lazy `HEARTBEAT_OK` replies when the agent should have taken proactive action based on `HEARTBEAT.md` tasks.
Without `verifyHeartbeat`, heartbeat runs are always skipped (default behavior — no overhead on periodic checks that don't need verification).
With `verifyHeartbeat: true`, **all** heartbeat responses are verified regardless of keyword triggers — this is intentional because the whole point is catching agents that respond `HEARTBEAT_OK` without doing real work.
Example: If `HEARTBEAT.md` says "Check for new Kaggle competitions and participate" but the agent responds `HEARTBEAT_OK` without checking, the verifier flags it as `goal_missed` and the agent retries with proper task execution.
### Structured failure categories
On failure, the verifier outputs a categorized response for smarter retries:
- `goal_missed` — response doesn't address the actual question
- `incomplete` — response is truncated or missing requested elements
- `rule_violation` — response violates user rules (AGENTS.md, SOUL.md)
- `tone_mismatch` — wrong tone, register, or communication style
- `refusal` — agent refused a legitimate request
The category is logged via `emitAgentEvent()` and included in the retry feedback prompt to help the agent focus its correction.
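A hypothetical parser for the `PASS` / `FAIL [category]: <feedback>` output format described above; the regex and the fail-open handling of malformed output are assumptions based on this description:

```typescript
// Illustrative parser for the verifier's verdict string (not the real code).
const FAIL_CATEGORIES = new Set([
  "goal_missed",
  "incomplete",
  "rule_violation",
  "tone_mismatch",
  "refusal",
]);

type ParsedVerdict =
  | { pass: true }
  | { pass: false; category: string; feedback: string };

function parseVerdict(raw: string): ParsedVerdict | null {
  const text = raw.trim();
  if (/^PASS\b/.test(text)) return { pass: true };
  const m = /^FAIL\s*\[([a-z_]+)\]:\s*(.+)$/s.exec(text);
  if (!m || !FAIL_CATEGORIES.has(m[1])) return null; // malformed → caller fails open
  return { pass: false, category: m[1], feedback: m[2].trim() };
}
```

Returning `null` for anything unrecognized keeps the fail-open guarantee: an out-of-vocabulary category or garbled output is treated the same as a verifier error.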
## Verification scope
The verification loop covers:
- **Main session responses** ✅ — all channel messages (WhatsApp, Telegram, Discord, etc.)
- **Sub-agent responses** ✅ — sub-agents spawned via `sessions_spawn` route through `runReplyAgent()` via the gateway
- **Heartbeat responses** ✅ — opt-in via `verifyHeartbeat: true`
- **CLI provider** ❌ — intentionally excluded (local development)
## Configuration
Add to `~/.openclaw/openclaw.json`:
```json
{
  "agents": {
    "defaults": {
      "verifier": {
        "enabled": true,
        "model": "anthropic/claude-sonnet-4-5",
        "verifyAll": false,
        "verifyHeartbeat": true,
        "maxAttempts": 3,
        "triggerKeywords": ["done", "completed", "finished", "ready", "here you go"],
        "timeoutSeconds": 30
      }
    }
  }
}
```
### Configuration reference
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | `boolean` | `false` | Enable the verification loop |
| `verifyAll` | `boolean` | `false` | Verify **every** response regardless of trigger keywords |
| `verifyHeartbeat` | `boolean` | `false` | Verify heartbeat responses — catches lazy `HEARTBEAT_OK` when tasks exist |
| `model` | `string` | `"anthropic/claude-sonnet-4-5"` | Verifier model (recommended: different model family than agent) |
| `maxAttempts` | `number` | `3` | Max attempts including original response |
| `triggerKeywords` | `string[]` | `["done", "completed", ...]` | Keywords that trigger verification when `verifyAll` is `false` |
| `timeoutSeconds` | `number` | `30` | Timeout for the verifier LLM call |
All fields are optional with sensible defaults. The feature is **off by default** (`enabled: false`).
### Trigger modes
- **Keyword mode** (default): verification only runs when the agent's response contains a trigger keyword (e.g., "done", "completed"). Lightweight — most responses skip verification entirely.
- **Verify-all mode** (`verifyAll: true`): verification runs on every response. Higher cost but catches more issues. Recommended when quality matters more than latency.
- **Heartbeat mode** (`verifyHeartbeat: true`): all heartbeat responses are verified regardless of keywords. Catches agents that reply `HEARTBEAT_OK` without checking their `HEARTBEAT.md` tasks.
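Keyword mode can be sketched as a simple case-insensitive substring check — the matching strategy here is an assumption; the actual `shouldVerifyResponse()` may match differently:

```typescript
// Illustrative keyword trigger (heartbeat handling is decided elsewhere).
function shouldVerifyResponse(
  response: string,
  opts: { verifyAll: boolean; triggerKeywords: string[] },
): boolean {
  if (opts.verifyAll) return true; // verify-all mode bypasses keywords
  const lower = response.toLowerCase();
  return opts.triggerKeywords.some((k) => lower.includes(k.toLowerCase()));
}
```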
### Recommended setup
Use a **different model family** for the verifier than the agent to avoid self-bias:
- Agent: `anthropic/claude-opus-4-6` → Verifier: `openai/gpt-4.1` or `kimi-coding/k2p5`
- Agent: `openai/gpt-4.1` → Verifier: `anthropic/claude-sonnet-4-5`
## Design decisions
- **Deterministic pre-checks**: Fast `shouldSkipVerification()` avoids LLM calls for cases that can be decided logically (tool calls, already-sent content, empty responses)
- **Verifier call**: Standalone `completeSimple()` call, not a full agent turn — no tools, no session writes, no transcript pollution
- **Full context assembly**: `assembleVerifierContext()` reads AGENTS.md/SOUL.md from workspace at verification time, includes conversation history from memory (zero additional I/O), and applies token budgeting
- **Chain-of-thought evaluation**: Verifier reasons through 3 dimensions before rendering a verdict — reduces snap-judgment failures
- **Structured FAIL categories**: Categories enable smarter retry prompts (e.g., "Your response was flagged for `incomplete` — specifically: missing error handling")
- **Retry mechanism**: Feedback injected into the same session conversation, not a new session — preserves full context
- **"Lost in the Middle" mitigation**: Context ordering places critical information (user message, agent response) at boundaries, less critical (rules, history) in the middle
- **Block streaming skip**: When block streaming has already sent content to the user, verification is skipped (can't un-send)
- **Heartbeat always-verify**: When `verifyHeartbeat` is true, heartbeat runs bypass keyword triggers entirely — every heartbeat response goes through verification. This is the correct behavior because the problem being solved is agents that lazily respond `HEARTBEAT_OK` without checking tasks.
- **No per-agent config in v1**: Only `agents.defaults.verifier` — per-agent overrides can be added later
- **System prompt is hardcoded**: A well-tested verification prompt with CoT rubric, not user-configurable in v1
- **Fail-open always**: Any verifier error (API failure, timeout, malformed response, model unavailable) silently delivers the original response
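One way the fail-open timeout could be wrapped is to race the verifier call against a timer and collapse every error path to "no verdict" — an assumed helper for illustration, not the PR's actual code:

```typescript
// Hypothetical fail-open wrapper: any error or timeout yields null,
// which the caller treats as "deliver the original response".
async function verifyFailOpen<T>(
  call: () => Promise<T>,
  timeoutMs: number,
): Promise<T | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });
  try {
    return await Promise.race([call(), timeout]);
  } catch {
    return null; // API failure, malformed response, model unavailable, ...
  } finally {
    clearTimeout(timer);
  }
}
```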
## Files changed
| File | Lines | Change |
|------|-------|--------|
| `src/auto-reply/reply/agent-verifier.ts` | 388 | Full-context verifier module: `assembleVerifierContext()`, `shouldSkipVerification()`, `verifyAgentResponse()`, structured FAIL parsing, workspace file reading |
| `src/auto-reply/reply/agent-verifier-trigger.ts` | 23 | Keyword trigger detector (`shouldVerifyResponse()`) |
| `src/auto-reply/reply/agent-runner.ts` | +218 | Verification loop integration in `runReplyAgent()`: pre-checks, context enrichment, retry with categorized feedback, heartbeat verification, lifecycle logging |
| `src/config/types.agent-defaults.ts` | +18 | `AgentVerifierConfig` type definition (includes `verifyHeartbeat`) |
| `src/config/zod-schema.agent-defaults.ts` | +12 | Zod runtime validation schema for verifier config |
| `src/auto-reply/reply/agent-verifier.test.ts` | 362 | 30 unit tests: verifier logic, context assembly, structured FAIL parsing, skip pre-checks |
| `src/auto-reply/reply/agent-verifier-trigger.test.ts` | 46 | 9 unit tests: keyword trigger detection |...