#20384: feat(recover): add run_error plugin hook + client-side integration (opt-in autoRecover)
agents
size: M
Cluster: Plugin and Hook Enhancements
Adds a new plugin hook, `run_error`, and wires it into the embedded PI agent runner. The hook lets plugins (for example, an auto-recover plugin) make recovery decisions when a run fails and optionally request an automatic retry or a model switch.
## Behavior summary
- The runner invokes `run_error` when a run fails with an error, a timeout, or a prompt error
- If a plugin suggests `retry` or `switch` and the run param `autoRecover=true` is set, the runner automatically performs the requested action (up to the existing loop limits)
- If `autoRecover=false` (default), the runner does not auto-retry; instead, clients can surface a manual retry/switch option using the `meta.recoverySuggestion` returned with the run results
- Hook failures are non-fatal and are logged; when no hook is present, the previous behavior is preserved
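The branching described above can be sketched as a small pure function. This is an illustration, not the code in `run.ts`: the helper name `resolveRecovery` and its `kind` return values are hypothetical, and treating `fail` as "surface nothing" is an assumption. Only `action`, `newModel`, `autoRecover`, and the `recoverySuggestion` fields come from this description.

```javascript
// Hypothetical helper (not part of this PR): decides what the runner does
// with the value returned by a run_error hook.
// `hookResult` is undefined when no hook is registered or the hook threw.
function resolveRecovery(hookResult, { autoRecover = false, attempt = 0 } = {}) {
  // Assumption: no result, or an explicit 'fail', keeps the previous behavior.
  if (!hookResult || hookResult.action === 'fail') {
    return { kind: 'none' };
  }

  if (autoRecover) {
    // Opt-in path: the runner performs the requested action itself.
    return { kind: 'auto', action: hookResult.action, newModel: hookResult.newModel };
  }

  // Default path: report the suggestion in meta and let the client decide.
  const recoverySuggestion = { suggestedAction: hookResult.action, attempt };
  if (hookResult.newModel) recoverySuggestion.newModel = hookResult.newModel;
  return { kind: 'suggest', recoverySuggestion };
}
```

Keeping the decision separate from the retry loop makes both `autoRecover` paths easy to unit test without a real provider failure.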
## Why this approach
- Keeps backward compatibility: no change in default behavior (manual confirmation)
- Minimal, targeted surface area: new hook + runner call-site; clients can opt-in to automatic recovery or simply display the plugin suggestion
- Plugin-runner contract is small and explicit (action + optional newModel)
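To show how small the contract is, here is a sketch of a validator that a caller could apply to a hook result before acting on it. The helper name `normalizeRunErrorResult` is hypothetical, and rejecting a `switch` that has no `newModel` is an assumption; the description only states that a result carries an action and an optional `newModel`.

```javascript
// The three actions named in this PR description.
const RUN_ERROR_ACTIONS = ['retry', 'switch', 'fail'];

// Hypothetical helper: returns a cleaned-up result, or undefined when the
// plugin returned something that does not match the contract.
function normalizeRunErrorResult(result) {
  if (result === null || typeof result !== 'object') return undefined;
  if (!RUN_ERROR_ACTIONS.includes(result.action)) return undefined;

  const normalized = { action: result.action };
  if (typeof result.newModel === 'string' && result.newModel.length > 0) {
    normalized.newModel = result.newModel;
  }

  // Assumption: a switch without a target model cannot be acted on.
  if (normalized.action === 'switch' && !normalized.newModel) return undefined;
  return normalized;
}
```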
## Files changed
- src/plugins/types.ts (+PluginHookRunErrorEvent, PluginHookRunErrorResult, added "run_error" to PluginHookName and handler map)
- src/plugins/hooks.ts (+runRunError wrapper + exported in hook runner)
- src/agents/pi-embedded-runner/run/params.ts (+autoRecover?: boolean)
- src/agents/pi-embedded-runner/run.ts (call-site: invoke run_error hook on failures; support for retry/switch/fail; add meta.recoverySuggestion; opt-in autoRecover behavior)
## Testing & QA Plan
### Manual test (quick)
1. Create a tiny plugin implementing the `run_error` hook and register it with the Gateway. Example hook body:
```javascript
module.exports = {
  hooks: {
    run_error: async (event, ctx) => {
      // For testing, request an immediate retry on the first attempt
      if (event.attempt === 0) return { action: 'retry' };
      return { action: 'fail' };
    }
  }
};
```
2. Start Gateway with the plugin enabled and run a query that triggers a model/provider failure (or mock a failure in tests).
3. Run the same scenario with `autoRecover: true` (set on the runner params/embedding caller path) and verify the runner resubmits the run automatically.
4. Run the scenario with `autoRecover: false` (default) and verify the returned run `meta` includes `recoverySuggestion` and there is no auto-retry.
### Automated tests to add (recommended)
- Unit test for `hooks.ts`: ensure `runRunError` forwards to `runModifyingHook` and returns the expected result when a plugin returns a value
- Integration test in pi-embedded-runner: mock a failing attempt and install a test hook that returns `{ action: 'retry' }`. With `autoRecover=true`, verify the runner attempts another run. With `autoRecover=false`, verify there is no auto-retry and that `meta.recoverySuggestion` is set
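The integration scenario can be prototyped against a stand-in loop before wiring it to the real runner. Everything below is a sketch under stated assumptions: `runWithRecovery`, `attemptFn`, and `maxAttempts` are hypothetical names, the hook is synchronous here for brevity (the real handler is async), and the loop only mimics the behavior described in this PR.

```javascript
// Stand-in for the runner loop, NOT the real pi-embedded-runner.
// `attemptFn(attempt, model)` returns { ok: true } or { ok: false, error }.
// `hook(event)` mimics a run_error handler.
function runWithRecovery({ attemptFn, hook, model, autoRecover = false, maxAttempts = 3 }) {
  let attempts = 0;
  let currentModel = model;
  while (true) {
    const outcome = attemptFn(attempts, currentModel);
    attempts += 1;
    if (outcome.ok) return { ok: true, attempts, model: currentModel, meta: {} };

    let suggestion;
    try {
      suggestion = hook ? hook({ attempt: attempts - 1, error: outcome.error }) : undefined;
    } catch (err) {
      // Hook failures are non-fatal: log and fall through to a plain failure.
      console.warn('run_error hook failed:', err);
    }

    const actionable =
      Boolean(suggestion) && (suggestion.action === 'retry' || suggestion.action === 'switch');
    if (actionable && autoRecover && attempts < maxAttempts) {
      if (suggestion.action === 'switch' && suggestion.newModel) currentModel = suggestion.newModel;
      continue; // automatic retry or model switch
    }

    // No automatic action: report the suggestion (if any) in meta.
    const meta = actionable
      ? { recoverySuggestion: { suggestedAction: suggestion.action, newModel: suggestion.newModel, attempt: attempts - 1 } }
      : {};
    return { ok: false, attempts, model: currentModel, meta };
  }
}

// Fixtures: the first attempt fails, later attempts succeed.
const failOnce = (attempt) =>
  attempt === 0 ? { ok: false, error: new Error('simulated provider failure') } : { ok: true };
const retryHook = (event) => (event.attempt === 0 ? { action: 'retry' } : { action: 'fail' });

const auto = runWithRecovery({ attemptFn: failOnce, hook: retryHook, model: 'model-a', autoRecover: true });
const manual = runWithRecovery({ attemptFn: failOnce, hook: retryHook, model: 'model-a' });
```

In this sketch `auto` succeeds on its second attempt, while `manual` stops after one attempt and carries `meta.recoverySuggestion`. These are the two assertions the integration test needs.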
## Notes for reviewers
- The PR intentionally keeps the default behavior unchanged (no automatic retry) to avoid surprising clients
- The `recoverySuggestion` meta field is deliberately simple: `{ suggestedAction, newModel?, attempt }`. UI/clients can render actionable buttons around this (TUI/webchat)
- I chose "run_error" as the hook name (snake_case) to match the existing hook naming convention in this codebase. The exported helper is `runRunError()` in hooks.ts.
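As an illustration of how a client might consume the `recoverySuggestion` field, the sketch below maps it to button descriptors. The function name `recoveryButtons`, the descriptor shape, and the label text are all hypothetical. It also assumes `attempt` is zero-based, as the manual-test hook's `event.attempt === 0` check suggests.

```javascript
// Hypothetical client helper: turns meta.recoverySuggestion into button
// descriptors that a TUI or web client could render.
function recoveryButtons(meta) {
  const suggestion = meta && meta.recoverySuggestion;
  if (!suggestion) return [];

  if (suggestion.suggestedAction === 'retry') {
    // Assumption: attempt is zero-based, so add 1 for display.
    return [{ id: 'retry', label: `Retry (attempt ${suggestion.attempt + 1} failed)` }];
  }
  if (suggestion.suggestedAction === 'switch' && suggestion.newModel) {
    return [{ id: 'switch', label: `Retry with ${suggestion.newModel}`, model: suggestion.newModel }];
  }
  // Unknown or incomplete suggestions render nothing.
  return [];
}
```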
## Follow-up work
- Add client-side UI in TUI and WebChat to surface `meta.recoverySuggestion` and allow manual retry/switch
- Add a sample `auto-recover` plugin that implements common heuristics (retry on rate-limits, fallback to smaller model, rotate provider) and ships with tests
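A rough sketch of what the heuristics in such a sample plugin might look like is shown below. It is not the planned plugin. The shape of `event.error` (an optional numeric `status` and a `message`), the constants `FALLBACK_MODEL` and `MAX_RETRIES`, and the helper name `decideRecovery` are all assumptions for illustration. Only `event.attempt`, `action`, and `newModel` appear in this description. Provider rotation is left out because the contract described here has no field for it.

```javascript
// Assumptions (not from the PR): event.error may carry a numeric `status`
// and a `message`; FALLBACK_MODEL is a placeholder model name.
const FALLBACK_MODEL = 'example-small-model';
const MAX_RETRIES = 2;

// Hypothetical heuristic: retry rate-limited runs a couple of times, then
// fall back to a smaller model once, then give up.
function decideRecovery(event) {
  const status = event.error && event.error.status;
  const message = String((event.error && event.error.message) || '');
  const rateLimited = status === 429 || /rate.?limit/i.test(message);

  if (rateLimited && event.attempt < MAX_RETRIES) return { action: 'retry' };
  if (event.attempt <= MAX_RETRIES) return { action: 'switch', newModel: FALLBACK_MODEL };
  return { action: 'fail' };
}

// Same plugin shape as the manual-test example; export it the way that
// example does when using it as a real plugin module.
const autoRecoverPlugin = {
  hooks: {
    run_error: async (event, ctx) => decideRecovery(event),
  },
};
```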
Most Similar PRs
#22624: feat(plugins): add before_context_send hook and model routing via b...
by davidrudduck · 2026-02-21
66.7%
#11155: feat(hooks): before_agent_start model/provider override (run-scoped...
by alanranger · 2026-02-07
65.9%
#14647: feat(plugins): allow before_agent_start hook to override model (#14...
by lailoo · 2026-02-12
65.7%
#17614: feat: allow before_agent_start hook to override model selection
by plc · 2026-02-16
65.1%
#23559: feat(plugins): add before_context_send hook and model routing via b...
by davidrudduck · 2026-02-22
65.1%
#9603: fix: initialize global hook runner on plugin registry cache hit
by kevins88288 · 2026-02-05
64.9%
#14873: [Feature]: Extend before_agent_start hook context with Model, Tools...
by akv2011 · 2026-02-12
64.8%
#8022: feat: implement before_model_select plugin hook
by dead-pool-aka-wilson · 2026-02-03
63.9%
#6095: feat(gateway): support modular guardrails extensions for securing a...
by Reapor-Yurnero · 2026-02-01
63.9%
#20426: feat: make llm_input/llm_output modifying hooks for middleware patt...
by chandika · 2026-02-18
63.8%