#13191: pi-embedded: enable failover when per-agent fallbacks are configured
agents
stale
Cluster: Model Fallbacks and Rate Limiting
### Problem
When OpenClaw runs inbound messages via the embedded PI runner (e.g. Telegram), `runEmbeddedPiAgent()` decides whether to throw `FailoverError` (to let the outer `runWithModelFallback()` advance models) based only on `agents.defaults.model.fallbacks`.
If `agents.defaults.model` is unset/null but an agent has per-agent model fallbacks configured (`agents.list[].model.fallbacks`), embedded runs treat rate-limit/auth failures as non-failover and do not advance to the next model. This effectively disables per-agent fallbacks for embedded sessions.
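A minimal sketch of the failing case, assuming a simplified config shape. Only the two paths `agents.defaults.model.fallbacks` and `agents.list[].model.fallbacks` come from this PR; the `Config` type, the agent `id` field, the model names, and `defaultsOnlyFallbackConfigured` are hypothetical stand-ins for illustration, not the actual OpenClaw code.

```typescript
// Hypothetical, simplified config shape (an assumption, not the real type).
interface ModelConfig {
  fallbacks?: string[];
}

interface Config {
  agents?: {
    defaults?: { model?: ModelConfig | null };
    list?: Array<{ id: string; model?: ModelConfig }>;
  };
}

// Defaults are unset, but one agent configures its own fallbacks.
const cfg: Config = {
  agents: {
    defaults: { model: null },
    list: [{ id: "support", model: { fallbacks: ["provider/model-b"] } }],
  },
};

// Illustrative version of the pre-change gating: only the defaults are consulted.
function defaultsOnlyFallbackConfigured(config: Config): boolean {
  return (config.agents?.defaults?.model?.fallbacks?.length ?? 0) > 0;
}

// The per-agent fallbacks are never looked at, so failover stays disabled.
console.log(defaultsOnlyFallbackConfigured(cfg)); // false
```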
### Change
Gate failover on **per-agent fallback override** when present:
- If `resolveAgentModelFallbacksOverride(cfg, agentId)` is defined, use its length to decide whether failover is enabled.
- Otherwise fall back to `agents.defaults.model.fallbacks`.
This preserves the existing behavior for agents without an override and also honors an explicit per-agent disable (`fallbacks: []`).
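The gating rule above could be sketched roughly as follows. `resolveOverrideSketch` is a hypothetical stand-in for `resolveAgentModelFallbacksOverride`, assumed to return `undefined` when the agent has no override and the (possibly empty) array otherwise; the `Config` type and the agent `id` field are likewise assumptions.

```typescript
// Hypothetical, simplified config shape (an assumption, not the real type).
interface ModelConfig {
  fallbacks?: string[];
}

interface Config {
  agents?: {
    defaults?: { model?: ModelConfig | null };
    list?: Array<{ id: string; model?: ModelConfig }>;
  };
}

// Stand-in for resolveAgentModelFallbacksOverride(cfg, agentId):
// undefined means "no override", an array (even an empty one) means "override".
function resolveOverrideSketch(cfg: Config, agentId: string): string[] | undefined {
  const agent = cfg.agents?.list?.find((entry) => entry.id === agentId);
  return agent?.model?.fallbacks;
}

function isFallbackConfigured(cfg: Config, agentId: string): boolean {
  const override = resolveOverrideSketch(cfg, agentId);
  if (override !== undefined) {
    // A per-agent override wins, including an explicit disable via [].
    return override.length > 0;
  }
  // No override: fall back to agents.defaults.model.fallbacks.
  return (cfg.agents?.defaults?.model?.fallbacks?.length ?? 0) > 0;
}

const cfg: Config = {
  agents: {
    defaults: { model: { fallbacks: ["provider/model-z"] } },
    list: [
      { id: "with-override", model: { fallbacks: ["provider/model-b"] } },
      { id: "disabled", model: { fallbacks: [] } },
      { id: "inherits" },
    ],
  },
};

console.log(isFallbackConfigured(cfg, "with-override")); // true
console.log(isFallbackConfigured(cfg, "disabled")); // false
console.log(isFallbackConfigured(cfg, "inherits")); // true
```

Distinguishing `undefined` from `[]` is what lets an agent opt out of failover even when the defaults configure fallbacks.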
### Tests
Add `run.fallback-config.test.ts` to verify:
- Per-agent fallbacks enable `FailoverError` even if defaults are empty
- Explicitly disabled per-agent fallbacks (`fallbacks: []`) do not enable failover
### Why
This makes embedded sessions behave consistently with gateway-level model fallback selection, and prevents surprising “stuck on primary model” behavior when only per-agent fallbacks are configured.
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR updates the embedded PI runner’s failover gating so that `runEmbeddedPiAgent()` treats per-agent model fallback overrides (`agents.list[].model.fallbacks`) as enabling failover (and respects an explicit disable via `fallbacks: []`) when defaults are unset. A new unit test exercises the two key behaviors: per-agent fallbacks enable throwing `FailoverError` on auth/rate-limit style failures even when defaults are empty, and explicitly-disabled per-agent fallbacks do not.
The change fits into the existing failover flow by only affecting the boolean `fallbackConfigured`, which is later used to decide whether to throw `FailoverError` (allowing `runWithModelFallback()` to advance models) vs rethrowing the underlying error during embedded runs.
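As a rough illustration of how `fallbackConfigured` feeds into that flow, here is a synchronous sketch. All names ending in `Sketch`, the `isFailoverWorthy` classifier, and the model names are hypothetical; the real `runEmbeddedPiAgent()` and `runWithModelFallback()` are not shown in this PR description and may differ substantially.

```typescript
// Hypothetical stand-in for FailoverError.
class FailoverErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FailoverErrorSketch";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical classifier for rate-limit/auth style failures.
function isFailoverWorthy(err: Error): boolean {
  return /rate.?limit|unauthorized/i.test(err.message);
}

// Inner runner: wraps the error only when fallbacks are configured.
function runEmbeddedSketch(callModel: () => string, fallbackConfigured: boolean): string {
  try {
    return callModel();
  } catch (err) {
    if (fallbackConfigured && err instanceof Error && isFailoverWorthy(err)) {
      throw new FailoverErrorSketch(err.message);
    }
    throw err; // Not a failover case: rethrow the underlying error.
  }
}

// Outer loop: advances to the next model only on the failover error.
function runWithFallbackSketch(models: string[], run: (model: string) => string): string {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      return run(model);
    } catch (err) {
      if (!(err instanceof FailoverErrorSketch)) throw err;
      lastError = err;
    }
  }
  throw lastError;
}

// The primary model is rate limited; the second model succeeds.
const flaky = (model: string): string => {
  if (model === "model-a") throw new Error("rate limit exceeded");
  return `ok from ${model}`;
};
const models = ["model-a", "model-b"];

const withFailover = runWithFallbackSketch(models, (m) =>
  runEmbeddedSketch(() => flaky(m), true),
);
console.log(withFailover); // "ok from model-b"
```

With `fallbackConfigured` set to `false`, the same call would surface the original rate-limit error from the first model instead of advancing, which is the "stuck on primary model" behavior this PR addresses.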
<h3>Confidence Score: 3/5</h3>
- This PR is conceptually safe but the new test may not actually run in CI as added.
- The runtime change is small and uses an existing helper (`resolveAgentModelFallbacksOverride`) with well-defined semantics (undefined vs empty array). However, the new test heavily mocks internals and I couldn't validate execution here (no npm), so the main remaining risk is that the added test isn’t executed by the repo’s configured test runner/patterns.
- src/agents/pi-embedded-runner/run.fallback-config.test.ts
<!-- /greptile_comment -->
Most Similar PRs
- #8390: feat: notify user when fallback model is used (#8182) (by Glucksberg · 2026-02-04 · 81.9%)
- #10178: fix: trigger fallback when model resolution fails with unknown model (by Yida-Dev · 2026-02-06 · 81.8%)
- #21152: fix(agents): throw FailoverError for unknown model so fallback chai... (by Mellowambience · 2026-02-19 · 81.2%)
- #22064: fix(failover): bypass models allowlist for configured fallback models (by winston-bepresent · 2026-02-20 · 80.9%)
- #15815: Fallback LLM doesn't trigger if primary model is local (by shihanqu · 2026-02-13 · 80.3%)
- #11349: fix(agents): do not filter fallback models by models allowlist (by liuxiaopai-ai · 2026-02-07 · 80.2%)
- #19252: fix(agents): continue model fallback on failover text payloads (by mahsumaktas · 2026-02-17 · 80.2%)
- #13658: fix: silent model failover with fallback notification (by taw0002 · 2026-02-10 · 79.0%)
- #11174: Fix/fried chicken error (by jfgrissom · 2026-02-07 · 78.9%)
- #7229: fix: add network error resilience to agentic loop failover (by ai-fanatic · 2026-02-02 · 78.6%)