#13191: pi-embedded: enable failover when per-agent fallbacks are configured

by zesty-clawd open 2026-02-10 06:14

agents stale

Cluster: Model Fallbacks and Rate Limiting

### Problem

When OpenClaw runs inbound messages via the embedded PI runner (e.g. Telegram), `runEmbeddedPiAgent()` decides whether to throw `FailoverError` (to let the outer `runWithModelFallback()` advance models) based only on `agents.defaults.model.fallbacks`.

If `agents.defaults.model` is unset/null but an agent has per-agent model fallbacks configured (`agents.list[].model.fallbacks`), embedded runs treat rate-limit/auth failures as non-failover and do not advance to the next model. This effectively disables per-agent fallbacks for embedded sessions.

### Change

Gate failover on **per-agent fallback override** when present:

- If `resolveAgentModelFallbacksOverride(cfg, agentId)` is defined, use its length to decide whether failover is enabled.
- Otherwise fall back to `agents.defaults.model.fallbacks`.

This preserves existing semantics and also honors explicit per-agent disable (`fallbacks: []`).

### Tests

Add `run.fallback-config.test.ts` to verify:

- Per-agent fallbacks enable `FailoverError` even if defaults are empty
- Per-agent fallbacks explicitly disabled do not enable failover

### Why

This makes embedded sessions behave consistently with gateway-level model fallback selection, and prevents surprising “stuck on primary model” behavior when only per-agent fallbacks are configured.

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR updates the embedded PI runner’s failover gating so that `runEmbeddedPiAgent()` treats per-agent model fallback overrides (`agents.list[].model.fallbacks`) as enabling failover (and respects an explicit disable via `fallbacks: []`) when defaults are unset. A new unit test exercises the two key behaviors: per-agent fallbacks enable throwing `FailoverError` on auth/rate-limit style failures even when defaults are empty, and explicitly-disabled per-agent fallbacks do not.
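The gating rule described under Change can be sketched roughly as follows. This is a minimal TypeScript illustration, not the PR's actual diff: the `Config` shape, `resolveOverrideSketch` (a simplified stand-in for `resolveAgentModelFallbacksOverride`), and `isFallbackConfigured` are hypothetical names introduced here, and the real helper may resolve agents differently.

```typescript
// Hypothetical, simplified config shape; the real OpenClaw types are not shown in this PR.
interface ModelConfig {
  primary?: string;
  fallbacks?: string[];
}

interface AgentEntry {
  id: string;
  model?: ModelConfig | null;
}

interface Config {
  agents?: {
    defaults?: { model?: ModelConfig | null };
    list?: AgentEntry[];
  };
}

// Assumed stand-in for resolveAgentModelFallbacksOverride: returns undefined when
// the agent sets no override, and the array (possibly empty) when it does.
function resolveOverrideSketch(cfg: Config, agentId: string): string[] | undefined {
  const agent = cfg.agents?.list?.find((entry) => entry.id === agentId);
  const fallbacks = agent?.model?.fallbacks;
  return Array.isArray(fallbacks) ? fallbacks : undefined;
}

// Per-agent override wins when defined (an empty array disables failover);
// otherwise the decision falls back to agents.defaults.model.fallbacks.
function isFallbackConfigured(cfg: Config, agentId: string): boolean {
  const override = resolveOverrideSketch(cfg, agentId);
  if (override !== undefined) {
    return override.length > 0;
  }
  return (cfg.agents?.defaults?.model?.fallbacks?.length ?? 0) > 0;
}
```

The point of separating `undefined` from `[]` is that an agent can explicitly opt out of failover even when defaults list fallbacks, while an agent with no override keeps the previous default-driven behavior.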
The change fits into the existing failover flow by only affecting the boolean `fallbackConfigured`, which is later used to decide whether to throw `FailoverError` (allowing `runWithModelFallback()` to advance models) vs rethrowing the underlying error during embedded runs.

<h3>Confidence Score: 3/5</h3>

- This PR is conceptually safe but the new test may not actually run in CI as added.
- The runtime change is small and uses an existing helper (`resolveAgentModelFallbacksOverride`) with well-defined semantics (undefined vs empty array). However, the new test heavily mocks internals and I couldn't validate execution here (no npm), so the main remaining risk is that the added test isn’t executed by the repo’s configured test runner/patterns.
- src/agents/pi-embedded-runner/run.fallback-config.test.ts
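To illustrate how `fallbackConfigured` feeds the throw-or-rethrow decision mentioned in the summary, here is a small hedged sketch in TypeScript. `FailoverErrorSketch` and `rethrowForEmbeddedRun` are hypothetical names, the real `FailoverError` likely carries more context, and how a failure is classified as failover-eligible (rate limit, auth) is assumed to happen elsewhere.

```typescript
// Hypothetical stand-in for the project's FailoverError.
class FailoverErrorSketch extends Error {
  original: Error;

  constructor(original: Error) {
    super(original.message);
    this.name = "FailoverErrorSketch";
    this.original = original;
    // Keep the prototype chain intact when compiled to older targets.
    Object.setPrototypeOf(this, FailoverErrorSketch.prototype);
  }
}

// Wrap the failure in a failover error only when it is eligible AND fallbacks
// are configured; otherwise rethrow the underlying error unchanged.
function rethrowForEmbeddedRun(
  err: Error,
  failoverEligible: boolean,
  fallbackConfigured: boolean,
): never {
  if (failoverEligible && fallbackConfigured) {
    throw new FailoverErrorSketch(err);
  }
  throw err;
}
```

Under this sketch, an outer loop in the style of `runWithModelFallback()` would catch the failover error and try the next model, while any other error propagates to the caller as before.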