#22660: feat(agents): prioritize fallback-chain recovery and configurable re-probe interval
commands
agents
size: M
Cluster: Model Fallbacks and Rate Limiting
## Summary
This PR improves model failover recovery behavior and makes recovery probing configurable.
When an agent is running on a lower-priority fallback model, OpenClaw now re-evaluates the configured priority chain and promotes the agent to the highest-priority model that is currently available, checking in this order:
- `primary`
- `fallback[0]`
- `fallback[1]`
- ...
## Why
Agents could remain on lower fallbacks (for example `f3`) longer than necessary after upstream limits recovered, even when a higher-priority model (`primary` or `f1`) was available again.
## What Changed
### Runtime behavior
- Added fallback-chain re-promotion in `runWithModelFallback`.
- If the current model is a configured fallback (and no explicit override chain is supplied), the candidate order is rebuilt to the configured priority order (`primary -> fallbacks...`).
- Kept the short probe throttle (`30s`) and near-expiry probing.
- Added periodic probing during cooldown windows to avoid stale cooldown metadata keeping agents on lower-priority models.
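
The runtime bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `runWithModelFallback`: the helper names (`buildCandidateOrder`, `shouldProbe`), the `ModelChain` type, and the near-expiry margin are all hypothetical. Only the `30s` throttle and the `primary -> fallbacks...` ordering come from this PR.

```typescript
// Hypothetical shape of a configured model chain; the real types may differ.
interface ModelChain {
  primary: string;
  fallbacks: string[];
}

// Remove duplicates while keeping first-seen order.
function dedupe(models: string[]): string[] {
  return [...new Set(models)];
}

// Rebuild the candidate order for the next run.
function buildCandidateOrder(
  current: string,
  chain: ModelChain,
  fallbacksOverride?: string[],
): string[] {
  // An explicit override chain is respected as-is (no re-promotion).
  if (fallbacksOverride !== undefined) {
    return dedupe([current, ...fallbacksOverride]);
  }
  // Current model is a configured fallback: restart from the top of the chain.
  if (chain.fallbacks.includes(current)) {
    return dedupe([chain.primary, ...chain.fallbacks]);
  }
  // Otherwise keep the current model first, then the configured fallbacks.
  return dedupe([current, ...chain.fallbacks]);
}

// Decide whether a model that is still in cooldown should be probed now.
// All arguments are in milliseconds.
function shouldProbe(
  now: number,
  cooldownUntil: number,
  lastProbeAt: number,
  probeEveryMs: number,
  throttleMs = 30_000, // short probe throttle named in this PR
  nearExpiryMs = 60_000, // assumed margin; the PR does not state one
): boolean {
  if (now >= cooldownUntil) return true; // cooldown already over
  if (now - lastProbeAt < throttleMs) return false; // probed too recently
  if (cooldownUntil - now <= nearExpiryMs) return true; // near-expiry probe
  return now - lastProbeAt >= probeEveryMs; // periodic probe during cooldown
}
```

With a chain of `primary`, `f1`, `f2`, `f3` and an agent currently on `f3`, the sketch yields `primary` first, so a recovered higher-priority model is tried before the agent settles back on `f3`.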
### Config
Added a configurable periodic probe interval:
- `agents.defaults.model.primaryRecoveryProbeEvery`
- `agents.list[].model.primaryRecoveryProbeEvery` (per-agent override)
Duration strings are validated (default unit: minutes), e.g. `45s`, `3m`, `1h`.
Default remains `5m` when unset.
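
As an illustration of the duration rules above, here is a minimal parsing sketch. The function name `parseProbeInterval` and the accepted unit set (`s`, `m`, `h`, taken only from the examples above) are assumptions. The PR's actual validator may accept more units or report errors differently.

```typescript
// Milliseconds per unit. Only the units shown in the PR examples are covered.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
};

// Hypothetical helper: convert a duration string to milliseconds.
// A bare number is read as minutes; an unset value falls back to 5 minutes.
function parseProbeInterval(
  raw: string | undefined,
  defaultMs: number = 5 * 60_000,
): number {
  if (raw === undefined) return defaultMs;
  const match = /^(\d+(?:\.\d+)?)\s*(s|m|h)?$/.exec(raw.trim());
  if (match === null) {
    throw new Error(`Invalid duration: ${raw}`);
  }
  const value = Number(match[1]);
  const unit = match[2] ?? "m"; // default unit: minutes
  const ms = value * UNIT_MS[unit];
  if (!(ms > 0)) {
    throw new Error(`Duration must be positive: ${raw}`);
  }
  return ms;
}
```

Under these assumptions `45s` maps to 45,000 ms, `3m` to 180,000 ms, `1h` to 3,600,000 ms, and a bare `10` to 600,000 ms.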
### Integration wiring
Plumbed resolved probe interval through all relevant call paths:
- command agent runs
- followup/auto-reply runs
- isolated cron agent runs
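
Each of these call paths needs the same resolved value. A rough sketch of the resolution order, assuming the per-agent key wins over the defaults key and `5m` applies when neither is set. The `resolveProbeEvery` name, the trimmed-down config types, and the `id` field on list entries are hypothetical; only the two config key paths come from this PR.

```typescript
// Hypothetical, trimmed-down config shape; real config objects carry more fields.
interface ModelConfig {
  primaryRecoveryProbeEvery?: string;
}

interface AgentsConfig {
  defaults?: { model?: ModelConfig };
  list?: Array<{ id: string; model?: ModelConfig }>;
}

// Resolve the probe interval string for one agent:
// per-agent override, then defaults, then the built-in "5m".
function resolveProbeEvery(agents: AgentsConfig, agentId: string): string {
  const perAgent = agents.list?.find((agent) => agent.id === agentId)?.model
    ?.primaryRecoveryProbeEvery;
  return perAgent ?? agents.defaults?.model?.primaryRecoveryProbeEvery ?? "5m";
}

// Example config: one agent overrides the default interval.
const exampleAgents: AgentsConfig = {
  defaults: { model: { primaryRecoveryProbeEvery: "3m" } },
  list: [
    { id: "support", model: { primaryRecoveryProbeEvery: "45s" } },
    { id: "research" },
  ],
};
```

Resolving once and passing the result down keeps command, followup, and cron runs from each re-reading the config with slightly different defaults.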
### Schema/docs/tests
- Updated config types and zod schemas.
- Added schema help + labels for new config fields.
- Added/updated unit and e2e coverage for re-promotion/probing behavior.
- Fixed the affected cron skill-filter test mock to include the new resolver export.
## Backward Compatibility
- No breaking config changes.
- Explicit `fallbacksOverride` behavior is preserved.
- Existing model primary/fallback definitions continue to work unchanged.
## AI Assistance
- AI-assisted: yes (Codex)
- Degree of testing: fully tested locally (build/check/test)
- I understand and verified the implemented behavior and changed call paths.
- Session logs/prompts can be shared on request.
## Local Validation
Ran and passed:
- `pnpm build`
- `pnpm check`
- `env -u OPENCLAW_HOME -u OPENCLAW_STATE_DIR HOME="$(mktemp -d)" pnpm test`
Also ran targeted suites during implementation:
- `pnpm vitest run --config vitest.unit.config.ts src/agents/model-fallback.probe.test.ts`
- `pnpm vitest run --config vitest.e2e.config.ts src/agents/model-fallback.e2e.test.ts`
- `pnpm vitest run --config vitest.e2e.config.ts src/agents/agent-scope.e2e.test.ts src/commands/agent.e2e.test.ts`
- `pnpm vitest run --config vitest.unit.config.ts src/auto-reply/reply/agent-runner-utils.test.ts src/config/config.schema-regressions.test.ts src/auto-reply/reply/followup-runner.test.ts`
- `pnpm vitest run --config vitest.unit.config.ts src/cron/isolated-agent/run.skill-filter.test.ts`
- `pnpm -s tsc --noEmit`
## Most Similar PRs
- #23738: feat(fallback): first-class transition visibility + low-noise autom... (by SmithLabsLLC, 2026-02-22, 74.6%)
- #18670: feat: add first-class Claude Code CLI auth path + CLI model UX hard... (by SmithLabsLLC, 2026-02-16, 74.5%)
- #20275: fix(cli): include primary model in allowlist when adding fallbacks (by MFS-code, 2026-02-18, 74.2%)
- #22064: fix(failover): bypass models allowlist for configured fallback models (by winston-bepresent, 2026-02-20, 74.1%)
- #19252: fix(agents): continue model fallback on failover text payloads (by mahsumaktas, 2026-02-17, 73.8%)
- #23816: fix(agents): model fallback skipped during session overrides and pr... (by ramezgaberiel, 2026-02-22, 73.7%)
- #21503: feat(doctor): validate fallback model providers are defined (#20909) (by echoVic, 2026-02-20, 73.3%)
- #16838: fix: include configured fallbacks in model allowlist (by taw0002, 2026-02-15, 73.2%)
- #15859: Graceful fallback + transparent model-failure logging (by wboudy, 2026-02-14, 72.7%)
- #8390: feat: notify user when fallback model is used (#8182) (by Glucksberg, 2026-02-04, 72.3%)