#21931: feat(config): auto-rollback to last known-good backup on invalid config startup
Labels: `docs`, `gateway`, `size: S`
Cluster: Wizard Enhancements and Config Fixes
## TL;DR for maintainers
When `openclaw.json` fails validation at gateway startup, the gateway now
automatically restores the most recent valid `.bak` backup instead of crashing
into an unrecoverable restart loop. The broken config is preserved as
`openclaw.json.broken-<timestamp>` for debugging.
Closes #18200
---
## Problem
If a user (or an agent via `config.patch`) writes a broken `openclaw.json`,
the gateway throws at startup and exits. Process supervisors (`launchd`,
`systemd`) immediately restart it — into the same crash. Because the gateway
**is** the bot connection, users cannot fix the config through the UI and are
stuck in a silent failure loop (see [tweet with 828 likes](https://x.com/xBenJamminx/status/1888741825891164190)).
## Solution
OpenClaw already keeps up to 5 rotated backups (`.bak`, `.bak.1`, …,
`.bak.4`), rotating them every time `writeConfigFile` persists a change. This PR
adds the **restore** side:
1. **`tryLoadValidConfigBackup(configPath)`** (`src/config/io.ts`)
   Iterates `.bak` → `.bak.4`, reads and validates each file, and returns the
   first snapshot that passes `validateConfigObjectRawWithPlugins`. Returns
   `null` when no usable backup exists.
2. **Gateway startup fallback** (`src/gateway/server.impl.ts`)
   When `configSnapshot.exists && !configSnapshot.valid`:
   - Copies the broken config to `openclaw.json.broken-<timestamp>` (best-effort).
   - Calls `tryLoadValidConfigBackup`; if a valid backup is found, writes it
     back to `openclaw.json` and continues startup with a loud `log.warn`.
   - If no backup is valid, throws the same error as before, with an updated
     message noting that no valid backup was found.
3. **Re-export** (`src/config/config.ts`): exposes the new helper.
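The restore flow in steps 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `tryLoadValidConfigBackupSketch`, `rollbackInvalidConfig`, `readOrNull`, and the injected `Validate` callback are hypothetical stand-ins (the real check is `validateConfigObjectRawWithPlugins`, whose signature is not shown here), the files are assumed to be plain JSON, and `Date.now()` is an assumed format for the `.broken-<timestamp>` suffix.

```typescript
import * as fs from "node:fs";

// Hypothetical validator callback standing in for `validateConfigObjectRawWithPlugins`.
type Validate = (parsed: unknown) => boolean;

interface BackupSnapshot {
  path: string; // which backup file was used
  raw: string; // its exact contents, written back verbatim on restore
}

// Read a file, or return null if it is missing or unreadable.
function readOrNull(filePath: string): string | null {
  try {
    return fs.readFileSync(filePath, "utf8");
  } catch {
    return null;
  }
}

// Sketch of the backup search: `.bak`, `.bak.1`, ..., `.bak.4`, newest first.
// Assumes plain JSON; the real loader may accept a more permissive format.
function tryLoadValidConfigBackupSketch(configPath: string, validate: Validate): BackupSnapshot | null {
  for (let i = 0; i < 5; i++) {
    const candidate = i === 0 ? `${configPath}.bak` : `${configPath}.bak.${i}`;
    const raw = readOrNull(candidate);
    if (raw === null) continue; // no such backup: try the next one
    try {
      if (validate(JSON.parse(raw))) return { path: candidate, raw };
    } catch {
      // unparseable backup (or the validator threw): skip it
    }
  }
  return null;
}

// Sketch of the startup fallback, used only when the config exists but is invalid.
// Returns the path of the restored backup, or throws if no backup is usable.
function rollbackInvalidConfig(configPath: string, validate: Validate): string {
  // Best-effort: preserve the broken file for debugging.
  try {
    fs.copyFileSync(configPath, `${configPath}.broken-${Date.now()}`);
  } catch {
    // a failed copy must not block recovery
  }
  const backup = tryLoadValidConfigBackupSketch(configPath, validate);
  if (backup === null) {
    throw new Error(`Invalid config at ${configPath}; no valid backup found`);
  }
  fs.writeFileSync(configPath, backup.raw);
  console.warn(`Config at ${configPath} was invalid; restored from ${backup.path}`);
  return backup.path;
}
```

Injecting the validator keeps the sketch testable against temporary files. The property it mirrors from the PR is that a backup only counts as usable if it passes the same validation as the primary config, so a rollback never swaps one invalid file for another.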
## What's NOT in this PR
- Notification via messaging channel on rollback (Issue #18200 item 3 — separate PR)
- Crash-loop rate limiting (tracked in #16810)
## Testing
- 4 new unit tests for `tryLoadValidConfigBackup`:
  - Returns `null` when no backups exist
  - Finds first valid `.bak`
  - Skips invalid backups and finds the next valid one
  - Returns `null` when all backups are invalid
- All existing config IO tests pass (8/8 write-config, 25/25 config-misc)
## AI disclosure
This change was AI-assisted (research + implementation). All code was manually
reviewed and tested by the author.
Made with [Cursor](https://cursor.com)
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Adds automatic config rollback to prevent crash loops when `openclaw.json` fails validation at gateway startup. The gateway now attempts to restore from up to 5 rotated backups (`.bak` through `.bak.4`) when the config is invalid, preserving the broken config as `openclaw.json.broken-<timestamp>` for debugging. If no valid backup exists, the gateway throws an error as before.
The implementation is well-structured with clear separation of concerns:
- `tryLoadValidConfigBackup` in `src/config/io.ts` handles the backup search and validation logic
- Gateway startup in `src/gateway/server.impl.ts` orchestrates the rollback when needed
- Unit tests cover the backup-search scenarios for `tryLoadValidConfigBackup` (no backups, first valid backup, skipping invalid backups, all invalid)
The rollback logic correctly leverages the existing backup rotation system (up to 5 rotated backups maintained on each config write) and uses the same validation function (`validateConfigObjectRawWithPlugins`) to ensure consistency. The broken config is preserved for debugging before restoration, and a warning is logged when the rollback happens.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge: the implementation is well-designed, thoroughly tested, and solves a critical crash-loop issue without introducing risk.
- The code follows existing patterns, includes comprehensive unit tests covering all scenarios (no backups, first valid backup, skipping invalid backups, all invalid), and handles edge cases properly with best-effort error handling where appropriate. The rollback logic integrates cleanly with the existing config validation and backup rotation systems. No breaking changes or risky modifications to core logic.
- No files require special attention
<sub>Last reviewed commit: 73fcbf9</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs

| PR | Title | Author | Date | Similarity |
|---|---|---|---|---|
| #23779 | fix(config): auto-repair invalid config keys from backup on load | cintia09 | 2026-02-22 | 84.5% |
| #21994 | Config: load valid backup when primary config is invalid | islavutin | 2026-02-20 | 83.0% |
| #17702 | feat: crash-loop detection and last-known-good config rollback | aronchick | 2026-02-16 | 81.7% |
| #21944 | feat(gateway): crash-loop protection with escalating backoff | Protocol-zero-0 | 2026-02-20 | 80.1% |
| #19129 | fix(config): block destructive config writes instead of only loggin... | pierreeurope | 2026-02-17 | 79.9% |
| #11602 | fix(config): skip stale legacy config files when openclaw.json exists | akoscz | 2026-02-08 | 79.2% |
| #11455 | fix(gateway): default gateway.mode to local when unset | AnonO6 | 2026-02-07 | 77.8% |
| #19510 | fix(config): preserve configured values on invalid config validatio... | yash27-lab | 2026-02-17 | 77.1% |
| #12234 | gateway: incident tracking, recover command, and ciao ERR_SERVER_CL... | levineam | 2026-02-09 | 76.6% |
| #5823 | fix(config): exit cleanly on invalid config instead of high CPU loop | gavinbmoore | 2026-02-01 | 76.6% |