#21994: Config: load valid backup when primary config is invalid
size: M
Cluster: Wizard Enhancements and Config Fixes
## Summary
- Problem: `openclaw.json` can become invalid (example seen in prod logs: `agents.defaults.compaction.mode: Invalid input`), causing startup failure.
- Why it matters: gateway startup failure can disrupt channel availability (observed Telegram outage window while config stayed invalid).
- What changed: config loader now attempts rollback to a valid backup (openclaw.json.bak*, newest first) when primary config is invalid.
- What did NOT change (scope boundary): schema remains strict; no coercion/normalization of invalid values (e.g. "auto" is still invalid for `compaction.mode`).
## Change Type (select all)
- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [x] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Closes #
- Related #
## User-visible / Behavior Changes
- On startup, when the primary config is invalid, OpenClaw now tries valid `openclaw.json.bak*` backups automatically instead of immediately falling back to an empty config.
- Invalid config values are still rejected (strict schema unchanged).
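
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/config/io.ts`: the names `loadWithBackupFallback`, `tryRead`, `backupsNewestFirst`, and the `Validate` callback are hypothetical, and the sketch assumes plain JSON parsing and an injected validator that returns `null` for invalid input.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical types for illustration; not the actual OpenClaw API.
type Config = Record<string, unknown>;
type Validate = (raw: unknown) => Config | null; // null means "invalid"

// Read and validate one file. In this sketch, an unreadable or
// unparsable file is treated the same as an invalid one.
function tryRead(file: string, validate: Validate): Config | null {
  try {
    return validate(JSON.parse(fs.readFileSync(file, "utf8")));
  } catch {
    return null;
  }
}

// List "<primary>.bak*" siblings, newest modification time first.
function backupsNewestFirst(primaryPath: string): string[] {
  const dir = path.dirname(primaryPath);
  const prefix = `${path.basename(primaryPath)}.bak`;
  return fs
    .readdirSync(dir)
    .filter((name) => name.startsWith(prefix))
    .map((name) => path.join(dir, name))
    .map((file) => ({ file, mtimeMs: fs.statSync(file).mtimeMs }))
    .sort((a, b) => b.mtimeMs - a.mtimeMs)
    .map((entry) => entry.file);
}

// Primary first, then backups newest-first, then an empty config.
// Validation stays strict: nothing is coerced or normalized.
function loadWithBackupFallback(
  primaryPath: string,
  validate: Validate,
): { config: Config; source: string | null } {
  const primary = tryRead(primaryPath, validate);
  if (primary) return { config: primary, source: primaryPath };

  for (const backup of backupsNewestFirst(primaryPath)) {
    const config = tryRead(backup, validate);
    if (config) return { config, source: backup };
  }
  return { config: {}, source: null };
}
```

Returning the `source` path alongside the config is a choice made for this sketch so that a caller could log which file was actually loaded, which speaks to the "unexpected backup selection" concern.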
## Security Impact (required)
- New permissions/capabilities? (No)
- Secrets/tokens handling changed? (No)
- New/changed network calls? (No)
- Command/tool execution surface changed? (No)
- Data access scope changed? (No)
- If any Yes, explain risk + mitigation:
## Repro + Verification
### Environment
- OS: Linux
- Runtime/container: Node 22.22.0, OpenClaw 2026.2.17 runtime logs
- Model/provider: N/A (failure occurs during config load)
- Integration/channel (if any): Telegram
- Relevant config (redacted): agents.defaults.compaction.mode: "auto" (invalid)
### Steps
1. Set invalid config value: agents.defaults.compaction.mode = "auto".
2. Ensure a valid openclaw.json.bak exists.
3. Start gateway.
### Expected
- Gateway recovers by loading a valid backup config and continues startup.
### Actual
- Before fix: repeated `Config invalid` startup failures.
- After fix: loader can recover via valid backup; startup continues.
## Evidence
- [x] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)
Before (failing evidence):
- `Config invalid`
- `agents.defaults.compaction.mode: Invalid input`
- `Run: openclaw doctor --fix`
After (passing evidence):
- Added regression test:
  - `falls back to valid .bak config when primary config is invalid` (`src/config/io.compat.test.ts`)
- Verification command passed:
  - `pnpm test src/config/io.compat.test.ts src/config/io.write-config.test.ts`
  - Result: `13 passed`
## Human Verification (required)
- Verified scenarios:
  - Confirmed invalid config failure pattern from real logs.
  - Confirmed new loader fallback path to valid backups in code and tests.
- Edge cases checked:
  - Primary invalid + valid backup present -> backup loaded.
  - Existing config I/O tests still pass.
- What you did not verify:
  - Full end-to-end restart behavior on a separate production host in this PR run.
## Compatibility / Migration
- Backward compatible? (Yes)
- Config/env changes? (No)
- Migration needed? (No)
- If yes, exact upgrade steps:
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly:
  - Revert commit 0801a6f34.
- Files/config to restore:
  - `src/config/io.ts`
  - `src/config/io.compat.test.ts`
- Known bad symptoms reviewers should watch for:
  - Unexpected backup selection when multiple backups exist and the newest valid backup is not the one the operator expected.
## Risks and Mitigations
- Risk:
  - The backup chosen may not match operator expectations when multiple backups are present.
- Mitigation:
  - Backups are ordered by newest mtime and must pass full validation before use.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Implements automatic recovery from invalid primary config by falling back to valid backup files (`openclaw.json.bak*`), ordered by newest modification time. This prevents gateway startup failures when the primary config becomes corrupted.
**Key changes:**
- Extracted config reading/validation logic into `readAndValidateConfig` helper
- Extracted runtime pipeline into `applyRuntimeConfigPipeline` helper
- Added `listBackupCandidates` to find and sort backup files by mtime
- Added `loadFromBackupOnInvalidPrimary` to attempt loading valid backups
- Modified `loadConfig` error handling to try backup recovery before returning empty config
**Issues found:**
- Race condition: `statSync` in backup sorting could fail if file is deleted between directory read and stat
- Error handling gap: non-validation errors (file system, parse, `DuplicateAgentDirError`) are silently swallowed when loading backups, which could hide legitimate problems
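
One possible way to address both findings is sketched below. It is an assumption-laden illustration rather than a patch against the PR: `listBackupCandidatesSafe`, `firstValidBackup`, `ConfigValidationError`, and the `warn` callback are hypothetical names, and the real loader's error types may differ. The listing skips a backup that disappears between `readdirSync` and `statSync` (reported as `ENOENT`), and the loading loop reports non-validation errors instead of swallowing them silently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical stand-in for a schema-validation failure.
class ConfigValidationError extends Error {}

// List "<primary>.bak*" files newest-first, tolerating files that
// vanish between the directory read and the stat call.
function listBackupCandidatesSafe(primaryPath: string): string[] {
  const dir = path.dirname(primaryPath);
  const prefix = `${path.basename(primaryPath)}.bak`;
  const candidates: { file: string; mtimeMs: number }[] = [];

  for (const name of fs.readdirSync(dir)) {
    if (!name.startsWith(prefix)) continue;
    const file = path.join(dir, name);
    try {
      candidates.push({ file, mtimeMs: fs.statSync(file).mtimeMs });
    } catch (err) {
      // ENOENT: the file is gone, so it is simply not a candidate.
      // Any other stat failure is unexpected and is rethrown.
      if ((err as { code?: string }).code !== "ENOENT") throw err;
    }
  }
  return candidates
    .sort((a, b) => b.mtimeMs - a.mtimeMs)
    .map((candidate) => candidate.file);
}

// Try each backup in order. Validation failures are expected and
// skipped quietly; every other error is passed to `warn` before
// moving on, so it is not hidden from the operator.
function firstValidBackup<T>(
  files: string[],
  load: (file: string) => T,
  warn: (file: string, err: unknown) => void,
): T | undefined {
  for (const file of files) {
    try {
      return load(file);
    } catch (err) {
      if (!(err instanceof ConfigValidationError)) warn(file, err);
    }
  }
  return undefined;
}
```

Whether a non-validation error should only be logged (as here) or should abort recovery entirely is a policy decision the sketch leaves open.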
<h3>Confidence Score: 3/5</h3>
- Safe to merge with minor fixes recommended
- The core functionality is solid and addresses a real production issue, with good test coverage for the happy path. However, two error handling edge cases need attention: a potential race condition in file stat operations and incomplete error handling when loading backup configs. These issues are unlikely to occur in practice but could cause confusing failures in edge cases.
- Pay attention to `src/config/io.ts` - the error handling improvements are recommended but not critical for merge
<sub>Last reviewed commit: 0801a6f</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #21931: feat(config): auto-rollback to last known-good backup on invalid co... (by Protocol-zero-0 · 2026-02-20 · 83.0%)
- #23779: fix(config): auto-repair invalid config keys from backup on load (by cintia09 · 2026-02-22 · 82.6%)
- #5823: fix(config): exit cleanly on invalid config instead of high CPU loop (by gavinbmoore · 2026-02-01 · 79.4%)
- #11602: fix(config): skip stale legacy config files when openclaw.json exists (by akoscz · 2026-02-08 · 77.6%)
- #19510: fix(config): preserve configured values on invalid config validatio... (by yash27-lab · 2026-02-17 · 76.9%)
- #19020: bugfix(gateway): Handle invalid model provider API config gracefully… (by funkyjonx · 2026-02-17 · 76.6%)
- #13988: feat(backup): add backup/restore CLI with local + S3 storage (by n24q02m · 2026-02-11 · 76.0%)
- #14313: feat: Atomic OpenClaw Configuration Management (by aronchick · 2026-02-11 · 75.3%)
- #11455: fix(gateway): default gateway.mode to local when unset (by AnonO6 · 2026-02-07 · 75.1%)
- #22720: fix: notify sessions on invalid config during hot-reload (by jayleekr · 2026-02-21 · 74.8%)