#22525: [Bug]: Session snapshot not reloading skills after gateway restart or VM reboot — requires /new to take effect
gateway
agents
size: S
## Summary
- **Problem:** After gateway restart or VM reboot, the agent kept using old session skill snapshots; newly configured skills did not appear until the user ran `/new`.
- **Why it matters:** Users expect a restart to pick up config/skill changes; having to type `/new` was non-obvious and looked like a skill config bug.
- **What changed:** On gateway startup we call `bumpSkillsSnapshotVersion({ reason: "manual" })` so that the in-memory version is higher than the version of any persisted session snapshot. On the next message, `shouldRefreshSnapshot` is therefore true for each session, and the session rebuilds its skills snapshot from the current config.
- **What did NOT change (scope boundary):** No change to session store schema, `/new` behavior, or skill loading logic; only the startup bump that forces a refresh on first use after restart.
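The mechanism can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `server-startup.ts` or the skills refresh module: the module-level `globalVersion`, the timestamp-based bump, `getSkillsSnapshotVersion`, and the `onGatewayStartup` hook are hypothetical. Only the names `bumpSkillsSnapshotVersion` and `shouldRefreshSnapshot` and the `{ reason: "manual" }` argument come from this PR.

```typescript
// Hypothetical sketch; not the actual OpenClaw implementation.
// Assumption: the version lives only in process memory, so it starts at 0
// again whenever the gateway process restarts.
let globalVersion = 0;

// Assumption: a bump moves the version to the current timestamp (or +1 if the
// clock has not advanced), so a fresh process ends up ahead of snapshots
// persisted by an earlier process.
function bumpSkillsSnapshotVersion(_opts: { reason: string }, now: number = Date.now()): number {
  globalVersion = now <= globalVersion ? globalVersion + 1 : now;
  return globalVersion;
}

function getSkillsSnapshotVersion(): number {
  return globalVersion;
}

// A session refreshes when the in-memory version is newer than its snapshot.
function shouldRefreshSnapshot(persistedVersion: number | undefined): boolean {
  return getSkillsSnapshotVersion() > (persistedVersion ?? 0);
}

// Hypothetical startup hook: the fix amounts to this single bump.
function onGatewayStartup(): void {
  bumpSkillsSnapshotVersion({ reason: "manual" });
}

// Made-up snapshot version written by a previous process (a Nov 2023 timestamp).
const oldSnapshotVersion = 1700000000000;
console.log(shouldRefreshSnapshot(oldSnapshotVersion)); // false: version is still 0
onGatewayStartup();
console.log(shouldRefreshSnapshot(oldSnapshotVersion)); // true: bumped version is newer
```

In this sketch the timestamp choice is what makes the startup bump sufficient: a plain `+1` counter starting from 0 would yield 1 after restart, which is not greater than a persisted snapshot version of, say, 5.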
## Change Type (select all)
- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Closes #22517
- Related #
## User-visible / Behavior Changes
- After gateway restart or VM reboot, the next message in each session triggers a skills snapshot refresh; new/updated skills (and config) take effect without requiring `/new`.
- No config or CLI changes.
## Security Impact (required)
- New permissions/capabilities? **No**
- Secrets/tokens handling changed? **No**
- New/changed network calls? **No**
- Command/tool execution surface changed? **No**
- Data access scope changed? **No**
- If any Yes, explain risk + mitigation: N/A
## Repro + Verification
### Environment
- OS: Linux (Ubuntu)
- Runtime/container: Node 22+
- Model/provider: Any
- Integration/channel (if any): Telegram, WhatsApp, Web UI
- Relevant config (redacted): `openclaw.json` with a skill (e.g. `gog`) and env/metadata
### Steps
1. Configure a skill and confirm it shows ✓ ready in `openclaw skills list`.
2. Start a chat (create a session), then change skill config (e.g. fix metadata or env).
3. Restart the gateway (`openclaw gateway stop` / `openclaw gateway start`) or reboot the VM.
4. Send a message in the same chat (no `/new`) and try to use the skill.
### Expected
- Skill is available and works with the updated config after restart, without `/new`.
### Actual (before fix)
- Skill stayed unavailable until the user typed `/new`.
## Evidence
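- [x] Failing test/log before + passing after: Existing logic in `ensureSkillSnapshot` already refreshes when `snapshotVersion > (session.skillsSnapshot?.version ?? 0)`. After a restart, `snapshotVersion` was back at 0, so the condition was false for every session with a persisted snapshot; the fix bumps `snapshotVersion` at startup so that it exceeds the persisted version and the condition holds.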
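The refresh condition quoted above can be exercised in isolation. The helper below is a hypothetical restatement of the comparison in `ensureSkillSnapshot`, not its actual code, and the version numbers are made up. It shows that a process whose version is still 0 never refreshes a session that already has a snapshot, and that the startup bump only helps when the new version exceeds the persisted one.

```typescript
// Hypothetical restatement of the check quoted from ensureSkillSnapshot.
interface SessionLike {
  skillsSnapshot?: { version?: number };
}

function needsRefresh(snapshotVersion: number, session: SessionLike): boolean {
  return snapshotVersion > (session.skillsSnapshot?.version ?? 0);
}

// Made-up persisted snapshot version from before the restart.
const persisted: SessionLike = { skillsSnapshot: { version: 5 } };
const fresh: SessionLike = {};

const cases: Array<[string, number, SessionLike, boolean]> = [
  ["after restart, no bump", 0, persisted, false],
  ["after restart, bump past persisted version", 6, persisted, true],
  ["bump that stays below persisted version", 3, persisted, false],
  ["session without a snapshot", 1, fresh, true],
];

for (const [label, version, session, expected] of cases) {
  const actual = needsRefresh(version, session);
  console.log(`${label}: ${actual}${actual === expected ? "" : " (unexpected)"}`);
}
```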
- [ ] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)
## Human Verification (required)
- **Verified scenarios:** Build and `src/agents/skills/refresh.test.ts` pass; startup path calls `bumpSkillsSnapshotVersion` once.
- **Edge cases checked:** No change to session store format; `OPENCLAW_TEST_FAST` fast path unchanged; watcher/version logic unchanged except for the one startup bump.
- **What you did not verify:** Full E2E with real gateway restart and multiple channels.
## Compatibility / Migration
- Backward compatible? **Yes**
- Config/env changes? **No**
- Migration needed? **No**
- If yes, exact upgrade steps: N/A
## Failure Recovery (if this breaks)
- **How to disable/revert:** Revert this PR or remove the `bumpSkillsSnapshotVersion` call in `server-startup.ts`.
- **Files/config to restore:** `src/gateway/server-startup.ts`
- **Known bad symptoms reviewers should watch for:** None expected. The only cost is one snapshot rebuild per session on its first message after startup; if the version were bumped more often than intended, those rebuilds would repeat.
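The "one rebuild per session" cost follows from the rebuilt snapshot recording the version it was built at. A minimal sketch, assuming a simplified `ensureSkillSnapshot` that stamps `version` on the new snapshot; the `Session` and `SkillsSnapshot` shapes, the `buildSnapshot` callback, the boolean return value, and the version numbers are all hypothetical:

```typescript
// Hypothetical simplified session and snapshot shapes.
interface SkillsSnapshot {
  version: number;
  skills: string[];
}

interface Session {
  id: string;
  skillsSnapshot?: SkillsSnapshot;
}

// Rebuilds the snapshot only when the in-memory version is newer, and stamps
// the rebuilt snapshot with that version so later messages skip the rebuild.
function ensureSkillSnapshot(
  session: Session,
  snapshotVersion: number,
  buildSnapshot: () => string[],
): boolean {
  if (snapshotVersion > (session.skillsSnapshot?.version ?? 0)) {
    session.skillsSnapshot = { version: snapshotVersion, skills: buildSnapshot() };
    return true;
  }
  return false;
}

let rebuilds = 0;
const build = (): string[] => {
  rebuilds += 1;
  return ["gog"];
};

// Session persisted before the restart with an old snapshot (made-up version).
const session: Session = { id: "chat-1", skillsSnapshot: { version: 5, skills: [] } };
const versionAfterStartupBump = 6; // made-up stand-in for the bumped version

ensureSkillSnapshot(session, versionAfterStartupBump, build); // first message: rebuild
ensureSkillSnapshot(session, versionAfterStartupBump, build); // second message: no rebuild
console.log(rebuilds); // 1
```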
## Risks and Mitigations
- **Risk:** None identified; single in-memory version bump at startup, no new I/O or secrets.
- **Mitigation:** N/A
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR bundles two separate bug fixes that should be documented together in the PR description:
1. **Skills snapshot refresh after gateway restart** (fixes #22517): Calls `bumpSkillsSnapshotVersion()` on gateway startup to ensure the in-memory version is higher than persisted session snapshots, forcing a refresh on the next message.
2. **Reasoning default based on model capability** (fixes #22456): Adds `resolveReasoningDefault()` to automatically enable reasoning when a model has `reasoning: true` in the catalog, unless the user explicitly overrides it.
Both fixes follow established patterns in the codebase and include test coverage. The skills fix solves the root cause where `globalVersion` started at 0 after restart, preventing `shouldRefreshSnapshot` from triggering. The reasoning fix implements the expected behavior where model capabilities inform default settings.
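The reasoning-default behavior described in the summary (enable reasoning when the catalog marks a model with `reasoning: true`, unless the user set it explicitly) can be sketched as below. Only the function name `resolveReasoningDefault` and the `reasoning: true` catalog flag come from the summary; the parameter list, the `"on"`/`"off"` values, the catalog lookup, and the model ids `thinker` and `plain` are assumptions for illustration.

```typescript
// Hypothetical catalog entry shape; only the `reasoning` flag is from the summary.
interface CatalogEntry {
  id: string;
  reasoning?: boolean;
}

type ReasoningLevel = "on" | "off";

// An explicit user setting always wins; otherwise fall back to the model's
// catalog capability, defaulting to "off" for unknown models.
function resolveReasoningDefault(
  modelId: string,
  catalog: CatalogEntry[],
  userOverride?: ReasoningLevel,
): ReasoningLevel {
  if (userOverride !== undefined) return userOverride;
  const entry = catalog.find((m) => m.id === modelId);
  return entry?.reasoning === true ? "on" : "off";
}

const catalog: CatalogEntry[] = [
  { id: "thinker", reasoning: true },
  { id: "plain" },
];

console.log(resolveReasoningDefault("thinker", catalog)); // on
console.log(resolveReasoningDefault("thinker", catalog, "off")); // off
console.log(resolveReasoningDefault("plain", catalog)); // off
```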
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge with a minor documentation concern.
- Both fixes are technically sound and follow established patterns. The skills snapshot fix correctly addresses the root cause where `globalVersion` initialized to 0. The reasoning default fix implements expected behavior with proper fallback logic. Tests are included for both changes. Score reduced by 1 because the PR description only mentions fix #22517 but the PR actually includes an unrelated fix for #22456; this creates confusion about the PR scope but doesn't affect code quality.
- No files require special attention - all changes follow existing patterns
<sub>Last reviewed commit: 59facef</sub>
<!-- /greptile_comment -->