#22568: fix(gateway): bump skills snapshot version on startup so sessions re-snapshot after restart (#22517)
gateway
commands
agents
size: M
Cluster: Skill and Session Management Fixes
## Summary
Describe the problem and fix in 2–5 bullets:
- **Problem:** After a gateway restart or VM reboot, the gateway restores persisted sessions that still carry their old skill snapshot. `getSkillsSnapshotVersion()` resets to 0 in the new process, so `(entry.skillsSnapshot?.version ?? 0) < snapshotVersion` is false for every restored session and the snapshot is never refreshed. New or updated skills are not used until the user runs `/new`.
- **Why it matters:** Users see skills as ✓ ready in `openclaw skills list` but the agent does not use them in existing chats; they have to discover that `/new` is required, and it looks like a configuration bug instead of a stale session snapshot.
- **What changed:** On gateway startup we call `bumpSkillsSnapshotVersion({ reason: "manual" })` once. The global skills version is raised above any persisted snapshot version, so `shouldRefreshSnapshot` is true for all sessions and the next message triggers a fresh skill snapshot without `/new`.
- **What did NOT change (scope boundary):** No change to session store layout, CLI, or config. No new config option; existing `ensureSkillSnapshot` / version logic is unchanged. Only one extra call at startup.
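The version check described above can be sketched in isolation. This is a minimal model, not the gateway's code: the function names come from this PR, but their bodies, the `SessionEntry` shape, and the timestamp-based bump (taken from the review summary's description) are assumptions made for illustration.

```typescript
// Simplified model; the real implementations live in the gateway codebase.
interface SessionEntry {
  skillsSnapshot?: { version: number };
}

// Module-level state: starts at 0 again whenever a new process starts.
let skillsSnapshotVersion = 0;

function getSkillsSnapshotVersion(): number {
  return skillsSnapshotVersion;
}

// Assumed behavior: move the version forward to the current time in ms,
// and never let it go backwards within this process.
function bumpSkillsSnapshotVersion(_opts: { reason: string }): number {
  skillsSnapshotVersion = Math.max(skillsSnapshotVersion + 1, Date.now());
  return skillsSnapshotVersion;
}

// The comparison quoted in the summary.
function shouldRefreshSnapshot(entry: SessionEntry): boolean {
  return (entry.skillsSnapshot?.version ?? 0) < getSkillsSnapshotVersion();
}

// A session persisted by the previous process, snapshotted one minute ago.
const persisted: SessionEntry = {
  skillsSnapshot: { version: Date.now() - 60_000 },
};

// Before the fix: the persisted version is compared against 0, so no refresh.
const beforeFix = shouldRefreshSnapshot(persisted);

// The startup call added by this PR.
bumpSkillsSnapshotVersion({ reason: "manual" });

// After the fix: the persisted version is older than the bumped version.
const afterFix = shouldRefreshSnapshot(persisted);

console.log({ beforeFix, afterFix });
```

In this model the only difference between the two results is the single bump call, which matches the scope boundary stated above.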
## Change Type (select all)
- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Closes #22517
- Related #
## User-visible / Behavior Changes
- After a gateway restart or VM reboot, existing sessions now get a fresh skill snapshot on the **next message**; new/updated skills (e.g. new `gog` skill or env fix) are picked up without typing `/new`.
- No config or CLI changes; no new defaults.
## Security Impact (required)
- New permissions/capabilities? **No**
- Secrets/tokens handling changed? **No**
- New/changed network calls? **No**
- Command/tool execution surface changed? **No**
- Data access scope changed? **No**
- If any `Yes`, explain risk + mitigation: N/A
## Repro + Verification
### Environment
- OS: Linux (Ubuntu) — from issue
- Runtime/container: Node 22+, npm global
- Model/provider: N/A
- Integration/channel (if any): Telegram, WhatsApp, Web UI
- Relevant config (redacted): Default session store; one or more skills configured and showing ✓ ready in `openclaw skills list`
### Steps
1. Configure a skill (e.g. `gog`) and confirm it shows ✓ ready in `openclaw skills list`.
2. Start a chat (Telegram/WhatsApp/Web UI) so a session exists.
3. Change skill config (e.g. fix metadata or add env vars) and restart the gateway (`openclaw gateway stop` then `openclaw gateway start`) or reboot the VM.
4. In the **same** chat (no `/new`), try to use the updated skill.
### Expected
- The agent uses the updated skill in that session on the next message.
### Actual
- Before fix: The skill is not available until the user runs `/new`.
- After fix: The skill is available on the next message without `/new`.
## Evidence
Attach at least one:
- [x] Failing test/log before + passing after: Existing gateway and session/skills tests still pass; no new test added (behavior is “bump version once at startup”).
- [ ] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)
## Human Verification (required)
- **Verified scenarios:** Ran `pnpm test -- src/gateway --run` (54 files, 556 tests) and `src/agents/skills/refresh` / `src/gateway/session-utils` tests; all pass. Confirmed `bumpSkillsSnapshotVersion` is invoked once during `startGatewayServer` after `initSubagentRegistry()`.
- **Edge cases checked:** Single process startup only; no change to multi-agent or session store format.
- **What you did not verify:** Live gateway restart + real channel (Telegram/WhatsApp) end-to-end with a new skill and existing session.
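The "invoked once, after `initSubagentRegistry()`" check could also be expressed as an automated test by recording call order. The sketch below uses hand-written stand-ins rather than the real modules: `callLog` is hypothetical, and the `startGatewayServer` body is reduced to the two calls of interest, so it only illustrates the shape of such a check.

```typescript
// Hypothetical stand-ins; the real functions live in the gateway codebase.
const callLog: string[] = [];

function initSubagentRegistry(): void {
  callLog.push("initSubagentRegistry");
}

function bumpSkillsSnapshotVersion(opts: { reason: string }): void {
  callLog.push(`bumpSkillsSnapshotVersion:${opts.reason}`);
}

// Reduced stand-in for startGatewayServer, keeping only the ordering under test.
function startGatewayServer(): void {
  initSubagentRegistry();
  bumpSkillsSnapshotVersion({ reason: "manual" });
}

startGatewayServer();

// Exactly one bump should have been recorded during startup.
const bumpCalls = callLog.filter((entry) =>
  entry.startsWith("bumpSkillsSnapshotVersion"),
);

// The bump should appear after the registry initialization in the log.
const bumpedAfterInit =
  callLog.indexOf("initSubagentRegistry") <
  callLog.indexOf("bumpSkillsSnapshotVersion:manual");

console.log({ bumpCount: bumpCalls.length, bumpedAfterInit });
```

A real test would replace the stand-ins with mocks of the actual imports in the project's test runner.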
## Compatibility / Migration
- Backward compatible? **Yes**
- Config/env changes? **No**
- Migration needed? **No**
- If yes, exact upgrade steps: N/A
## Failure Recovery (if this breaks)
- **How to disable/revert this change quickly:** Revert the single commit; remove the `bumpSkillsSnapshotVersion({ reason: "manual" });` call and the import from `server.impl.ts`.
- **Files/config to restore:** `src/gateway/server.impl.ts`
- **Known bad symptoms reviewers should watch for:** None expected; startup bump is a single call with no new config or I/O.
## Risks and Mitigations
- **Risk:** The first message after a restart triggers a skill snapshot rebuild in each session that receives one, so many rebuilds can occur if many chats are active (same as today when the version is bumped by the watcher).
- **Mitigation:** The snapshot rebuild path already runs on every version change; this change triggers it once per process at startup, where previously it was never triggered after a restart.
- **Other risks:** None identified.
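To make the cost profile concrete, here is a small model of the lazy refresh, assuming the rebuild happens only when a session handles its next message. `ensureSnapshot`, `Session`, and the counters are hypothetical names for this sketch, not identifiers from the codebase.

```typescript
// Hypothetical model of lazy, per-session snapshot refresh.
type Session = { id: string; snapshotVersion: number };

let globalVersion = 0;
let rebuilds = 0;

// Stand-in for the snapshot rebuild; only runs when the session is stale.
function ensureSnapshot(session: Session): void {
  if (session.snapshotVersion < globalVersion) {
    rebuilds += 1;
    session.snapshotVersion = globalVersion;
  }
}

// Three sessions restored from the store, all snapshotted before the restart.
const sessionA: Session = { id: "a", snapshotVersion: 100 };
const sessionB: Session = { id: "b", snapshotVersion: 100 };
const sessionC: Session = { id: "c", snapshotVersion: 100 };

// Startup bump: raises the version but does not rebuild anything by itself.
globalVersion = 200;
const rebuildsAtStartup = rebuilds;

ensureSnapshot(sessionA); // first message in session "a": rebuild
ensureSnapshot(sessionA); // second message in session "a": snapshot reused
ensureSnapshot(sessionB); // first message in session "b": rebuild

// Session "c" received no message, so it has not been rebuilt yet.
const rebuildsAfterMessages = rebuilds;

console.log({ rebuildsAtStartup, rebuildsAfterMessages, idle: sessionC });
```

Under these assumptions the bump costs nothing at startup, and each session pays for at most one rebuild, spread across whenever its next message arrives.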
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR fixes a session persistence bug where existing sessions wouldn't pick up new or updated skills after a gateway restart. The root cause is that `getSkillsSnapshotVersion()` resets to 0 in the new process, causing the snapshot refresh check to fail for persisted sessions.
The fix adds a single call to `bumpSkillsSnapshotVersion({ reason: "manual" })` during gateway startup in `src/gateway/server.impl.ts:268`. This bumps the global version to `Date.now()`, which is higher than any previously persisted snapshot version as long as the system clock has not moved backwards, so all existing sessions rebuild their skill snapshots on the next message.
- Simple, targeted fix that leverages existing snapshot refresh logic
- No changes to session store layout, API contracts, or configuration
- Aligns with the existing version-based snapshot refresh mechanism in `src/auto-reply/reply/session-updates.ts:157-158`
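One way a timestamp-based version can be kept strictly increasing within a process is sketched below. `nextVersion` is a hypothetical helper written for this illustration and is not claimed to match the body of `bumpSkillsSnapshotVersion`.

```typescript
// Hypothetical helper: pick the clock value when it is ahead of the current
// version, otherwise fall back to a plain increment so the version never
// repeats or moves backwards within the process.
function nextVersion(current: number, now: number = Date.now()): number {
  return now > current ? now : current + 1;
}

// Fresh process (version 0): jumps straight to the clock value.
const v1 = nextVersion(0, 1_700_000_000_000);

// Second bump in the same millisecond: still increases.
const v2 = nextVersion(v1, 1_700_000_000_000);

// Clock stepped backwards: still increases.
const v3 = nextVersion(v2, 1_600_000_000_000);

console.log({ v1, v2, v3 });
```

A guard like this only protects ordering inside one process. After a restart the in-memory version starts from 0 again, so a snapshot persisted under a later clock reading would still compare as newer than a fresh `Date.now()` if the clock had been set back in between.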
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with no production risks
- The fix is minimal (2 lines of code), uses existing well-tested APIs (`bumpSkillsSnapshotVersion`), and only affects session snapshot refresh timing. The change makes persisted sessions behave consistently with new sessions after a restart, fixing a user-visible bug without introducing new functionality or touching critical paths.
- No files require special attention
<sub>Last reviewed commit: 2db979c</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #22525: [Bug]: Session snapshot not reloading skills after gateway restart ... (by zwffff, 2026-02-21, 93.8%)
- #16654: fix: refresh skills snapshot when managed skills change (by PhineasFleabottom, 2026-02-15, 89.3%)
- #12209: fix(skills): refresh stale skill snapshot after gateway restart (by mcaxtr, 2026-02-09, 87.2%)
- #13412: fix(sessions): refresh allowAgents permissions after gateway restart (by arun-dev-des, 2026-02-10, 83.2%)
- #16244: feat(gateway): add session files API and external skill management (by wanquanY, 2026-02-14, 82.0%)
- #21883: fix: guard against undefined snapshot.skills in applySkillEnvOverride… (by felipedamacenoteodoro, 2026-02-20, 81.4%)
- #20533: fix: strip resolvedSkills from session store to prevent snapshot bloat (by echoVic, 2026-02-19, 81.3%)
- #11250: fix: expand skills watcher ignore list and improve session repair l... (by zhangzhefang-github, 2026-02-07, 80.9%)
- #21521: fix: re-resolve skill paths at runtime for cross-machine portability (by mmaghsoodnia, 2026-02-20, 78.9%)
- #23749: fix some issues (by tronpis, 2026-02-22, 78.2%)