
#22568: fix(gateway): bump skills snapshot version on startup so sessions re-snapshot after restart (#22517)

by zwffff open 2026-02-21 09:23
gateway commands agents size: M
## Summary

Describe the problem and fix in 2–5 bullets:

- **Problem:** After a gateway restart or VM reboot, the gateway restores persisted sessions that still use the old skill snapshot. `getSkillsSnapshotVersion()` resets to 0 in the new process, so `(entry.skillsSnapshot?.version ?? 0) < snapshotVersion` is false and the snapshot is never refreshed. New or updated skills are therefore not used until the user runs `/new`.
- **Why it matters:** Users see skills as ✓ ready in `openclaw skills list`, but the agent does not use them in existing chats. Users have to discover that `/new` is required, and the problem looks like a configuration bug rather than a stale session snapshot.
- **What changed:** On gateway startup we call `bumpSkillsSnapshotVersion({ reason: "manual" })` once. The global skills version is raised above any persisted snapshot version, so `shouldRefreshSnapshot` is true for all sessions and the next message triggers a fresh skill snapshot without `/new`.
- **What did NOT change (scope boundary):** No change to session store layout, CLI, or config. No new config option; the existing `ensureSkillSnapshot` / version logic is unchanged. Only one extra call at startup.

## Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #22517
- Related #

## User-visible / Behavior Changes

- After a gateway restart or VM reboot, existing sessions now get a fresh skill snapshot on the **next message**; new/updated skills (e.g. a new `gog` skill or an env fix) are picked up without typing `/new`.
- No config or CLI changes; no new defaults.

## Security Impact (required)

- New permissions/capabilities? **No**
- Secrets/tokens handling changed? **No**
- New/changed network calls? **No**
- Command/tool execution surface changed? **No**
- Data access scope changed? **No**
- If any `Yes`, explain risk + mitigation: N/A

## Repro + Verification

### Environment

- OS: Linux (Ubuntu), from issue
- Runtime/container: Node 22+, npm global
- Model/provider: N/A
- Integration/channel (if any): Telegram, WhatsApp, Web UI
- Relevant config (redacted): Default session store; one or more skills configured and showing ✓ ready in `openclaw skills list`

### Steps

1. Configure a skill (e.g. `gog`) and confirm it shows ✓ ready in `openclaw skills list`.
2. Start a chat (Telegram/WhatsApp/Web UI) so a session exists.
3. Change skill config (e.g. fix metadata or add env vars) and restart the gateway (`openclaw gateway stop` then `openclaw gateway start`) or reboot the VM.
4. In the **same** chat (no `/new`), try to use the updated skill.

### Expected

- The agent uses the updated skill in that session on the next message.

### Actual

- Before fix: Skill is not available until the user runs `/new`.
- After fix: Skill is available on the next message without `/new`.

## Evidence

Attach at least one:

- [x] Failing test/log before + passing after: Existing gateway and session/skills tests still pass; no new test added (behavior is “bump version once at startup”).
- [ ] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

## Human Verification (required)

- **Verified scenarios:** Ran `pnpm test -- src/gateway --run` (54 files, 556 tests) and the `src/agents/skills/refresh` / `src/gateway/session-utils` tests; all pass. Confirmed `bumpSkillsSnapshotVersion` is invoked once during `startGatewayServer`, after `initSubagentRegistry()`.
- **Edge cases checked:** Single process startup only; no change to multi-agent or session store format.
- **What you did not verify:** Live gateway restart + real channel (Telegram/WhatsApp) end-to-end with a new skill and an existing session.

## Compatibility / Migration

- Backward compatible? **Yes**
- Config/env changes? **No**
- Migration needed? **No**
- If yes, exact upgrade steps: N/A

## Failure Recovery (if this breaks)

- **How to disable/revert this change quickly:** Revert the single commit; remove the `bumpSkillsSnapshotVersion({ reason: "manual" });` call and the import from `server.impl.ts`.
- **Files/config to restore:** `src/gateway/server.impl.ts`
- **Known bad symptoms reviewers should watch for:** None expected; the startup bump is a single call with no new config or I/O.

## Risks and Mitigations

- **Risk:** The first message after a restart may trigger a skill snapshot rebuild for many sessions if many chats are active (same as today when the version is bumped by the watcher).
  - **Mitigation:** The snapshot build already runs on every version change; we now trigger it once per process at startup instead of never.
- **Risk:** No other risks identified.

### Greptile Summary

This PR fixes a session persistence bug where existing sessions wouldn't pick up new or updated skills after a gateway restart. The root cause is that `getSkillsSnapshotVersion()` resets to 0 in the new process, causing the snapshot refresh check to fail for persisted sessions.

The fix adds a single call to `bumpSkillsSnapshotVersion({ reason: "manual" })` during gateway startup in `src/gateway/server.impl.ts:268`. This bumps the global version to `Date.now()`, which is guaranteed to be higher than any previously persisted snapshot version, forcing all existing sessions to rebuild their skill snapshots on the next message.

- Simple, targeted fix that leverages existing snapshot refresh logic
- No changes to session store layout, API contracts, or configuration
- Aligns with the existing version-based snapshot refresh mechanism in `src/auto-reply/reply/session-updates.ts:157-158`

### Confidence Score: 5/5

- This PR is safe to merge with no production risks
- The fix is minimal (2 lines of code), uses existing well-tested APIs (`bumpSkillsSnapshotVersion`), and only affects session snapshot refresh timing. The change makes persisted sessions behave consistently with new sessions after a restart, fixing a user-visible bug without introducing new functionality or touching critical paths.
- No files require special attention

<sub>Last reviewed commit: 2db979c</sub>
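
The version comparison described in the summary can be modeled in a few lines. This is a minimal, self-contained sketch rather than the project's actual source: the function names are borrowed from the PR text, but their bodies, the `SessionEntry` shape, and the sample persisted version are assumptions made for illustration. The `Date.now()`-based bump follows the review summary's description.

```typescript
// Illustrative model only; bodies and types are assumptions, not the real implementation.
type SessionEntry = { skillsSnapshot?: { version: number } };

// The global version lives in process memory, so every new process starts at 0.
let globalSkillsVersion = 0;

function getSkillsSnapshotVersion(): number {
  return globalSkillsVersion;
}

// Assumed behavior: move the version to the current time, and always forward.
function bumpSkillsSnapshotVersion(params: { reason: string }): number {
  globalSkillsVersion = Math.max(globalSkillsVersion + 1, Date.now());
  console.log(`skills snapshot version bumped (${params.reason})`);
  return globalSkillsVersion;
}

// The refresh condition quoted in the PR summary.
function shouldRefreshSnapshot(entry: SessionEntry): boolean {
  const snapshotVersion = getSkillsSnapshotVersion();
  return (entry.skillsSnapshot?.version ?? 0) < snapshotVersion;
}

// A session persisted by the previous process (hypothetical timestamp-like version).
const persisted: SessionEntry = { skillsSnapshot: { version: 1_700_000_000_000 } };

// Fresh process, no bump: the persisted version is not below 0, so no refresh happens.
const beforeBump = shouldRefreshSnapshot(persisted);

// Startup bump as added by this PR: the global version now exceeds the persisted one.
bumpSkillsSnapshotVersion({ reason: "manual" });
const afterBump = shouldRefreshSnapshot(persisted);

console.log({ beforeBump, afterBump });
```

In this model the persisted session is skipped before the bump and refreshed after it, which is the behavior change the PR describes. The `Math.max` guard is an extra assumption so that the sketch never moves the version backwards.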
