#12209: fix(skills): refresh stale skill snapshot after gateway restart
size: M
trusted-contributor
experienced-contributor
Cluster: Skill and Session Management Fixes
## Summary
- Fix stale skills in existing sessions after gateway restart (#12092)
- When the gateway restarts, the in-memory skills version resets to 0 while sessions retain snapshots from the prior process (version > 0)
- The `shouldRefreshSnapshot` check required `snapshotVersion > 0`, so it never triggered a rebuild after restart
- Add restart detection: when in-memory version is 0 but persisted version > 0, rebuild the snapshot
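To make the mismatch concrete, here is a minimal TypeScript model of the two kinds of state involved. `SkillsProcess`, its `bumpVersion` method, the `session-a` key, and the +1 version increments are all hypothetical stand-ins for the in-memory counters in `refresh.ts` and for the persisted session store; the real implementation is not shown in this PR.

```typescript
// Hypothetical model of the state split described above; names are illustrative.
type PersistedSession = { skillsSnapshot?: { version: number; skills: string[] } };

// Stand-in for the in-memory version state, which starts at 0 in every process.
class SkillsProcess {
  private version = 0;

  // Assumed behaviour: the skills watcher bumps the version on each change.
  bumpVersion(): void {
    this.version += 1;
  }

  getSkillsSnapshotVersion(): number {
    return this.version;
  }
}

// Stand-in for the session store, which outlives the process (persisted on disk).
const sessionStore = new Map<string, PersistedSession>();

// First process lifetime: two watcher bumps, then a snapshot is persisted.
const first = new SkillsProcess();
first.bumpVersion();
first.bumpVersion();
sessionStore.set("session-a", {
  skillsSnapshot: { version: first.getSkillsSnapshotVersion(), skills: ["old-skill"] },
});

// Gateway restart: a fresh process starts at 0 while the store still says 2.
const second = new SkillsProcess();
const inMemory = second.getSkillsSnapshotVersion();
const persisted = sessionStore.get("session-a")?.skillsSnapshot?.version ?? 0;
console.log({ inMemory, persisted }); // { inMemory: 0, persisted: 2 }
```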
## Root Cause
`getSkillsSnapshotVersion()` reads from in-memory state (`workspaceVersions` / `globalVersion` in `refresh.ts`), which resets to 0 on process restart. The comparison at `session-updates.ts:147-148` was:
```ts
const shouldRefreshSnapshot =
  snapshotVersion > 0 && (nextEntry?.skillsSnapshot?.version ?? 0) < snapshotVersion;
```
Since `snapshotVersion` is 0 after restart, the `snapshotVersion > 0` guard is false and the whole condition short-circuits, so the stale persisted snapshot is reused for as long as the in-memory version stays at 0.
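The failure mode can be reproduced in isolation by lifting the pre-fix condition into a function. `shouldRefreshBeforeFix` is a hypothetical name, and the persisted version is passed in directly instead of being read from `nextEntry`; the version numbers are made up for illustration.

```typescript
// Pre-fix condition from the snippet above, with the persisted version as a parameter.
function shouldRefreshBeforeFix(snapshotVersion: number, persistedVersion: number): boolean {
  return snapshotVersion > 0 && persistedVersion < snapshotVersion;
}

// Restart: in-memory 0, persisted 2. The `> 0` guard short-circuits.
console.log(shouldRefreshBeforeFix(0, 2)); // false: stale snapshot is reused

// Same process, watcher bumped the in-memory version past the snapshot.
console.log(shouldRefreshBeforeFix(3, 2)); // true
```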
## Fix
Extend the condition to also detect the restart scenario:
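```ts
// Version of the snapshot persisted with the session (0 when there is none)
const persistedVersion = nextEntry?.skillsSnapshot?.version ?? 0;
const shouldRefreshSnapshot =
  (snapshotVersion > 0 && persistedVersion < snapshotVersion) ||
  (snapshotVersion === 0 && persistedVersion > 0);
```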
The second clause covers the restart case: the process has just started (in-memory version 0) but the session still holds a snapshot from a prior process lifetime (persisted version > 0), so the snapshot is rebuilt.
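A standalone sketch of the fixed condition, assuming the same two inputs as above. Wrapping `shouldRefreshSnapshot` in a function is purely for illustration, and the version numbers are made up.

```typescript
// Post-fix condition from the snippet above, as a pure function.
function shouldRefreshSnapshot(snapshotVersion: number, persistedVersion: number): boolean {
  return (
    // Same process: the watcher bumped the in-memory version past the snapshot.
    (snapshotVersion > 0 && persistedVersion < snapshotVersion) ||
    // Fresh process (version 0) but the session holds a snapshot from a prior lifetime.
    (snapshotVersion === 0 && persistedVersion > 0)
  );
}

console.log(shouldRefreshSnapshot(0, 2)); // true: restart with a stale snapshot
console.log(shouldRefreshSnapshot(2, 2)); // false: versions match, reuse
console.log(shouldRefreshSnapshot(3, 2)); // true: watcher bumped the version
```

The condition only decides staleness; returning `false` when the versions match is what keeps normal operation on the cached snapshot.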
## Test Plan
- [x] Write failing test reproducing the restart scenario (stale snapshot returned)
- [x] Confirm test fails before fix
- [x] Implement fix (2-line change in `session-updates.ts`)
- [x] Confirm all 4 tests pass after fix
- [x] `pnpm build` passes
- [x] `pnpm check` passes (lint + format)
- [x] `codex review --base main` returns zero issues
### TDD: 4 new tests, all passing after the fix (the restart scenario reproduces the bug)
1. **Restart scenario** — in-memory version 0, persisted version > 0 → rebuilds snapshot
2. **Normal operation** — in-memory version matches persisted → reuses snapshot
3. **Watcher fired** — in-memory version higher than persisted → rebuilds snapshot
4. **No prior snapshot** — no existing snapshot, version 0 → builds fresh
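The four scenarios above can be exercised against a simplified reuse-or-rebuild decision. `ensureSnapshotSketch`, `Snapshot`, and the `buildSnapshot` callback are hypothetical; the real `ensureSkillSnapshot` signature is not part of this description, and the sketch assumes that a missing snapshot is always built and that a rebuilt snapshot is stamped with the current in-memory version.

```typescript
// Hypothetical snapshot shape for illustration only.
type Snapshot = { version: number; skills: string[] };

// Simplified stand-in for ensureSkillSnapshot: reuse the existing snapshot unless
// it is missing or known to be stale, otherwise build a new one.
function ensureSnapshotSketch(
  existing: Snapshot | undefined,
  snapshotVersion: number,
  buildSnapshot: (version: number) => Snapshot,
): { snapshot: Snapshot; rebuilt: boolean } {
  const persistedVersion = existing?.version ?? 0;
  const shouldRefreshSnapshot =
    (snapshotVersion > 0 && persistedVersion < snapshotVersion) ||
    (snapshotVersion === 0 && persistedVersion > 0);
  if (existing && !shouldRefreshSnapshot) {
    return { snapshot: existing, rebuilt: false };
  }
  return { snapshot: buildSnapshot(snapshotVersion), rebuilt: true };
}

const build = (version: number): Snapshot => ({ version, skills: ["current-skill"] });
const stale: Snapshot = { version: 2, skills: ["old-skill"] };

const results = {
  restart: ensureSnapshotSketch(stale, 0, build).rebuilt, // scenario 1
  normal: ensureSnapshotSketch(stale, 2, build).rebuilt, // scenario 2
  watcher: ensureSnapshotSketch(stale, 3, build).rebuilt, // scenario 3
  noPrior: ensureSnapshotSketch(undefined, 0, build).rebuilt, // scenario 4
};
console.log(results); // { restart: true, normal: false, watcher: true, noPrior: true }
```

In this model, a snapshot rebuilt after restart carries version 0, so the next call with in-memory version 0 reuses it rather than rebuilding on every request.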
Fixes #12092
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR updates `ensureSkillSnapshot` to correctly refresh a session’s persisted skills snapshot after a gateway restart. It introduces a restart-detection clause: if the in-memory snapshot version resets to `0` (fresh process) but the session already has a persisted snapshot with `version > 0` from a prior process lifetime, the snapshot is rebuilt instead of reused indefinitely.
It also adds a focused Vitest suite covering:
- restart scenario (memory version 0 + persisted > 0 → rebuild)
- normal reuse when versions match
- rebuild when watcher bumps in-memory version above persisted
- edge case when `sessionEntry` is missing but `sessionStore` contains the stale snapshot
- fresh snapshot build when no prior snapshot exists
These changes fit into the existing skills refresh model where `getSkillsSnapshotVersion()` is maintained in-memory (and resets on restart), while session snapshots are persisted in the session store.
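The missing-`sessionEntry` edge case comes down to where the persisted version is read from. A minimal sketch, assuming the lookup falls back from the caller's `sessionEntry` to the session store by key; `resolvePersistedVersion`, `SessionEntry`, and the `session-a` key are hypothetical names, not the actual code.

```typescript
// Hypothetical shapes; field and parameter names are illustrative only.
type SkillsSnapshot = { version: number };
type SessionEntry = { skillsSnapshot?: SkillsSnapshot };

function resolvePersistedVersion(
  sessionEntry: SessionEntry | undefined,
  sessionStore: Record<string, SessionEntry>,
  sessionKey: string,
): number {
  // Prefer the caller-supplied entry; fall back to the persisted store so a
  // stale snapshot is still detected when no entry was passed in.
  const entry = sessionEntry ?? sessionStore[sessionKey];
  return entry?.skillsSnapshot?.version ?? 0;
}

const store: Record<string, SessionEntry> = { "session-a": { skillsSnapshot: { version: 2 } } };
console.log(resolvePersistedVersion(undefined, store, "session-a")); // 2
console.log(resolvePersistedVersion(undefined, store, "missing")); // 0
```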
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk.
- The change is narrowly scoped to the snapshot refresh decision logic, aligns with the described restart root cause, and is covered by targeted tests for restart, normal reuse, watcher refresh, and missing-sessionEntry edge cases. No additional call sites were affected beyond `ensureSkillSnapshot`, and the new logic deterministically rebuilds only when the persisted snapshot is known-stale relative to the process lifetime or version bump.
- No files require special attention
## Most Similar PRs

| PR | Title | Author | Date | Similarity |
|---|---|---|---|---|
| #22568 | fix(gateway): bump skills snapshot version on startup so sessions r... | zwffff | 2026-02-21 | 87.2% |
| #16654 | fix: refresh skills snapshot when managed skills change | PhineasFleabottom | 2026-02-15 | 87.0% |
| #22525 | [Bug]: Session snapshot not reloading skills after gateway restart ... | zwffff | 2026-02-21 | 84.8% |
| #20533 | fix: strip resolvedSkills from session store to prevent snapshot bloat | echoVic | 2026-02-19 | 81.6% |
| #21883 | fix: guard against undefined snapshot.skills in applySkillEnvOverride… | felipedamacenoteodoro | 2026-02-20 | 80.4% |
| #13412 | fix(sessions): refresh allowAgents permissions after gateway restart | arun-dev-des | 2026-02-10 | 80.0% |
| #9221 | fix(skills): use skillKey for env config lookup in snapshots | gavinbmoore | 2026-02-05 | 79.9% |
| #11250 | fix: expand skills watcher ignore list and improve session repair l... | zhangzhefang-github | 2026-02-07 | 79.3% |
| #14023 | fix: filter skills watcher to relevant file types to prevent FD exh... | funmerlin | 2026-02-11 | 76.1% |
| #21521 | fix: re-resolve skill paths at runtime for cross-machine portability | mmaghsoodnia | 2026-02-20 | 75.7% |