#17588: fix(relay): survive WS disconnects and MV3 worker restarts
size: M
Cluster:
Chrome Extension Enhancements
## Summary
The Browser Relay extension frequently loses connection and fails to recover, requiring manual re-attach. This PR adds 5 fixes to make the relay resilient to WebSocket drops, MV3 service worker termination, and tab lifecycle events.
Related issues: #15099, #15989, #11142
## Problem
The current `background.js` has several failure modes:
1. **WebSocket drop → debugger detach**: When the relay WebSocket closes (e.g., OpenClaw restart, network blip), `onRelayClosed()` immediately detaches all debugger sessions and clears state. The user must manually click the toolbar icon again to re-attach.
2. **No auto-reconnect**: After a WS disconnect, the extension makes no attempt to reconnect. It silently goes dead.
3. **MV3 service worker death**: Chrome/Edge can terminate the service worker at any time. All in-memory state (tabs, sessions) is lost.
4. **Tab close/replace not handled**: Stale entries remain in the maps.
5. **Service worker termination**: Without periodic activity, Chrome terminates the MV3 service worker after ~30s of inactivity.
## Fixes
| # | Fix | Description |
|---|-----|-------------|
| 1 | **Don't detach on WS drop** | Keep debugger attached, show reconnecting badge, re-announce on reconnect |
| 2 | **Auto-reconnect** | Exponential backoff (1s → 30s cap) with jitter, re-announce all tabs on success |
| 3 | **Session storage persistence** | Save/restore state via `chrome.storage.session` for MV3 worker restarts, with debugger re-attach |
| 4 | **Tab lifecycle cleanup** | `onRemoved` / `onReplaced` listeners to clean up stale entries |
| 5 | **Keepalive alarm** | `chrome.alarms` every 4 min to prevent worker termination + detect stale debuggers |
### manifest.json
- Added `"alarms"` to permissions array (required for Fix #5)
## Testing
Tested on Edge (Chromium-based) with an automated pipeline running via cron at 3:00 AM and 3:00 PM daily:
- **12+ hours continuous uptime** without disconnects
- Survived multiple OpenClaw gateway restarts (auto-reconnect worked)
- Survived Edge Sleeping Tabs (keepalive prevented worker termination)
- Cron jobs successfully connected to the relay without manual intervention
- Badge correctly shows ⏳ during reconnect, ✅ on recovery
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR makes the Browser Relay extension resilient to WebSocket disconnects, MV3 service worker termination, and tab lifecycle events by adding five fixes: keeping debugger sessions alive across WS drops, auto-reconnect with exponential backoff, session storage persistence, tab lifecycle cleanup, and a keepalive alarm.
- **Fix #1 (Don't detach on WS drop)**: `onRelayClosed` no longer tears down debugger sessions — it updates badges and preserves in-memory state for reconnection.
- **Fix #2 (Auto-reconnect)**: `scheduleReconnect` uses exponential backoff (1s–30s cap) with jitter. On success, it re-validates and re-announces all tabs with thorough per-tab error handling.
- **Fix #3 (Session storage persistence)**: State is persisted to `chrome.storage.session` after attach/detach and restored on service worker startup, including debugger re-attach if needed.
- **Fix #4 (Tab lifecycle cleanup)**: `onRemoved` and `onReplaced` listeners clean up stale entries.
- **Fix #5 (Keepalive alarm)**: 4-minute `chrome.alarms` interval pings debuggers and triggers state restore if a debugger is lost.
- **Issue**: The `restoreState` re-announce loop (lines 313–330) lacks per-tab error handling — a single tab failure aborts re-announcement for all remaining tabs, unlike the analogous loop in `scheduleReconnect`.
- **Note**: `childSessionToTab` is not persisted, so child sessions (iframes, service workers) will be lost across service worker restarts. New child session events will re-populate the map, but the relay server may hold stale references in the interim.
<h3>Confidence Score: 3/5</h3>
- This PR introduces significant resilience improvements but has several race conditions and error-handling gaps that could cause unexpected behavior under concurrent failure scenarios.
- The core logic is well-structured and addresses real failure modes effectively. However, the re-announce loop in restoreState lacks per-tab error handling (new finding), and previous review threads identified race conditions in onRelayClosed (socket identity), reconnect guard disarming, and keepalive-triggered restoreState clobbering — all of which remain unresolved. The manifest change is minimal and correct.
- `assets/chrome-extension/background.js` — the `restoreState` function (lines 310–334) and the race conditions noted in previous review threads
<sub>Last reviewed commit: 7ab237e</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#15817: fix(chrome-relay): auto-reconnect, MV3 persistence, and keepalive
by derrickburns · 2026-02-13
90.8%
#10390: fix(chrome-relay): sticky attach + auto-restore after disconnects
by Mokoby · 2026-02-06
85.2%
#19766: fix: Chrome relay extension auto-reattach after SPA navigation
by nishantkabra77 · 2026-02-18
84.6%
#16743: fix: auto-reattach browser relay debugger after navigation
by jg-noncelogic · 2026-02-15
83.4%
#20688: fix(browser): allow extension reconnect when stale websocket linger...
by HOYALIM · 2026-02-19
81.6%
#22571: fix(browser): complete extension relay handshake on connect.challenge
by pandego · 2026-02-21
80.2%
#21314: feat: enhance browser relay with custom naming and diagnostic tools
by kelvinCB · 2026-02-19
79.0%
#22385: fix: improve delivery recovery logging with entry age and deferral ...
by derrickburns · 2026-02-21
78.6%
#13942: Fix: chrome-extension: make relay reconnect reliable with one tab
by gabepsilva · 2026-02-11
78.4%
#16689: browser: support multiple Chrome extension connections to relay
by globalcaos · 2026-02-15
77.4%