← Back to PRs

#20688: fix(browser): allow extension reconnect when stale websocket lingers (AI-assisted)

by HOYALIM open 2026-02-19 06:48 View on GitHub →
size: S
## Summary - Problem: `/extension` WebSocket upgrade returned `409 Extension already connected` whenever `extensionWs` was non-null, including reconnect races where the old socket was already `CLOSING`/`CLOSED`. - Why it matters: Chrome MV3 worker reconnects can fail with `Unexpected response code: 409`, forcing manual re-attach and reducing browser relay reliability. - What changed: - `/extension` now returns 409 only when the existing extension socket is actually `OPEN`. - If the socket is stale (`CLOSING`/`CLOSED`), relay terminates/clears it and accepts the new connection. - Added a close-handler identity guard so late close events from replaced sockets do not wipe active state. - Added regression tests for both: - second live extension connection still returns 409 - immediate reconnect while prior socket is closing succeeds - What did NOT change (scope boundary): - No multi-extension support. - No auth/token behavior change. - No API contract or config/default changes. ## AI-assisted Disclosure - AI-assisted: Yes (Codex) - Testing degree: Fully tested for project CI gate (`pnpm build`, `pnpm check`, `pnpm test`) - I reviewed and understand the code changes and test coverage. - Session logs/prompts: available on request. ## Change Type (select all) - [x] Bug fix - [ ] Feature - [ ] Refactor - [ ] Docs - [ ] Security hardening - [ ] Chore/infra ## Scope (select all touched areas) - [ ] Gateway / orchestration - [ ] Skills / tool execution - [ ] Auth / tokens - [ ] Memory / storage - [x] Integrations - [ ] API / contracts - [ ] UI / DX - [ ] CI/CD / infra ## Linked Issue/PR - Related #15099 - Related #6175 - Related #18698 ## User-visible / Behavior Changes - Reconnect reliability is improved for Chrome extension relay in stale-socket races. - Users should see fewer `Unexpected response code: 409` reconnect failures. - True concurrent second live extension connections are still rejected with 409 (unchanged policy). ## Security Impact (required) - New permissions/capabilities? (`Yes/No`): No - Secrets/tokens handling changed? (`Yes/No`): No - New/changed network calls? (`Yes/No`): No - Command/tool execution surface changed? (`Yes/No`): No - Data access scope changed? (`Yes/No`): No - If any `Yes`, explain risk + mitigation: N/A ## Repro + Verification ### Environment - OS: macOS (arm64) - Runtime/container: Node v24.7.0, pnpm v10.30.0 - Model/provider: N/A - Integration/channel (if any): Chrome extension relay (`/extension`, `/cdp`) - Relevant config (redacted): default relay loopback (`127.0.0.1:18792`) ### Steps 1. Connect one extension WebSocket to `/extension`. 2. Close it and immediately connect again. 3. Compare behavior before/after fix. ### Expected - Second truly live concurrent connection should return 409. - Reconnect while old socket is stale/closing should succeed. ### Actual - Before fix: reconnect race could fail with `Unexpected server response: 409`. - After fix: reconnect race succeeds; live-duplicate policy remains 409. ## Evidence - [x] Failing test/log before + passing after - [x] Trace/log snippets - [ ] Screenshot/recording - [ ] Perf numbers (if relevant) Before: - `src/browser/extension-relay.test.ts > allows immediate reconnect when prior extension socket is closing` - Error: `Unexpected server response: 409` After: - `pnpm vitest run --config vitest.unit.config.ts src/browser/extension-relay.test.ts` (9/9 passing) - `pnpm build` (pass) - `pnpm check` (pass) - `pnpm test` (pass: 736 files, 6126 tests) ## Human Verification (required) - Verified scenarios: - stale reconnect race now succeeds - second live extension connection still returns 409 - full test gates pass (`build/check/test`) - Edge cases checked: - late close event from replaced socket does not clear active state - unchanged duplicate-live-connection behavior - What you did **not** verify: - Manual cross-browser UX validation across all OS combinations ## Compatibility / Migration - Backward compatible? (`Yes/No`): Yes - Config/env changes? (`Yes/No`): No - Migration needed? (`Yes/No`): No - If yes, exact upgrade steps: N/A ## Failure Recovery (if this breaks) - How to disable/revert this change quickly: - Revert commit `cb37628e1`. - Files/config to restore: - `src/browser/extension-relay.ts` - `src/browser/extension-relay.test.ts` - Known bad symptoms reviewers should watch for: - unexpected acceptance of a second truly live extension connection - active connection state being wiped by a stale close event ## Risks and Mitigations - Risk: stale-socket cleanup could incorrectly replace an active socket. - Mitigation: replacement only occurs when `readyState !== OPEN`; explicit regression test preserves live-duplicate 409 behavior. - Risk: late close events from prior socket could clear newer state. - Mitigation: close-handler identity guard (`extensionWs !== ws`) + regression coverage. <!-- greptile_comment --> <h3>Greptile Summary</h3> Fixed WebSocket reconnection race condition in Chrome extension relay by allowing reconnects when the prior socket is stale (`CLOSING`/`CLOSED`) rather than rejecting all reconnect attempts. The fix includes an identity guard in the close handler to prevent late close events from cleared sockets from wiping active connections. - Changed `/extension` endpoint to only reject with 409 when `extensionWs.readyState === WebSocket.OPEN` - Added cleanup logic to terminate and clear stale sockets before accepting new connections - Added identity check (`extensionWs !== ws`) in close handler to prevent replaced sockets from clearing active state - Added regression tests for both duplicate live connection rejection and successful stale-socket reconnection <h3>Confidence Score: 5/5</h3> - This PR is safe to merge with no identified issues - The fix correctly addresses a reconnection race condition with minimal, well-targeted changes. The implementation properly checks `readyState` before rejecting connections, safely terminates stale sockets, and includes an identity guard to prevent state corruption from late close events. Both positive (stale reconnect) and negative (duplicate live connection) test cases are present, and all test gates pass. - No files require special attention <sub>Last reviewed commit: cb37628</sub> <!-- greptile_other_comments_section --> <sub>(4/5) You can add custom instructions or style guidelines for the agent [here](https://app.greptile.com/review/github)!</sub> <!-- /greptile_comment -->

Most Similar PRs