#22480: fix: memory leak, silent WS failures, and connection error handling
gateway
size: XS
Cluster: Memory Leak Fixes and Cleanup
## Summary
- **Fix memory leak in `seqByRun` map** (`src/infra/agent-events.ts`): The `seqByRun` map gained an entry for every agent run and was never pruned, so it grew without bound. Now `clearAgentRunContext()` and `resetAgentRunContextForTest()` also remove the corresponding `seqByRun` entries, preventing memory exhaustion in long-running gateway instances.
- **Log WebSocket send failures** (`src/gateway/server/ws-connection.ts`): The `send()` function was silently swallowing all errors with an empty catch block. Now it logs a warning with connection ID and error details, and records `"send-error"` as the close cause for diagnostics.
- **Handle all socket errors, not just the first** (`src/gateway/server/ws-connection.ts`): Changed `socket.once("error")` to `socket.on("error")` so subsequent socket errors are also caught and trigger connection cleanup. The `close()` function is already idempotent (guards with `if (closed) return`), so repeated calls are safe.
- **Fix voice transcript deduplication cleanup** (`src/gateway/server-node-events.ts`): The cleanup loop had an early `break` that stopped pruning expired entries as soon as the map dropped below `MAX_RECENT_VOICE_TRANSCRIPTS`, leaving stale entries behind. Removed the early break so all expired entries are pruned before falling through to the eviction-by-insertion-order loop.
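For the `seqByRun` fix in the first bullet, here is a minimal TypeScript sketch of the cleanup pattern, assuming both maps are keyed by run ID. Only `seqByRun`, `runContextById`, `clearAgentRunContext()`, and `resetAgentRunContextForTest()` are named in this PR. The `AgentRunContext` shape and the `registerAgentRunContext()`, `nextSeq()`, and `trackedRunCount()` helpers are hypothetical stand-ins, not the code in `src/infra/agent-events.ts`.

```typescript
// Hypothetical per-run context; the real type is not shown in this PR.
type AgentRunContext = { sessionKey?: string };

// Per-run state, both keyed by runId.
const runContextById = new Map<string, AgentRunContext>();
const seqByRun = new Map<string, number>();

// Hypothetical helper: registers context for a run.
function registerAgentRunContext(runId: string, ctx: AgentRunContext): void {
  runContextById.set(runId, ctx);
}

// Hypothetical helper: returns the next sequence number for a run,
// creating the seqByRun entry on first use.
function nextSeq(runId: string): number {
  const next = (seqByRun.get(runId) ?? 0) + 1;
  seqByRun.set(runId, next);
  return next;
}

// Clearing a run drops its seqByRun entry alongside its context,
// so seqByRun no longer keeps one entry for every run ever seen.
function clearAgentRunContext(runId: string): void {
  runContextById.delete(runId);
  seqByRun.delete(runId);
}

// Test reset wipes both maps, not just runContextById.
function resetAgentRunContextForTest(): void {
  runContextById.clear();
  seqByRun.clear();
}

// Hypothetical accessor so the cleanup can be observed.
function trackedRunCount(): number {
  return seqByRun.size;
}
```

The design point is symmetry: every path that removes a run from `runContextById` also removes it from `seqByRun`, so the two maps cannot drift apart.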
## Test plan
- [x] `vitest run src/infra/agent-events` — 3/3 tests pass
- [x] `vitest run src/gateway/gateway-misc.test.ts src/gateway/client.test.ts src/gateway/chat-sanitize.test.ts` — 31/31 tests pass
- [x] `tsc --noEmit` — no new type errors
- [ ] Verify gateway stability under sustained load (memory usage should plateau instead of growing)
- [ ] Verify WebSocket send failures now appear in logs with `send failed conn=` prefix
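The two `ws-connection.ts` changes, including the `send failed conn=` log prefix that the last check looks for, can be illustrated with the sketch below. It assumes a socket that exposes `send()` and Node `EventEmitter`-style `"error"` events. `attachConnection`, `FakeSocket`, the `"socket-error"` cause, and the keep-the-first-cause rule are illustrative assumptions, not the gateway's actual implementation. The PR does not say whether a send failure also closes the connection, so this sketch only records the cause.

```typescript
import { EventEmitter } from "node:events";

// Assumed minimal socket surface: send() plus EventEmitter-style "error" events.
interface SocketLike extends EventEmitter {
  send(data: string): void;
}

interface Logger {
  warn(msg: string): void;
}

// Hypothetical wrapper, not the real ws-connection.ts code.
function attachConnection(socket: SocketLike, connId: string, log: Logger) {
  let closed = false;
  let closeCause: string | undefined;

  // Keep the first recorded cause so later cleanup does not hide the root failure.
  const recordCause = (cause: string): void => {
    if (closeCause === undefined) closeCause = cause;
  };

  // Idempotent: every call after the first is a no-op.
  const close = (cause: string): void => {
    if (closed) return;
    closed = true;
    recordCause(cause);
  };

  const send = (payload: string): void => {
    try {
      socket.send(payload);
    } catch (err) {
      // Previously an empty catch block; now the failure is logged with the connection ID.
      const detail = err instanceof Error ? err.message : String(err);
      log.warn(`send failed conn=${connId}: ${detail}`);
      recordCause("send-error");
    }
  };

  // on() rather than once(): every "error" event reaches close(), and because
  // close() is idempotent, repeated errors are harmless.
  socket.on("error", () => close("socket-error"));

  return { send, close, state: () => ({ closed, closeCause }) };
}

// Test double that can be told to fail on send.
class FakeSocket extends EventEmitter implements SocketLike {
  fail = false;
  sent: string[] = [];
  send(data: string): void {
    if (this.fail) throw new Error("socket not open");
    this.sent.push(data);
  }
}
```

On a Node `EventEmitter`, emitting `"error"` with no listener attached throws. With `once("error")` the listener is removed after the first error, so a second error would throw instead of closing the connection; with `on("error")` every error is routed to the idempotent `close()`.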
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Fixes four critical infrastructure issues in gateway WebSocket handling, agent event management, and voice transcript deduplication:
- **Memory leak in `seqByRun` map**: The map tracked sequence numbers for all agent runs but was never cleaned up. Now properly deleted in both `clearAgentRunContext()` and `resetAgentRunContextForTest()`, preventing unbounded memory growth in long-running gateway instances.
- **Silent WebSocket send failures**: `send()` previously swallowed all errors with an empty catch block, so failures left no trace in the logs. It now logs a warning with the connection ID and error details, and records `"send-error"` as the close cause for diagnostics.
- **Incomplete socket error handling**: Changed from `socket.once("error")` to `socket.on("error")` to handle all socket errors, not just the first one. The existing `close()` function is already idempotent (guards with `if (closed) return`), so repeated calls are safe.
- **Voice transcript deduplication cleanup bug**: The cleanup loop had an early `break` that stopped pruning as soon as the map size dropped below `MAX_RECENT_VOICE_TRANSCRIPTS`, leaving expired entries behind. Removing the early break ensures all expired entries are pruned before falling through to the eviction-by-insertion-order loop.
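A minimal sketch of the corrected two-pass cleanup for voice transcript deduplication, assuming transcripts are keyed by a string and stamped with a millisecond timestamp in an insertion-ordered `Map`. Only `MAX_RECENT_VOICE_TRANSCRIPTS` comes from the PR; its value here, `VOICE_TRANSCRIPT_TTL_MS`, and the helper names are hypothetical, and this is not the code in `src/gateway/server-node-events.ts`.

```typescript
// MAX_RECENT_VOICE_TRANSCRIPTS is named in the PR; both values are assumptions.
const MAX_RECENT_VOICE_TRANSCRIPTS = 3;
const VOICE_TRANSCRIPT_TTL_MS = 60_000; // hypothetical TTL

// transcript key -> last-seen timestamp in ms. Map iterates in insertion order.
const recentVoiceTranscripts = new Map<string, number>();

function pruneRecentVoiceTranscripts(now: number): void {
  // Pass 1: remove every expired entry. The buggy version broke out of this
  // loop once size < MAX_RECENT_VOICE_TRANSCRIPTS, leaving later expired entries.
  for (const [key, seenAt] of recentVoiceTranscripts) {
    if (now - seenAt > VOICE_TRANSCRIPT_TTL_MS) {
      recentVoiceTranscripts.delete(key);
    }
  }
  // Pass 2: if still over capacity, evict the oldest insertions first.
  while (recentVoiceTranscripts.size > MAX_RECENT_VOICE_TRANSCRIPTS) {
    const oldest = recentVoiceTranscripts.keys().next().value;
    if (oldest === undefined) break;
    recentVoiceTranscripts.delete(oldest);
  }
}

// Hypothetical dedup check: true if the same transcript was seen within the TTL.
function isDuplicateVoiceTranscript(key: string, now: number): boolean {
  const seenAt = recentVoiceTranscripts.get(key);
  if (seenAt !== undefined && now - seenAt <= VOICE_TRANSCRIPT_TTL_MS) return true;
  // Delete then set so a re-seen key moves to the end of the insertion order.
  recentVoiceTranscripts.delete(key);
  recentVoiceTranscripts.set(key, now);
  pruneRecentVoiceTranscripts(now);
  return false;
}
```

Re-inserting a key with `delete()` followed by `set()` moves it to the end of the `Map`, so the capacity pass evicts the least recently seen transcript first. Deleting entries while iterating a `Map` with `for...of` is well-defined in JavaScript.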
The existing test suites pass and the changes follow the repository's coding conventions. The fixes address real production issues that would otherwise cause memory leaks and silent failures.
<h3>Confidence Score: 5/5</h3>
- Safe to merge with no risk: fixes critical production bugs
- All changes are simple, well-reasoned bug fixes with clear benefits. The memory leak fix mirrors the cleanup pattern used for `runContextById`. The send-failure logging adds observability without changing behavior, and switching to `socket.on("error")` is safe because `close()` is idempotent. The voice transcript cleanup fix removes a logic error. Tests pass and the changes are minimal and focused.
- No files require special attention
<sub>Last reviewed commit: 64a049c</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #22131: fix: clear seqByRun entries in clearAgentRunContext to prevent memo... by alanwilhelm · 2026-02-20 · 87.5%
- #18029: infra: fix memory leak and error handling in event listeners by MAhmadUzair · 2026-02-16 · 79.6%
- #17823: fix: memory leak in cron isolated runs — agent-events Maps never cl... by techboss · 2026-02-16 · 78.6%
- #22143: Fix memory leak in WhatsApp channel reconnection loop by lancejames221b · 2026-02-20 · 78.6%
- #10273: fix(agents): detect and auto-compact mid-run context overflow by terryops · 2026-02-06 · 78.2%
- #16949: fix(gateway): deliver chat:final even when sessionKey is unresolved (… by ekleziast · 2026-02-15 · 78.0%
- #20431: fix(sessions): add session contamination guards and self-leak lock ... by marcomarandiz · 2026-02-18 · 77.2%
- #8713: feat: gateway memory monitor, install linger, docs and failover by quratus · 2026-02-04 · 76.7%
- #19328: Fix: preserve modelOverride in agent handler (#5369) by CodeReclaimers · 2026-02-17 · 76.1%
- #21463: fix(discord): prevent WebSocket death spiral + fix numeric channel ID… by akropp · 2026-02-20 · 76.0%