#16196: fix(gateway): add periodic cleanup to prevent memory leak in ToolEventRecipientRegistry

by bianbiandashen · open · 2026-02-14 13:02
Labels: gateway, stale, size: XS
## Summary

Fix a potential memory leak in `ToolEventRecipientRegistry` by adding periodic cleanup of stale entries.

## Problem

The registry tracks which WebSocket connections are interested in tool events for specific agent runs. Each entry has a TTL of 10 minutes (or 30 seconds after finalization).

**Current behavior**: Stale entries are only cleaned up during `add()`, `get()`, or `markFinal()` operations.

**Issue**: If no new tool events arrive after agent runs complete, expired entries remain in memory indefinitely. In long-running gateway servers with intermittent activity, this could lead to unbounded memory growth.

## Solution

Add a periodic cleanup timer that runs every 60 seconds:

```typescript
const pruneTimer = setInterval(prune, TOOL_EVENT_PRUNE_INTERVAL_MS);
// Let the process exit without waiting for this timer
if (pruneTimer.unref) {
  pruneTimer.unref();
}
```

The PR also adds a `dispose()` method to the registry type so the timer can be cleaned up during server shutdown.

## Why 60 seconds?

- Much shorter than the 10-minute TTL, so an expired entry outlives its deadline by at most about one interval (for a finalized run, roughly 30 s + 60 s = 90 s after `markFinal()`)
- Long enough to avoid excessive CPU overhead
- Aligns with typical observability/metrics intervals

## API Change

The `ToolEventRecipientRegistry` type now includes a `dispose()` method:

```typescript
export type ToolEventRecipientRegistry = {
  add: (runId: string, connId: string) => void;
  get: (runId: string) => ReadonlySet<string> | undefined;
  markFinal: (runId: string) => void;
  dispose: () => void; // NEW
};
```

Callers should call `dispose()` during graceful shutdown to clear the timer and release the remaining entries.

## Test Plan

- [x] Verify registry entries are cleaned up after TTL expires (during idle)
- [x] Verify `dispose()` clears timer and entries
- [x] Verify `unref()` allows clean process exit

### Greptile Summary

Adds a periodic `setInterval` cleanup timer (every 60s) to `ToolEventRecipientRegistry` to prune stale entries during idle periods, preventing unbounded memory growth in long-running gateway servers.
Also adds a `dispose()` method to the registry type for proper resource cleanup.

- Adds `TOOL_EVENT_PRUNE_INTERVAL_MS` (60s) constant and a `setInterval(prune, ...)` call inside `createToolEventRecipientRegistry()`
- Uses `pruneTimer.unref()` to allow clean process exit even if the timer is active
- Adds `dispose()` to the `ToolEventRecipientRegistry` type, which clears the interval and the recipients map
- The existing `prune()` logic (TTL-based expiry) is reused by the periodic timer; no new pruning semantics are introduced

### Confidence Score: 4/5

- This PR is safe to merge: the change is additive, well-scoped, and the timer is properly unref'd so it does not block process exit.
- The implementation is correct: it reuses the existing `prune()` function, properly guards `unref()`, and adds a clean `dispose()` method. The score is 4 rather than 5 because `dispose()` is not yet called during gateway shutdown. The timer is unref'd, so it won't block exit, but cleanup is incomplete. This was noted in a previous review thread.
- No files require special attention beyond the previously noted integration gap in `server-close.ts`.

<sub>Last reviewed commit: c3f12b7</sub>
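
To make the Solution concrete, here is a minimal TypeScript sketch of the pattern it describes. Every call still prunes lazily. On top of that, an unref'd `setInterval` prunes while the server is idle, and `dispose()` clears both the timer and the entries.

Only three things come from the PR: the `add`/`get`/`markFinal`/`dispose` shape, the 10-minute and 30-second TTLs, and `TOOL_EVENT_PRUNE_INTERVAL_MS`. The following are assumptions for illustration, not the PR's implementation:

- the `RegistryOptions` type and its injectable `now` clock
- the `size()` accessor
- the other constant names
- the expiry rules (TTL counted from the first `add()`, shortened by `markFinal()`)

```typescript
// Hypothetical sketch, not the PR's code. From the PR: the add/get/markFinal/dispose
// shape, the 10-minute and 30-second TTLs, and the 60-second prune interval.
// Assumed for illustration: RegistryOptions, the injectable clock, size(), the
// TTL constant names, and the exact expiry rules below.

export type ToolEventRecipientRegistry = {
  add: (runId: string, connId: string) => void;
  get: (runId: string) => ReadonlySet<string> | undefined;
  markFinal: (runId: string) => void;
  dispose: () => void;
  size: () => number; // hypothetical: lets a test observe idle pruning without get()
};

const TOOL_EVENT_TTL_MS = 10 * 60_000; // assumed name; 10 minutes per the PR
const TOOL_EVENT_FINAL_TTL_MS = 30_000; // assumed name; 30 s after finalization
const TOOL_EVENT_PRUNE_INTERVAL_MS = 60_000; // periodic prune every 60 s

type Entry = { connIds: Set<string>; expiresAt: number };

type RegistryOptions = {
  now?: () => number; // injectable clock so tests need not wait for real TTLs
  pruneIntervalMs?: number;
};

export function createToolEventRecipientRegistry(
  opts: RegistryOptions = {},
): ToolEventRecipientRegistry {
  const now = opts.now ?? Date.now;
  const recipients = new Map<string, Entry>();

  // Remove every entry whose deadline has passed.
  const prune = (): void => {
    const t = now();
    recipients.forEach((entry, runId) => {
      if (entry.expiresAt <= t) {
        recipients.delete(runId);
      }
    });
  };

  // add/get/markFinal all prune lazily (the pre-existing behavior the PR describes).
  const add = (runId: string, connId: string): void => {
    prune();
    const entry = recipients.get(runId);
    if (entry) {
      entry.connIds.add(connId);
    } else {
      // Assumption: the 10-minute TTL is counted from the first add().
      recipients.set(runId, { connIds: new Set([connId]), expiresAt: now() + TOOL_EVENT_TTL_MS });
    }
  };

  const get = (runId: string): ReadonlySet<string> | undefined => {
    prune();
    return recipients.get(runId)?.connIds;
  };

  const markFinal = (runId: string): void => {
    prune();
    const entry = recipients.get(runId);
    if (entry) {
      // Finalization shortens the remaining lifetime to at most 30 s.
      entry.expiresAt = Math.min(entry.expiresAt, now() + TOOL_EVENT_FINAL_TTL_MS);
    }
  };

  // The fix: also prune on a timer, so expired entries go away while the server is idle.
  const pruneTimer = setInterval(prune, opts.pruneIntervalMs ?? TOOL_EVENT_PRUNE_INTERVAL_MS);
  // Node timers expose unref(); guard it like the PR does, since other runtimes may not.
  (pruneTimer as unknown as { unref?: () => void }).unref?.();

  const dispose = (): void => {
    clearInterval(pruneTimer);
    recipients.clear();
  };

  return { add, get, markFinal, dispose, size: () => recipients.size };
}
```

Injecting the clock lets a test advance time past a TTL and then wait a few short prune intervals without calling `get()` or `add()`. That is exactly the idle case the first Test Plan item targets.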

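The Greptile review notes that `dispose()` is not yet called during gateway shutdown (`server-close.ts`). That file is not shown in the PR, so the following is only a hypothetical TypeScript sketch of one way to wire the call in. `createGatewayClose`, `GatewayCloseDeps`, `Disposable`, and `closeServer` are invented names. The sketch closes the server first and disposes the registry in a `finally`, so the timer and entries are released even if closing fails. Repeated calls share a single close attempt.

```typescript
// Hypothetical sketch: server-close.ts is not shown in the PR, so every name here
// except dispose() is illustrative.

type Disposable = { dispose: () => void };

type GatewayCloseDeps = {
  toolEventRecipients: Disposable; // e.g. the ToolEventRecipientRegistry
  closeServer: () => Promise<void>; // assumed: stops accepting connections
};

// Returns an idempotent close function that always disposes the registry,
// even when closing the server rejects.
export function createGatewayClose(deps: GatewayCloseDeps): () => Promise<void> {
  let closing: Promise<void> | undefined;

  const run = async (): Promise<void> => {
    try {
      await deps.closeServer();
    } finally {
      deps.toolEventRecipients.dispose(); // clears the prune timer and entries
    }
  };

  // Repeated calls (e.g. SIGINT followed by SIGTERM) share the first close attempt.
  return () => closing ?? (closing = run());
}
```

The ordering is a judgment call. Disposing only after the server stops accepting connections narrows the window in which a late `add()` could repopulate a registry whose prune timer has already been cleared.
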