← Back to PRs

#12983: fix(gateway): defer seq increment until after dropIfSlow filtering

by hclsys open 2026-02-10 01:10 View on GitHub →
gateway stale
## Problem Dashboard shows recurring "event gap detected" errors because the broadcast `seq` counter increments before the per-client send loop. When `dropIfSlow` skips slow clients, the seq is consumed but the frame is never delivered, causing clients to see non-contiguous sequence numbers. Fixes #12895 ## Solution Restructure `broadcastInternal()` into a two-pass approach: 1. **Pass 1**: Collect eligible recipients, closing slow consumers along the way 2. **Pass 2**: Only increment `seq` and serialize the frame if at least one client will receive it The log metadata now distinguishes `"targeted"` (targeted events) from `"dropped"` (broadcast with no eligible recipients) for better observability. ## Testing - [ ] `pnpm build` passes - [ ] `pnpm check` passes - [ ] Added test: dropIfSlow does not increment seq when all clients are slow - [ ] Added test: mixed fast/slow clients receive contiguous seq <!-- greptile_comment --> <h2>Greptile Overview</h2> <h3>Greptile Summary</h3> This PR changes the gateway WebSocket broadcaster to avoid consuming a broadcast sequence number when no clients will actually receive the frame (e.g., when all recipients are filtered out by `dropIfSlow`). It does this by splitting `broadcastInternal()` into two passes: first selecting eligible recipients (and filtering/handling slow consumers), then allocating `seq` and serializing the frame only if at least one recipient exists. Tests were added to assert that `seq` is not incremented when all clients are slow with `dropIfSlow`, and that fast clients still observe contiguous `seq` values when mixed with slow clients. <h3>Confidence Score: 3/5</h3> - This PR is close to safe to merge, but there is a behavior change around handling slow clients when dropIfSlow is enabled that should be confirmed/fixed before merging. - The seq allocation change and added tests align with the stated bug. However, the new first-pass logic skips slow clients when `dropIfSlow` is true without closing them, whereas previously slow consumers were closed; if the prior behavior was relied on for backpressure enforcement, this is a functional regression. Tooling/tests were not runnable in this environment (pnpm unavailable), so confidence is reduced. - src/gateway/server-broadcast.ts <!-- greptile_other_comments_section --> <!-- /greptile_comment -->

Most Similar PRs