#19255: feat(gateway): add WebSocket connection metrics monitoring
gateway
size: M
Cluster: Session Management and Fixes
## Summary
Describe the problem and fix in 2–5 bullets:
- Problem:
- Why it matters:
- What changed:
- What did NOT change (scope boundary):
## Change Type (select all)
- [ ] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Closes #
- Related #
## User-visible / Behavior Changes
List user-visible changes (including defaults/config).
If none, write `None`.
## Security Impact (required)
- New permissions/capabilities? (`Yes/No`)
- Secrets/tokens handling changed? (`Yes/No`)
- New/changed network calls? (`Yes/No`)
- Command/tool execution surface changed? (`Yes/No`)
- Data access scope changed? (`Yes/No`)
- If any `Yes`, explain risk + mitigation:
## Repro + Verification
### Environment
- OS:
- Runtime/container:
- Model/provider:
- Integration/channel (if any):
- Relevant config (redacted):
### Steps
1.
2.
3.
### Expected
-
### Actual
-
## Evidence
Attach at least one:
- [ ] Failing test/log before + passing after
- [ ] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)
## Human Verification (required)
What you personally verified (not just CI), and how:
- Verified scenarios:
- Edge cases checked:
- What you did **not** verify:
## Compatibility / Migration
- Backward compatible? (`Yes/No`)
- Config/env changes? (`Yes/No`)
- Migration needed? (`Yes/No`)
- If yes, exact upgrade steps:
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly:
- Files/config to restore:
- Known bad symptoms reviewers should watch for:
## Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write `None`.
- Risk:
- Mitigation:
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR adds WebSocket connection metrics monitoring to the gateway. It introduces a `WsMetricsCollector` singleton class that tracks connection lifecycle events (connect, handshake, disconnect), message counts, byte throughput, and per-client stats. Two new gateway methods are exposed: `ws.metrics` (read-scoped, available to operators) and `ws.clients` (admin-only, returns detailed per-client stats).
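The collector described above can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the PR's code. Only the 1,000-sample cap, the EMA approach, the `updateLatency` name, and the counter names `messagesSent` / `bytesSent` come from the review. The class name `WsMetricsSketch`, the other method names, and the smoothing factor `LATENCY_ALPHA` are assumptions. It deliberately reproduces the behavior the review flags: traffic without a registered client counts only toward the global totals.

```typescript
// Hypothetical sketch of a WebSocket metrics collector; not the PR's implementation.

const MAX_DURATION_SAMPLES = 1000; // rolling-window cap mentioned in the review
const LATENCY_ALPHA = 0.2; // hypothetical EMA smoothing factor

interface ClientStats {
  connectedAt: number;
  messagesSent: number;
  messagesReceived: number;
  bytesSent: number;
  bytesReceived: number;
}

export class WsMetricsSketch {
  activeConnections = 0;
  totalConnections = 0;
  messagesSent = 0;
  messagesReceived = 0;
  bytesSent = 0;
  bytesReceived = 0;
  latencyEmaMs: number | null = null;
  private durations: number[] = [];
  private clients = new Map<string, ClientStats>();

  onConnect(): void {
    this.activeConnections += 1;
    this.totalConnections += 1;
  }

  // Per-client stats only exist once the handshake has identified the client.
  onHandshake(clientId: string, now: number = Date.now()): void {
    this.clients.set(clientId, {
      connectedAt: now,
      messagesSent: 0,
      messagesReceived: 0,
      bytesSent: 0,
      bytesReceived: 0,
    });
  }

  onDisconnect(clientId?: string, now: number = Date.now()): void {
    this.activeConnections = Math.max(0, this.activeConnections - 1);
    if (clientId === undefined) return; // closed before the handshake
    const stats = this.clients.get(clientId);
    if (!stats) return;
    this.durations.push(now - stats.connectedAt);
    // Rolling window: drop the oldest duration once the cap is exceeded.
    if (this.durations.length > MAX_DURATION_SAMPLES) this.durations.shift();
    this.clients.delete(clientId);
  }

  // Without a registered clientId (e.g. pre-handshake), only global totals change.
  onMessage(direction: "sent" | "received", bytes: number, clientId?: string): void {
    const stats = clientId !== undefined ? this.clients.get(clientId) : undefined;
    if (direction === "sent") {
      this.messagesSent += 1;
      this.bytesSent += bytes;
      if (stats) {
        stats.messagesSent += 1;
        stats.bytesSent += bytes;
      }
    } else {
      this.messagesReceived += 1;
      this.bytesReceived += bytes;
      if (stats) {
        stats.messagesReceived += 1;
        stats.bytesReceived += bytes;
      }
    }
  }

  // Exponential moving average: ema = alpha * sample + (1 - alpha) * ema.
  updateLatency(sampleMs: number): void {
    this.latencyEmaMs =
      this.latencyEmaMs === null
        ? sampleMs
        : LATENCY_ALPHA * sampleMs + (1 - LATENCY_ALPHA) * this.latencyEmaMs;
  }

  averageDurationMs(): number {
    if (this.durations.length === 0) return 0;
    return this.durations.reduce((a, b) => a + b, 0) / this.durations.length;
  }

  clientStats(clientId: string): ClientStats | undefined {
    return this.clients.get(clientId);
  }
}

export const wsMetrics = new WsMetricsSketch(); // module-level singleton
```

`Array.prototype.shift` is O(n), which is harmless at 1,000 entries; a fixed-size ring buffer would avoid the copy if the cap were much larger.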
- New `WsMetricsCollector` class in `src/gateway/server/ws-metrics.ts`, with a rolling window capped at the last 1,000 connection durations for the average-duration metric, and EMA-based latency tracking
- `ws.metrics` added to `READ_METHODS` — accessible to any operator with `operator.read` scope
- `ws.clients` is not listed explicitly in any authorization set; it is admin-only solely because of the catch-all admin fallthrough in `authorizeGatewayMethod`. This works today but is fragile: a change to the fallthrough would silently change who can call it.
- Pre-handshake messages (e.g., `connect.challenge`) are counted in global `messagesSent` / `bytesSent` but not in per-client stats, creating a minor data inconsistency
- The `updateLatency` method is defined on the collector but not called anywhere in this PR
- No tests are included for the new metrics collector or handlers
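Registering `ws.clients` explicitly, rather than relying on the fallthrough, could look like the deny-by-default check below. This is a hypothetical sketch: `READ_METHODS` and the `operator.read` scope come from the review, while `ADMIN_METHODS`, `authorizeMethod`, the `AuthResult` shape, and the `operator.admin` scope name are assumptions, not the gateway's actual API. Admin callers are assumed to be allowed to call read methods too.

```typescript
// Hypothetical sketch: every gateway method is listed in exactly one scope set,
// and unregistered methods are rejected instead of falling through to admin.

type Scope = "operator.read" | "operator.admin"; // "operator.admin" is an assumed name

const READ_METHODS = new Set<string>(["ws.metrics"]);
const ADMIN_METHODS = new Set<string>(["ws.clients"]);

type AuthResult = { ok: true } | { ok: false; reason: string };

export function authorizeMethod(method: string, scopes: ReadonlySet<Scope>): AuthResult {
  const isRead = READ_METHODS.has(method);
  const isAdmin = ADMIN_METHODS.has(method);
  if (!isRead && !isAdmin) {
    // Deny by default: a method nobody registered is unreachable for everyone.
    return { ok: false, reason: `unregistered method: ${method}` };
  }
  if (scopes.has("operator.admin")) return { ok: true }; // admin may call any registered method
  if (isRead && scopes.has("operator.read")) return { ok: true };
  return { ok: false, reason: isAdmin ? "requires operator.admin" : "requires operator.read" };
}
```

With this shape, a method missing from both sets is unreachable even for admins, so a forgotten registration shows up in testing instead of silently inheriting admin-only access.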
<h3>Confidence Score: 3/5</h3>
- This PR is likely safe to merge: it adds read-only observability instrumentation without changing existing business logic. It does, however, have minor data-consistency concerns and no test coverage.
- The implementation is structurally sound and the metrics collector is well-designed with appropriate caps. However: (1) no tests are included for a feature that touches the hot path of every WebSocket message, (2) pre-handshake message counting creates a global-vs-per-client data inconsistency, and (3) the `ws.clients` authorization relies on an implicit fallthrough rather than explicit registration. The PR description is also entirely empty, making it harder to assess intent and scope.
- Files that need attention: `src/gateway/server/ws-connection.ts` (pre-handshake metrics accounting) and `src/gateway/server-methods/ws-metrics.ts` (authorization approach for `ws.clients`)
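One way to remove the pre-handshake inconsistency is to create per-connection stats when the socket opens and attach the client identity only at handshake time, so frames such as `connect.challenge` are counted in both places. This is a hypothetical sketch: it assumes a numeric connection id exists before the handshake, and `ConnAccounting`, `open`, `identify`, `recordSent`, `close`, and `get` are illustrative names, not the PR's API.

```typescript
// Hypothetical sketch: per-connection stats exist from socket open, so every
// counted frame lands in both the global totals and one connection's stats.

interface ConnStats {
  clientId: string | null; // unknown until the handshake completes
  messagesSent: number;
  bytesSent: number;
}

export class ConnAccounting {
  private conns = new Map<number, ConnStats>();
  messagesSent = 0;
  bytesSent = 0;

  open(connId: number): void {
    this.conns.set(connId, { clientId: null, messagesSent: 0, bytesSent: 0 });
  }

  // The handshake only labels an existing entry; counts gathered so far are kept.
  identify(connId: number, clientId: string): void {
    const conn = this.conns.get(connId);
    if (conn) conn.clientId = clientId;
  }

  recordSent(connId: number, bytes: number): void {
    const conn = this.conns.get(connId);
    if (!conn) return; // unknown connection: skip both counters to stay consistent
    conn.messagesSent += 1;
    conn.bytesSent += bytes;
    this.messagesSent += 1;
    this.bytesSent += bytes;
  }

  close(connId: number): void {
    this.conns.delete(connId);
  }

  get(connId: number): ConnStats | undefined {
    return this.conns.get(connId);
  }
}
```

The alternative, excluding pre-handshake frames from the global counters as well, would also restore consistency, but it would hide handshake traffic from `ws.metrics`.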
<sub>Last reviewed commit: 2f06e27</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #14993: fix(webchat): add heartbeat detection to prevent zombie WebSocket c... · by BenediktSchackenberg · 2026-02-12 · 76.0%
- #12999: feat(agents): Add streaming response metrics tracking · by trevorgordon981 · 2026-02-10 · 75.2%
- #19104: test(discord): improve gateway logging test coverage · by Clawborn · 2026-02-17 · 75.1%
- #22926: feat(gateway): add Windows-native watch DX and tool/channel observa... · by Kansodata · 2026-02-21 · 74.9%
- #23714: Gateway: add websocket ingress limits for DoS hardening · by bmendonca3 · 2026-02-22 · 74.7%
- #8522: feat(control-ui): Add Model Requests panel for real-time API monito... · by GiantAxeWhy · 2026-02-04 · 74.2%
- #19515: security: add per-connection WebSocket rate limiting · by Mozzzaic · 2026-02-17 · 74.2%
- #23420: Gateway: tighten WS connect schema bounds and validation · by bmendonca3 · 2026-02-22 · 74.1%
- #6302: fix: Add timeouts to prevent indefinite hangs (issues #4954, #4956,... · by batumilove · 2026-02-01 · 73.6%
- #6466: fix(gateway): add handshake timeout and connection error handling · by jarvis-raven · 2026-02-01 · 73.6%