#9879: Improve gateway probe diagnostics for slow channels
Labels: gateway, commands, stale
Cluster: Gateway and TLS Enhancements
Previously, when a slow or buggy channel (e.g. iMessage) blocked the health check, the probe failed with a generic timeout and gave no indication of which channel was responsible.
This change surfaces which channel is causing the timeout so users can identify and disable it in config.
**Changes:**
- Track current channel being probed in health state
- Add `health.probeStatus` RPC to report active probe channel
- On probe timeout, include channel name in error message (e.g. `timeout (health check blocked by channel: imessage; try disabling it in config)`)
- Skip generic timeout hint when error already names the channel
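
The changes listed above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `ProbeState`, `probeChannel`, `formatTimeoutError`, and `shouldShowGenericHint` are hypothetical names, and only the error string is taken from the description.

```typescript
// Hypothetical health-state shape; the real state lives in the gateway server.
interface ProbeState {
  activeChannel: string | null;
}

const probeState: ProbeState = { activeChannel: null };

// Record which channel is being probed for the duration of its check.
async function probeChannel<T>(
  channel: string,
  check: () => Promise<T>,
): Promise<T> {
  probeState.activeChannel = channel;
  try {
    return await check();
  } finally {
    // Clear only if no other probe has taken over in the meantime.
    if (probeState.activeChannel === channel) {
      probeState.activeChannel = null;
    }
  }
}

// Build the timeout message, naming the channel when one is known.
function formatTimeoutError(activeChannel: string | null): string {
  return activeChannel
    ? `timeout (health check blocked by channel: ${activeChannel}; try disabling it in config)`
    : "timeout";
}

// The generic timeout hint is redundant once the error names a channel.
function shouldShowGenericHint(error: string): boolean {
  return !error.includes("blocked by channel:");
}
```

If `probeChannel("imessage", ...)` never settles, `probeState.activeChannel` stays at `"imessage"`, so a timeout handler can pass it to `formatTimeoutError` and the CLI can skip its generic hint.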
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR improves gateway probe diagnostics when health checks hang on slow/buggy channels by (1) tracking the currently-probed channel in health state, (2) exposing a new read-only RPC (`health.probeStatus`) to report that channel, and (3) enriching probe timeout errors (and CLI output) with the channel name when available. It also tweaks health handling to avoid blocking non-probe callers while a refresh is in progress.
The changes touch both the CLI-side probing (`src/gateway/probe.ts`, `src/commands/gateway-status/helpers.ts`) and gateway server request handlers/context (`src/gateway/server-methods/*`, `src/gateway/server.impl.ts`, `src/gateway/server/health-state.ts`).
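
A read-only RPC such as `health.probeStatus` might look something like the sketch below. The response shape (`probing`, `channel`), the `handlers` registry, and `dispatch` are assumptions made for illustration; the PR's real handler wiring under `src/gateway/server-methods/*` is not shown here.

```typescript
// Assumed response shape for the status RPC (not confirmed by the PR).
type ProbeStatus = { probing: boolean; channel: string | null };

// Hypothetical in-memory health state updated by the refresh path.
const healthState: { currentChannel: string | null } = { currentChannel: null };

type Handler = () => unknown;

// Hypothetical method registry keyed by RPC name.
const handlers = new Map<string, Handler>([
  [
    "health.probeStatus",
    (): ProbeStatus => ({
      probing: healthState.currentChannel !== null,
      channel: healthState.currentChannel,
    }),
  ],
]);

// Look up and run a handler; unknown methods are rejected.
function dispatch(method: string): unknown {
  const handler = handlers.get(method);
  if (!handler) {
    throw new Error(`unknown method: ${method}`);
  }
  return handler();
}
```

Because the handler only reads state and never awaits the probe itself, it can answer while a health refresh is still blocked on a channel.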
<h3>Confidence Score: 3/5</h3>
- This PR is directionally sound but has at least one correctness issue that will prevent the new diagnostics from working as intended.
- The probe-status channel tracking appears to be set from the CLI-side health snapshot code rather than the gateway server’s health refresh path, so `health.probeStatus` may not reflect real server-side probe activity. Additionally, the new async timeout handler in the probe can keep running after the probe settles, risking duplicate resolution/work.
- Files needing attention: `src/commands/health.ts`, `src/gateway/probe.ts`
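
One common way to address the duplicate-resolution concern raised above is a single "settled" guard shared by the success path and the async timeout path. The sketch below assumes a hypothetical `withTimeout` helper and a `describeTimeout` callback (for example, one that queries `health.probeStatus`); it is not the code in `src/gateway/probe.ts`.

```typescript
// Race `work` against a timer; the first path to finish wins, later ones are ignored.
function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  describeTimeout: () => Promise<string>, // hypothetical: builds the error message
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let settled = false;

    // Run `fn` only for the first caller and cancel the pending timer.
    const settle = (fn: () => void): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      fn();
    };

    const timer = setTimeout(() => {
      void (async () => {
        let message = "timeout";
        try {
          message = await describeTimeout();
        } catch {
          // Fall back to the generic message if the lookup fails.
        }
        // If `work` settled while the lookup was in flight, this is a no-op.
        settle(() => reject(new Error(message)));
      })();
    }, ms);

    work.then(
      (value) => settle(() => resolve(value)),
      (error) => settle(() => reject(error)),
    );
  });
}
```

The guard does not cancel an in-flight `describeTimeout` call; it only ensures the outer promise is resolved or rejected once.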
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #6302: fix: Add timeouts to prevent indefinite hangs (issues #4954, #4956,... (by batumilove · 2026-02-01 · 75.4%)
- #22716: fix: gateway status probe uses wss:// when TLS enabled; accept self... (by Fratua · 2026-02-21 · 75.2%)
- #10123: fix: guard deep health probe against unreachable gateway (#9091) (by petter-b · 2026-02-06 · 75.1%)
- #6466: fix(gateway): add handshake timeout and connection error handling (by jarvis-raven · 2026-02-01 · 74.6%)
- #22682: fix(gateway): [P0] status probe ignores gateway.tls.enabled — hardc... (by mahsumaktas · 2026-02-21 · 73.3%)
- #19437: Gateway: respect custom bind host for local health/RPC target resol... (by frudas24 · 2026-02-17 · 73.2%)
- #22355: fix(gateway): add exponential backoff to remote node bin probes (by xinhuagu · 2026-02-21 · 72.8%)
- #8260: fix(macOS): gateway readiness detection + reversible Configure later (by xksteven · 2026-02-03 · 72.3%)
- #8713: feat: gateway memory monitor, install linger, docs and failover (by quratus · 2026-02-04 · 72.2%)
- #11478: Chore: add Dockerfile HEALTHCHECK and debug-log silent catch blocks (by U-C4N · 2026-02-07 · 71.9%)