
#20967: fix(discord): report connected state so health-monitor can restart stuck accounts

by who96 · open · 2026-02-19 14:04
channel: discord size: XS
## Problem

The Discord channel health monitor never triggers an automatic restart for accounts stuck in a reconnect death spiral.

**Root cause:** `isChannelHealthy()` in `src/gateway/channel-health-monitor.ts` checks `snapshot.connected === false` (strict equality). However, `monitorDiscordProvider` in `src/discord/monitor/provider.ts` never calls `opts.setStatus(...)`, so `snapshot.connected` is always `undefined`. Because `undefined !== false`, the health check always returns `true`: the account appears healthy no matter how many reconnect attempts have failed.

**Observed symptom:** Discord accounts can enter a continuous 1005/1006 disconnect loop (dozens of `Attempting resume` log lines per minute) while the health monitor sits idle. The process does not crash, so the supervisor's restart-on-exit policy (`KeepAlive` in launchd, `Restart=` in systemd) never fires either. The only recovery path is a manual restart.

**Code reference:** `isChannelHealthy()` in `src/gateway/channel-health-monitor.ts`. The `connected === false` guard is correct, but it is only reachable if a channel plugin actually reports state via `setStatus`. Discord does not.

## Fix

1. Add an optional `setStatus` callback to `MonitorDiscordOpts`.
2. Call `setStatus({ accountId, connected: true })` immediately after the `"logged in to discord"` log line, the first point at which the WebSocket handshake is known to have succeeded.
3. Call `setStatus({ accountId, connected: false })` at the top of the `finally` block, so that every exit path (abort signal, maximum reconnect attempts reached, fatal gateway error) marks the account as disconnected before cleanup runs.
4. Thread `setStatus` from `ChannelGatewayContext` into `monitorDiscordProvider` in `extensions/discord/src/channel.ts`.
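The four steps above amount to a "report on login, report again in `finally`" pattern. The following is a minimal TypeScript sketch of that pattern under stated assumptions, not the actual `monitorDiscordProvider` code: `monitorProviderSketch`, `MonitorOptsSketch`, `StatusPatch`, and the `login`/`runGateway` stand-ins are hypothetical names, and the real option and patch types carry more fields. The simulated gateway loop throws, so the `finally` path is exercised.

```typescript
// Hypothetical, simplified types; the real MonitorDiscordOpts and the
// ChannelGatewayContext status patch type carry more fields.
type StatusPatch = { accountId: string; connected?: boolean };

interface MonitorOptsSketch {
  accountId: string;
  // Optional, matching the `setStatus?` field the PR adds.
  setStatus?: (patch: StatusPatch) => void;
  // Hypothetical stand-ins for the real Discord login and gateway loop.
  login: () => Promise<void>;
  runGateway: () => Promise<void>;
}

async function monitorProviderSketch(opts: MonitorOptsSketch): Promise<void> {
  try {
    await opts.login();
    // Corresponds to the point right after "logged in to discord" is logged.
    opts.setStatus?.({ accountId: opts.accountId, connected: true });
    await opts.runGateway();
  } finally {
    // First statement in `finally`, so it runs on every exit path
    // (normal return or a thrown error) before any cleanup.
    opts.setStatus?.({ accountId: opts.accountId, connected: false });
    // ...cleanup would follow here.
  }
}

// A fake gateway context that records every patch it receives.
const recorded: StatusPatch[] = [];
const ctx = {
  setStatus: (patch: StatusPatch): void => {
    recorded.push(patch);
  },
};

// Wired the same way as the channel.ts change; the gateway loop
// simulates a fatal gateway error to exercise the `finally` path.
const done: Promise<StatusPatch[]> = monitorProviderSketch({
  accountId: "example-account",
  setStatus: (patch) => ctx.setStatus(patch),
  login: async () => {},
  runGateway: async () => {
    throw new Error("simulated fatal gateway error");
  },
})
  .catch(() => undefined) // the error is expected; only the patches matter
  .then(() => recorded);

done.then((patches) => console.log(patches));
```

The recorded patches should be `connected: true` followed by `connected: false`. Because `setStatus` is optional and invoked with `?.`, callers that omit it keep working unchanged.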
## Changes

| File | Change |
|------|--------|
| `src/discord/monitor/provider.ts` | Add `setStatus?` to `MonitorDiscordOpts`; call it on connect and in `finally` |
| `extensions/discord/src/channel.ts` | Pass `setStatus: (patch) => ctx.setStatus(patch)` to `monitorDiscordProvider` |

## Diff size

4 lines added, 0 deleted.

## Testing

After this fix:

- A Discord account that successfully logs in will have `connected: true` in its runtime snapshot.
- Any exit from `monitorDiscordProvider` (graceful or otherwise) will set `connected: false`.
- `isChannelHealthy()` will return `false` for accounts that are `running` but have `connected === false`, allowing the health monitor to trigger a restart within one check interval (default: 5 minutes).

Other channel plugins (`zalo`, `irc`, etc.) already call `setStatus` with a `connected` field; this brings the Discord plugin into parity with them.

<h3>Greptile Summary</h3>

Adds connection state tracking to the Discord channel provider by reporting `connected: true` after a successful login and `connected: false` in the `finally` block. This enables the channel health monitor to detect stuck Discord accounts (reconnect death spirals with 1005/1006 errors) and trigger automatic restarts.

The implementation follows the established pattern used by other channel providers (e.g., Mattermost at `extensions/mattermost/src/mattermost/monitor-websocket.ts:131-183`). The `connected: false` status is set in the `finally` block at `src/discord/monitor/provider.ts:668`, ensuring that all exit paths (abort signal, max reconnects, fatal errors) mark the account as disconnected before cleanup.

Previously, Discord never called `setStatus`, so `snapshot.connected` remained `undefined`. Since `isChannelHealthy()` in `src/gateway/channel-health-monitor.ts:47` checks `snapshot.connected === false` with strict equality, `undefined !== false` caused the health check to always return `true`, preventing automatic recovery of stuck accounts.
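To make the strict-equality trap concrete, here is a minimal sketch of the guard as described. `isChannelHealthySketch` and the two-field `ChannelSnapshot` shape are hypothetical simplifications; the real `isChannelHealthy()` in `src/gateway/channel-health-monitor.ts` may check additional conditions.

```typescript
// Hypothetical, simplified snapshot; the real runtime snapshot has more fields.
interface ChannelSnapshot {
  running: boolean;
  connected?: boolean; // never written by Discord before this PR
}

// Mirrors only the `connected === false` guard described in the PR.
function isChannelHealthySketch(snapshot: ChannelSnapshot): boolean {
  if (snapshot.running && snapshot.connected === false) {
    return false; // running but explicitly disconnected: eligible for restart
  }
  return true;
}

// Before the fix: `connected` is never set, so it is undefined.
const beforeFix: ChannelSnapshot = { running: true };
// After the fix: the `finally` block has reported `connected: false`.
const afterFix: ChannelSnapshot = { running: true, connected: false };

console.log(isChannelHealthySketch(beforeFix)); // true  (undefined !== false)
console.log(isChannelHealthySketch(afterFix)); // false
```

A looser check such as `!snapshot.connected` would also treat plugins that never report `connected` as unhealthy; with the strict guard, only plugins that actually call `setStatus` can be flagged.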
<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk.
- The fix is a simple 4-line addition that correctly implements connection state tracking for Discord accounts. It follows the established pattern used by other channel providers (e.g., Mattermost), places the `connected: false` call in the `finally` block so that all exit paths are covered, and directly addresses the root cause identified in the PR description. It changes no existing logic; it only adds the missing status reporting.
- No files require special attention.

<sub>Last reviewed commit: da40e5e</sub>
