
#21463: fix(discord): prevent WebSocket death spiral + fix numeric channel ID…

by akropp open 2026-02-20 00:02
channel: discord size: XS
… resolution

Two bugs:

1. Message handler awaited `processDiscordMessage` inline, blocking the Discord event listener. Slow agent responses (30-150s) prevented WebSocket heartbeat servicing, causing code 1005/1006 disconnects and reconnect loops. Changed to fire-and-forget with error catching.
2. Channel resolver compared numeric channel IDs against channel names when config used guildId/channelId format (e.g. '123/456'). The second segment was treated as a name and slug-matched, which never matched numeric IDs. Now matches by ID when the channel query is numeric.

## Summary

Describe the problem and fix in 2–5 bullets:

- Problem:
- Why it matters:
- What changed:
- What did NOT change (scope boundary):

## Change Type (select all)

- [ ] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #
- Related #

## User-visible / Behavior Changes

List user-visible changes (including defaults/config). If none, write `None`.

## Security Impact (required)

- New permissions/capabilities? (`Yes/No`)
- Secrets/tokens handling changed? (`Yes/No`)
- New/changed network calls? (`Yes/No`)
- Command/tool execution surface changed? (`Yes/No`)
- Data access scope changed? (`Yes/No`)
- If any `Yes`, explain risk + mitigation:

## Repro + Verification

### Environment

- OS:
- Runtime/container:
- Model/provider:
- Integration/channel (if any):
- Relevant config (redacted):

### Steps

1.
2.
3.

### Expected

-

### Actual

-

## Evidence

Attach at least one:

- [ ] Failing test/log before + passing after
- [ ] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

## Human Verification (required)

What you personally verified (not just CI), and how:

- Verified scenarios:
- Edge cases checked:
- What you did **not** verify:

## Compatibility / Migration

- Backward compatible? (`Yes/No`)
- Config/env changes? (`Yes/No`)
- Migration needed? (`Yes/No`)
- If yes, exact upgrade steps:

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly:
- Files/config to restore:
- Known bad symptoms reviewers should watch for:

## Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write `None`.

- Risk:
- Mitigation:

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

This PR fixes two critical Discord integration bugs:

**Changes:**

- **WebSocket heartbeat fix** (`message-handler.ts`): Changed `processDiscordMessage` from blocking (`await`) to fire-and-forget (`void` + `.catch()`) to prevent slow agent responses (30-150s) from blocking the Discord event listener and causing WebSocket heartbeat failures (code 1005/1006 disconnects)
- **Numeric channel ID resolution** (`resolve-channels.ts`): Added numeric ID detection (`/^\d+$/`) to match channels by ID when using `guildId/channelId` format (e.g., `123/456`), instead of incorrectly treating the numeric channel ID as a name and attempting slug-based matching

**Impact:**

These are well-targeted fixes that address production stability issues. The fire-and-forget pattern is correct for this use case: errors are still caught and logged, but the event loop remains responsive. The numeric ID fix resolves a logic error where numeric channel IDs in the second segment of `guildId/channelId` patterns were being compared against channel names instead of channel IDs.

<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge with low risk
- Both fixes are well-scoped and address clear bugs with minimal surface area. The fire-and-forget pattern correctly prevents blocking while maintaining error logging, and the numeric ID matching fix is a straightforward logic correction. Score is 4 (not 5) because the fire-and-forget change alters concurrency behavior in a production-critical path, though the change is sound and necessary.
- No files require special attention

<sub>Last reviewed commit: c396ee3</sub>

<!-- greptile_other_comments_section -->

<sub>(2/5) Greptile learns from your feedback when you react with thumbs up/down!</sub>

<!-- /greptile_comment -->
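For reviewers who want to see the first fix in isolation, here is a minimal sketch of the blocking versus fire-and-forget handler shapes. It is not the code from `message-handler.ts`: the `IncomingMessage` shape, the `Processor` and `ErrorLogger` types, and both handler names are hypothetical stand-ins, and only the `void` + `.catch()` pattern itself is taken from the PR description.

```typescript
// Hypothetical message shape; a real Discord payload carries many more fields.
interface IncomingMessage {
  id: string;
  content: string;
}

// Hypothetical stand-ins for processDiscordMessage and the project's logger.
type Processor = (msg: IncomingMessage) => Promise<void>;
type ErrorLogger = (msg: IncomingMessage, err: unknown) => void;

// Blocking shape (before): the listener's promise does not settle until
// processing finishes, so a 30-150s agent response holds the listener.
async function handleBlocking(
  msg: IncomingMessage,
  processMessage: Processor,
): Promise<void> {
  await processMessage(msg);
}

// Fire-and-forget shape (after): start processing, attach a rejection
// handler so failures are still logged, and return to the caller at once.
function handleFireAndForget(
  msg: IncomingMessage,
  processMessage: Processor,
  logError: ErrorLogger,
): void {
  void processMessage(msg).catch((err: unknown) => logError(msg, err));
}
```

One caveat with this shape: `.catch()` only sees promise rejections. If the processor were a plain function that throws synchronously before returning a promise, the error would escape to the listener, so the sketch assumes the processor is declared `async`.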
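The second fix can be sketched the same way. The snippet below illustrates matching by ID when the channel query is all digits and falling back to slug matching otherwise. The `Channel` type, `slugify`, and `findChannel` are hypothetical names for illustration, and the slug rules are an assumption; the actual logic in `resolve-channels.ts` may differ. Only the `/^\d+$/` check comes from the review summary.

```typescript
// Hypothetical channel shape; Discord IDs are numeric strings (snowflakes).
interface Channel {
  id: string;
  name: string;
}

// Assumed slug rules: lowercase, drop a leading '#', collapse runs of
// non-alphanumeric characters into single hyphens, trim stray hyphens.
function slugify(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/^#/, "")
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Resolve the channel segment of a 'guildId/channelId' or 'guildId/name' entry.
function findChannel(channels: Channel[], query: string): Channel | undefined {
  const q = query.trim();
  // A purely numeric query is treated as an ID and compared against IDs.
  if (/^\d+$/.test(q)) {
    return channels.find((c) => c.id === q);
  }
  // Anything else is treated as a name and compared by slug.
  const slug = slugify(q);
  return channels.find((c) => slugify(c.name) === slug);
}

const channels: Channel[] = [
  { id: "456", name: "general" },
  { id: "789", name: "dev-chat" },
];

const byId = findChannel(channels, "456");
const byName = findChannel(channels, "Dev Chat");
```

Here `byId` is the `general` channel and `byName` is the `dev-chat` channel. Without the numeric branch, the query `456` would be slugged and compared against `general` and `dev-chat`, which is the never-matching behavior the PR describes.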
