โ† Back to PRs

#22642: fix(discord): Discord status state machine 2.0 (clean restart, follow-up to #18248)

by victorGPT open 2026-02-21 12:55 View on GitHub โ†’
docs channel: discord size: XL
Supersedes #22563 (frozen draft for history). This is a clean restart from main with minimal diff. # PR Title (recommended) `fix(discord): Discord status state machine 2.0 (follow-up to #18248)` --- ## Summary Describe the problem and fix in 2โ€“5 bullets: - Problem: - v1 (authored by **@victorGPT**, merged as PR #18248) solved baseline transitions, but under stress it could still show **semantic/progress rollback** on the same message: eye emoji and active emojis could flip back and forth (`๐Ÿ‘€ โ†” ๐Ÿค”`, or `โณ -> ๐Ÿ’ป -> โณ`). - In bursty multi-message queues, v1 did not reliably provide a clear per-message waiting signal for every queued message. - Why it matters: - This creates false progress signals: operators can see a message appear to "go backward" even though execution already moved forward. - What changed: - v2 enforces semantic monotonic progression (no ACTIVE/WAITING rollback), keeps ACTIVE-internal icon switches valid, unifies deferred queue outcomes, adds explicit clear/cancel-before-start cleanup, and makes queued-message waiting visibility explicit for burst scenarios. - What did NOT change (scope boundary): - No non-Discord behavior changes, no thread/channel rename strategy changes, no new slash command in this PR. ## Change Type (select all) - [x] Bug fix - [ ] Feature - [x] Refactor - [ ] Docs - [ ] Security hardening - [ ] Chore/infra ## Scope (select all touched areas) - [x] Gateway / orchestration - [ ] Skills / tool execution - [ ] Auth / tokens - [ ] Memory / storage - [x] Integrations - [x] API / contracts - [ ] UI / DX - [ ] CI/CD / infra ## Linked Issue/PR - Closes #(fill-if-any) - Related #18248 (v1 baseline) - v1 PR (authored by @victorGPT): [#18248](https://github.com/openclaw/openclaw/pull/18248) ## User-visible / Behavior Changes - v2 explicitly prevents semantic/progress rollback once a message reaches ACTIVE (blocks `ACTIVE -> WAITING`). - v1-style thrash patterns such as `๐Ÿ‘€ โ†” ๐Ÿค”` and `โณ -> ๐Ÿ’ป -> โณ` are treated as invalid rollback and suppressed. - ACTIVE internal switching (thinking/tool variants like `๐Ÿค” โ†” ๐Ÿ’ป/๐Ÿ› ๏ธ/๐ŸŒ`) remains valid and is not treated as rollback. - In multi-message bursts, each queued message now shows explicit WAITING (`โณ`) while queued, then transitions when processing actually starts. - Off mode remains ack-oriented while handling deferred bookkeeping correctly (no premature ack-clear during queued/deferred lifecycle). - Deferred state cleanup is explicit on queue clear/cancel-before-start paths to avoid stale status artifacts. ### Before vs After (v1 -> v2) | Area | v1 (#18248) | v2 (this PR) | | ------------------------------------------ | ---------------------------------------------------------------------------- | --------------------------------------------------------------------------------- | | Baseline behavior | Faster reaction status machine | Semantic hardening under real queue/deferred stress | | Semantic/progress rollback on same message | Could still thrash between eye + active emojis (`๐Ÿ‘€ โ†” ๐Ÿค”`, `โณ -> ๐Ÿ’ป -> โณ`) | Monotonic semantic phases block rollback; no false backward progress | | Deferred queue outcomes | Partial handling | Unified outcome model (`queued/skipped/dropped/merged/failed`) with cleanup hooks | | Multi-message queue visibility | Under burst load, some queued messages lacked clear waiting state | Every queued message shows WAITING (`โณ`), then transitions when execution starts | | Clear/cancel-before-start | Could leave stale deferred state | Explicit failure outcome + deferred cleanup | | off mode + deferred | Could lose deferred context and clear ack too early in edge cases | Keeps ack-only display while preserving deferred lifecycle bookkeeping | ### Behavior Signatures (quick reviewer view) - **V1 false rollback signature** (same message): `๐Ÿ‘€ -> ๐Ÿค” -> ๐Ÿ’ป -> ๐Ÿ‘€ -> ๐Ÿค”` or `โณ -> ๐Ÿ’ป -> โณ` - **V2 monotonic signature** (same message): WAITING `โณ` (if queued) -> ACTIVE (`๐Ÿค”/๐Ÿ’ป/๐Ÿ› ๏ธ/๐ŸŒ`) -> terminal (`โœ…/โŒ`) without semantic backward transition - **Queue visibility in bursts**: for A/B/C stacked messages, B and C show `โณ` while waiting; each one switches out of `โณ` when its own execution actually begins ## Security Impact (required) - New permissions/capabilities? (`No`) - Secrets/tokens handling changed? (`No`) - New/changed network calls? (`No`) - Command/tool execution surface changed? (`No`) - Data access scope changed? (`No`) - If any `Yes`, explain risk + mitigation: - N/A ## Repro + Verification ### Environment - OS: macOS (Darwin arm64) - Runtime/container: Node.js + pnpm local workspace - Model/provider: openai-codex/gpt-5.3-codex (coding/review workers) - Integration/channel (if any): Discord - Relevant config (redacted): - `messages.statusReactionMode = full|off` - `messages.removeAckAfterReply = true|false` ### Steps 1. Send multi-message burst to create queued/deferred transitions. 2. Observe status progression across waiting/active/terminal phases. 3. Test `statusReactionMode=off` with deferred queued flow and `removeAckAfterReply=true`. 4. Trigger clear/cancel-before-start path and ensure no stale deferred status remains. ### Expected - No semantic rollback after ACTIVE starts. - No `๐Ÿ‘€ โ†” ๐Ÿค”` oscillation and no `โณ -> ๐Ÿ’ป -> โณ` fallback for the same message. - In burst sends, queued messages are clearly signaled as waiting and then promoted when execution starts. - off mode remains ack-oriented without premature ack clear during deferred flow. - No stale deferred state after queue-clear/cancel-before-start. ### Actual - Matches expected in targeted regression tests and local verification commands. ## Evidence Attach at least one: - [x] Failing test/log before + passing after - [x] Trace/log snippets - [ ] Screenshot/recording - [ ] Perf numbers (if relevant) ## Human Verification (required) What you personally verified (not just CI), and how: - Verified scenarios: - queued/deferred lifecycle correctness, - rollback prevention invariants, - off-mode deferred behavior, - queue clear/cancel cleanup behavior. - Edge cases checked: - error/retry progression, - cancellation before lifecycle start. - What you did **not** verify: - full production-scale long-duration multi-guild soak run. ## Compatibility / Migration - Backward compatible? (`Yes`) - Config/env changes? (`No`) - Migration needed? (`No`) - If yes, exact upgrade steps: - N/A ## Failure Recovery (if this breaks) - How to disable/revert this change quickly: - Roll back this PR commit range. - Files/config to restore: - `src/discord/monitor/message-handler.process.ts` - `src/discord/monitor/message-handler.process.test.ts` - `src/auto-reply/reply/queue/*` (outcome/cleanup related) - Known bad symptoms reviewers should watch for: - `๐Ÿ‘€ โ†” ๐Ÿค”` jitter on one message, - `โณ -> tool -> โณ` fallback, - ack disappearing too early in off-mode deferred runs, - stale deferred status after queue clear/cancel. ## Risks and Mitigations - Risk: - Deferred cleanup timing tradeoff in rare delayed-retry scenarios. - Mitigation: - Unified outcome cleanup + targeted lifecycle regression tests. - Risk: - Future call paths bypassing semantic invariants. - Mitigation: - Centralized semantic-phase guard in status controller + regression tests for observed live signatures. <!-- greptile_comment --> <h3>Greptile Summary</h3> Hardens Discord status reaction state machine to prevent semantic rollback (e.g., `๐Ÿ‘€ โ†” ๐Ÿค”` or `โณ -> ๐Ÿ’ป -> โณ` on same message) by enforcing monotonic phase progression and unifying deferred queue lifecycle tracking. ## Key Changes - Implements semantic phase gates (`queued` -> `waiting` -> `active` -> `terminal`) to block backward transitions in `src/discord/monitor/message-handler.process.ts:307-318` - Introduces deferred status controller registry with lifecycle bridge in `src/discord/monitor/message-handler.process.ts:115-244` to coordinate queue outcomes with status reactions - Adds `onFollowupQueued` callback plumbing through `GetReplyOptions` and queue system to signal queued/skipped/dropped/merged/failed outcomes - Establishes per-message waiting visibility for multi-message bursts by detecting active sessions and busy lanes at ingress (`src/discord/monitor/message-handler.process.ts:637-656`) - Preserves waiting hourglass (`โณ`) until lifecycle actually starts, avoiding premature clear for queued runs ## Test Coverage - Comprehensive regression tests for monotonic transitions, active-to-waiting rollback prevention, deferred lifecycle coordination, error-retry recovery, and queue outcome emissions - Validates `statusReactionMode=off` preserves ack-only behavior while maintaining deferred bookkeeping - Tests cover rate-limiting, debounce, stall timers, and cleanup edge cases ## Config Schema - Introduces `messages.statusReactionMode` (`full` | `off`) to control transition behavior - Preserves backward compatibility (defaults to `full`) - Documents new mode in `docs/channels/discord.md` <h3>Confidence Score: 4/5</h3> - Safe to merge with high confidence; well-tested monotonic state machine prevents observed rollback patterns - Implementation is solid with comprehensive test coverage for core invariants. The semantic phase transition logic is sound and properly gates rollback. Deferred lifecycle coordination is well-architected. Minor complexity around cleanup timing in edge cases (error-retry TTL) is acceptable given unified outcome model and targeted regression test...

Most Similar PRs