#21576: fix(feishu): stop ghost messages from corrupting active conversations on reconnect

by xuanyue202 open 2026-02-20 03:43

channel: feishu size: XS

## Summary

- **Problem:** Dedup state is lost on process restart, causing old Feishu messages (within the 72-hour retransmission window) to be reprocessed after reconnection.
- **Why it matters:** I think this is a fairly severe bug. The Feishu channel can currently process unexpected ghost messages from time to time, causing strange behavior such as suddenly creating a new session in the middle of a conversation. Feishu retransmits unacknowledged events for up to 72 hours after WebSocket reconnection. Without persistent dedup, a process restart followed by reconnection will replay all recent messages, causing duplicate processing and data corruption.
- **What changed:** Extended the dedup TTL from 30 minutes to 72 hours and added disk persistence (JSON file at `~/.openclaw/data/feishu-dedup.json`) with debounced writes and atomic file operations.
- **What did NOT change:** The dedup logic itself, the message acknowledgment flow, or the Feishu API integration.

## Change Type

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope

- [ ] Gateway / orchestration
- [x] Integrations
- [x] Memory / storage
- [ ] Auth / tokens
- [ ] Skills / tool execution
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes (if applicable)
- Related (if applicable)

## User-visible / Behavior Changes

None. This is an internal reliability fix that prevents duplicate message processing after restarts.

## Security Impact

- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`

## Repro + Verification

### Environment

- OS: Any
- Runtime: Node.js
- Integration: Feishu

### Steps

1. Start the process and receive a Feishu message (e.g., event ID `msg-123`)
2. Verify the message is processed and recorded in dedup state
3. Restart the process
4. Trigger a WebSocket reconnection (or wait for Feishu to retransmit within 72 hours)
5. Verify that `msg-123` is **not** reprocessed (dedup state persisted across restart)

### Expected

- Message is processed once
- After restart, the same message is deduplicated and not reprocessed
- Dedup file exists at `~/.openclaw/data/feishu-dedup.json`

### Actual

- Before fix: Message would be reprocessed after restart
- After fix: Message is correctly deduplicated

## Evidence

- Code review: Dedup file is loaded on module init, written atomically with debouncing, and expired entries are filtered on load
- No new tests added (existing dedup tests remain valid; persistence is transparent to callers)

## Human Verification

- **Verified scenarios:**
  - Dedup state loads correctly from disk on startup
  - Expired entries (>72 hours old) are filtered out on load
  - New entries trigger debounced writes
  - Cleanup evictions trigger writes
  - Atomic writes prevent corruption (tmp file + rename)
  - Missing/corrupted file does not crash startup
  - End-to-end Feishu reconnection scenario (by modifying node modules directly)
- **Edge cases checked:**
  - File absent on first run (graceful fallback)
  - Corrupted JSON (caught, no crash)
  - Rapid writes coalesced into a single disk operation
  - Directory creation with `recursive: true`
- **What you did NOT verify:**
  - Performance impact of disk I/O (debouncing mitigates; the 500ms window is reasonable). I have not run a performance test, but it should be fine since this is not an I/O-intensive task.

## Compatibility / Migration

- Backward compatible? `Yes` (new file is created on first write; old in-memory state is discarded)
- Config/env changes? `No`
- Migration needed? `No`

## Failure Recovery

- **How to disable/revert:** Revert to the commit before this change; dedup will revert to a 30-minute TTL and in-memory-only state
- https://claude.ai/code/session_01BruPdSqtqoZP52mr9B8vks

<h3>Greptile Summary</h3>

Extended the Feishu dedup TTL from 30 minutes to 72 hours and added disk persistence to prevent duplicate message processing after process restarts. The implementation uses atomic writes with a debounced (500ms) write pattern to minimize I/O overhead.

- TTL increased to match Feishu's 72-hour retransmission window
- Added persistence layer with JSON file at `~/.openclaw/data/feishu-dedup.json`
- Atomic writes via tmp file + rename pattern to prevent corruption
- Expired entries filtered on load
- Debounced writes coalesce rapid changes

One logical issue found: cache eviction while the cache is full may cause disk/memory inconsistency across restarts.

<h3>Confidence Score: 4/5</h3>

- Safe to merge with minor fix recommended
- The implementation follows solid patterns (atomic writes, debouncing, graceful error handling) and solves the stated problem. One edge case with cache eviction could cause minor inconsistency but won't cause crashes or data loss
- No files require special attention beyond the noted cache eviction issue

<sub>Last reviewed commit: 90966e2</sub>
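
For readers without the diff open, the persistence approach described above (72-hour TTL, debounced writes, tmp file + rename, expired entries filtered on load) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `PersistentDedup`, the method names `tryRecord` and `flush`, the injectable clock, and the on-disk shape (a flat JSON object mapping event ID to timestamp) are all assumptions. Only the TTL, the 500ms debounce window, and the file path come from the description.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Values stated in the PR description.
const DEDUP_TTL_MS = 72 * 60 * 60 * 1000; // Feishu's 72-hour retransmission window
const WRITE_DEBOUNCE_MS = 500;
const DEFAULT_PATH = path.join(os.homedir(), ".openclaw", "data", "feishu-dedup.json");

// Hypothetical class: names and file format are illustrative assumptions.
class PersistentDedup {
  private seen = new Map<string, number>(); // event ID -> first-seen time (ms)
  private timer: ReturnType<typeof setTimeout> | null = null;
  private filePath: string;
  private now: () => number;

  constructor(filePath: string = DEFAULT_PATH, now: () => number = Date.now) {
    this.filePath = filePath;
    this.now = now;
    this.load();
  }

  /** Returns true if the event is new (and records it), false if it is a duplicate. */
  tryRecord(eventId: string): boolean {
    const ts = this.seen.get(eventId);
    if (ts !== undefined && this.now() - ts < DEDUP_TTL_MS) return false;
    this.seen.set(eventId, this.now());
    this.scheduleWrite();
    return true;
  }

  /** Writes the state immediately, cancelling any pending debounced write. */
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    fs.mkdirSync(path.dirname(this.filePath), { recursive: true });
    const tmp = `${this.filePath}.${process.pid}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(Object.fromEntries(this.seen)));
    // Rename over the target so readers never observe a half-written file.
    fs.renameSync(tmp, this.filePath);
  }

  private scheduleWrite(): void {
    if (this.timer !== null) return; // coalesce rapid updates into one write
    this.timer = setTimeout(() => {
      this.timer = null;
      this.flush();
    }, WRITE_DEBOUNCE_MS);
  }

  private load(): void {
    try {
      const raw = JSON.parse(fs.readFileSync(this.filePath, "utf8")) as Record<string, unknown>;
      for (const [id, ts] of Object.entries(raw)) {
        // Keep only well-formed entries that are still inside the TTL.
        if (typeof ts === "number" && this.now() - ts < DEDUP_TTL_MS) this.seen.set(id, ts);
      }
    } catch {
      // Missing or corrupted file: start with empty state instead of crashing.
    }
  }
}
```

In this sketch a handler would call `tryRecord(eventId)` before processing and skip the event when it returns `false`. The clock is injectable only so that TTL expiry can be exercised without waiting 72 hours.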