
#23226: fix(msteams): proactive messaging, EADDRINUSE fix, tool status, adaptive cards

by TarogStar · open · 2026-02-22 03:55
channel: msteams size: XL
## Summary

- **Problem**: MS Teams replies fail with "Cannot perform 'set' on a proxy that has been revoked" when the LLM takes >15s to respond. The Bot Framework webhook TurnContext is a proxy that is revoked after the HTTP handler returns, but with slow local models the agent response arrives well after that. Additionally, the provider promise resolved immediately after the server bind, causing the channel manager to interpret it as "provider exited" and trigger EADDRINUSE restart loops.
- **Why it matters**: With local LLMs (15-45s response times), every reply delivery, typing indicator, and adaptive card send was failing after the first few seconds. The restart loop compounded the issue by trying to rebind the same port repeatedly.
- **What changed**:
  - All reply delivery (thread + top-level), typing indicators, and adaptive card sends now use proactive messaging via `adapter.continueConversation()` instead of the short-lived webhook TurnContext
  - Provider promise stays pending until the abort signal fires (BlueBubbles pattern), preventing EADDRINUSE restart loops
  - Added tool status messages ("Checking email...", "Searching the web..."), a GET /health endpoint, and adaptive card action handling (Action.Execute + Action.Submit)
  - Conversation reference seeded on bot install, so proactive messaging works before the first user message
  - Extracted shared helpers, added a size cap to the global invoke queue, sanitized untrusted input in synthetic text
- **What did NOT change**: No changes to the core OpenClaw engine, config schema, or other channel plugins. The MS Teams webhook contract, Bot Framework SDK usage, and message routing logic are unchanged.
## Change Type (select all)
- [x] Bug fix
- [x] Feature
- [x] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)
- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
- [ ] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR
- Closes #18636 (proxy revoked when agent takes >15s)
- Closes #15752 (bot replies don't stay in channel threads)
- Closes #22169 (EADDRINUSE restart loop)
- Closes #20448 (messages after tool execution not delivered — same proxy revocation root cause)
- Related #9873 (channel replies not sent — possibly related but older)

## User-visible / Behavior Changes
- Bot replies are now reliably delivered regardless of LLM response time (no more proxy revocation failures)
- Thread replies stay in the correct channel thread via `replyToId` on proactive messages
- Tool status messages appear during tool execution ("Checking email...", "Searching the web...")
- Typing indicator shows reliably via proactive messaging
- Adaptive card button clicks (Action.Execute and Action.Submit) are handled and routed to the agent
- GET /health endpoint available for monitoring
- No config changes needed — all improvements are automatic

## Security Impact (required)
- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No` (same Bot Framework API, just via the proactive messaging path)
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`

## Repro + Verification

### Environment
- OS: Linux (WSL2)
- Runtime/container: Node v24.13.1
- Model/provider: qwen3-8b via LMStudio (local, 15-45s response times)
- Integration/channel: MS Teams (personal chat)
- Relevant config: `gateway.mode: "local"`, `channels.msteams.blockStreamingCoalesce: { minChars: 300, maxChars: 800, idleMs: 1500 }`

### Steps
1. Configure the MS Teams channel with a slow local LLM (>15s response time)
2. Send a message to the bot in Teams
3. Observe the typing indicator, tool status messages, and reply delivery

### Expected
- Typing dots appear immediately
- Tool status messages appear during tool execution
- Reply delivered successfully in the correct thread
- No errors in logs

### Actual (before fix)
- Reply fails with "Cannot perform 'set' on a proxy that has been revoked"
- Typing indicator fails silently after the webhook context expires
- Provider restarts in a loop with EADDRINUSE errors

## Evidence
- [x] Failing test/log before + passing after
- [x] Trace/log snippets

Log output after fix (zero errors, clean dispatch):

```
[msteams] starting provider (port 3978)
[msteams] received message
[msteams] dispatching to agent
[msteams] dispatch complete
```

Before the fix, logs showed 5+ EADDRINUSE restart attempts and proxy revocation errors on every reply.

All 165 msteams tests pass. `pnpm build && pnpm check` clean.

## Human Verification (required)
- Verified scenarios: personal chat reply delivery, typing indicators, tool status messages, adaptive card handling, bot restart (no EADDRINUSE), cold start after bot install
- Edge cases checked: slow LLM responses (45s+), multiple sequential messages, gateway restart behavior, conversation reference persistence
- What I did **not** verify: group chat and channel thread behavior (only tested personal chat), other channel plugins

## Compatibility / Migration
- Backward compatible? `Yes`
- Config/env changes? `No`
- Migration needed? `No`

## Failure Recovery (if this breaks)
- How to disable/revert this change quickly: revert the msteams extension commits
- Files/config to restore: `extensions/msteams/src/` directory
- Known bad symptoms: if proactive messaging fails, replies would not be delivered at all (versus the old behavior, where they sometimes worked if the LLM was fast enough). Monitor for "reply failed" log entries.
## Risks and Mitigations
- Risk: proactive messaging requires a valid stored conversation reference; if the reference becomes stale, replies may fail
  - Mitigation: conversation references are refreshed on every inbound message and seeded on bot install. The reference includes the serviceUrl, which Bot Framework keeps stable per tenant.
- Risk: the global invoke queue (`__openclaw_pending_card_invokes`) could grow unbounded if invokes are never consumed
  - Mitigation: added a size cap of 50 entries with oldest-first eviction

Opus 4.6 assisted

### Greptile Summary

Refactors the MS Teams integration to use proactive messaging for all replies, typing indicators, and adaptive card sends, eliminating proxy revocation errors with slow LLMs (>15s responses). Also fixes the EADDRINUSE restart loop by keeping the provider promise pending until the abort signal fires.

- Switched from the webhook `TurnContext` (revoked after the HTTP request) to `adapter.continueConversation()` for all outbound operations
- Provider now waits for the abort signal before resolving, preventing the channel manager from interpreting an early resolve as "provider exited"
- Added a `/health` endpoint for monitoring (accessible before JWT auth)
- Implemented tool status messages ("Checking email...", "Searching the web...") that appear during tool execution
- Added handling for adaptive card actions (Action.Execute and Action.Submit) with a global invoke queue
- Conversation reference now seeded on bot install for proactive messaging before the first user message
- All changes are backward compatible with no config modifications required

### Confidence Score: 4/5

- Safe to merge with minor observations: a well-architected solution with comprehensive tests
- Strong implementation with extensive test coverage (392 new test lines), clear architectural benefits, and proper error handling. The proactive messaging pattern correctly addresses the proxy revocation issue. One minor observation about optional chaining usage that could be simplified for clarity.
- No files require special attention: all changes follow established patterns

Last reviewed commit: d94f5a4
