← Back to PRs

#16949: fix(gateway): deliver chat:final even when sessionKey is unresolved (…

by ekleziast open 2026-02-15 08:24 View on GitHub →
gateway size: M
## Summary - **Problem:** WebSocket clients (voice assistants, TUI) never receive `chat:final` or `lifecycle:end` events when `resolveSessionKeyForRun()` returns `undefined` during agent runs with tool calls. - **Why it matters:** The agent completes successfully (response written to transcript), but WebSocket clients hang indefinitely until their watchdog fires. The entire response is lost. ### Commit 1: Restructure lifecycle handling - Extracted lifecycle end/error handling from the `if (sessionKey)` guard in `createAgentEventHandler()`. Added a three-tier fallback: chatLink -> sessionKey -> re-resolve from registry. ### Commit 2: Harden chat:final delivery Three additional improvements identified while investigating client-side symptoms: 1. **Buffer population fix** (`server-chat.ts`): Move `chatRunState.buffers.set()` for assistant text events outside the `if (sessionKey)` block. Previously, when `sessionKey` resolved late (e.g. between delta events and lifecycle:end), `emitChatFinal` would deliver an empty message because the buffer was never populated. 2. **Dead fallback → warning log** (`server-chat.ts`): The `else` branch at the end of lifecycle handling called `resolveSessionKeyForRun(evt.runId)` a second time — but this always returns `undefined` because the same call on line 329 already returned `undefined`. Replaced with `defaultRuntime.error()` to make the failure visible in logs. 3. **Embedded Pi safety net** (`agent-runner-execution.ts`): The CLI agent path has a `lifecycleTerminalEmitted` guard that ensures `lifecycle:error` is emitted even if the agent throws before emitting any lifecycle events. The embedded Pi path had no such safety net. Added a try/catch wrapper around `runEmbeddedPiAgent()` that emits `lifecycle:error` on uncaught exceptions. ## Change Type - [x] Bug fix ## Scope - [x] Gateway / orchestration - [x] Agent runner ## Linked Issue/PR - Closes #16427 ## User-visible / Behavior Changes WebSocket clients now reliably receive `chat:final` events after agent runs that involve tool calls, even when session key resolution initially fails. Additionally, embedded Pi agent runs that crash will now properly emit lifecycle:error events, preventing WS clients from hanging. ## Security Impact - New permissions/capabilities? No - Secrets/tokens handling changed? No - New/changed network calls? No - Command/tool execution surface changed? No - Data access scope changed? No ## Compatibility / Migration - Backward compatible? Yes - Config/env changes? No - Migration needed? No ## Risks and Mitigations - Risk: Duplicate lifecycle:error if embedded Pi's subscribe handlers already emitted one - Mitigation: In server-chat.ts, the handler clears context on first lifecycle:end/error. A duplicate event will find no context and be a no-op. - Risk: If `resolveSessionKeyForRun` has side effects when called twice, double invocation could cause issues - Mitigation: The function is a pure lookup (Map.get), no side effects <!-- greptile_comment --> <h3>Greptile Summary</h3> This PR fixes a bug where WebSocket clients (voice assistants, TUI) would never receive `chat:final` or `lifecycle:end` events when `resolveSessionKeyForRun()` returned `undefined` during agent runs with tool calls, causing clients to hang indefinitely. - **Lifecycle handling restructured** (`server-chat.ts`): Moved lifecycle end/error handling out of the `if (sessionKey)` guard with a three-tier fallback: `chatLink` -> `sessionKey` -> warning log. This ensures `chat:final` is delivered even when sessionKey is unresolved, as long as a `chatLink` was registered earlier. - **Buffer population fix** (`server-chat.ts`): Moved `chatRunState.buffers.set()` for assistant text outside the `if (sessionKey)` block so `emitChatFinal` has buffered content even when sessionKey resolves late. - **Dead fallback replaced** (`server-chat.ts`): Replaced the redundant `resolveSessionKeyForRun` re-call with `defaultRuntime.error()` logging to surface unresolved sessionKey failures. - **Embedded Pi safety net** (`agent-runner-execution.ts`): Added try/catch IIFE wrapper around `runEmbeddedPiAgent()` to emit `lifecycle:error` on uncaught exceptions, mirroring the existing CLI path's `lifecycleTerminalEmitted` guard. <h3>Confidence Score: 4/5</h3> - This PR is safe to merge — it fixes a real client-hang bug with well-reasoned restructuring and defensive safety nets. - The changes are logically sound: lifecycle handling correctly moved outside the sessionKey guard, buffer population ensures content availability, and the embedded Pi safety net mirrors the tested CLI pattern. The duplicate lifecycle:error scenario is addressed (no-op on second event). The early return on line 468 skipping final cleanup is a pre-existing issue, not introduced here. Minor style concern with indentation in agent-runner-execution.ts. No new test coverage for the specific fixed scenarios, but existing tests continue to validate the chat event handler behavior. - src/auto-reply/reply/agent-runner-execution.ts has inconsistent indentation that should be cleaned up. <sub>Last reviewed commit: 80c1a2a</sub> <!-- greptile_other_comments_section --> <!-- /greptile_comment -->

Most Similar PRs