#16949: fix(gateway): deliver chat:final even when sessionKey is unresolved (…
gateway
size: M
## Summary
- **Problem:** WebSocket clients (voice assistants, TUI) never receive `chat:final` or `lifecycle:end` events when `resolveSessionKeyForRun()` returns `undefined` during agent runs with tool calls.
- **Why it matters:** The agent completes successfully (response written to transcript), but WebSocket clients hang indefinitely until their watchdog fires. The entire response is lost.
### Commit 1: Restructure lifecycle handling
- Extracted lifecycle end/error handling from the `if (sessionKey)` guard in `createAgentEventHandler()`. Added a three-tier fallback: chatLink -> sessionKey -> re-resolve from registry.
### Commit 2: Harden chat:final delivery
Three additional improvements identified while investigating client-side symptoms:
1. **Buffer population fix** (`server-chat.ts`): Move `chatRunState.buffers.set()` for assistant text events outside the `if (sessionKey)` block. Previously, when `sessionKey` resolved late (e.g. between delta events and lifecycle:end), `emitChatFinal` would deliver an empty message because the buffer was never populated.
2. **Dead fallback → warning log** (`server-chat.ts`): The `else` branch at the end of lifecycle handling called `resolveSessionKeyForRun(evt.runId)` a second time — but this always returns `undefined` because the same call on line 329 already returned `undefined`. Replaced with `defaultRuntime.error()` to make the failure visible in logs.
3. **Embedded Pi safety net** (`agent-runner-execution.ts`): The CLI agent path has a `lifecycleTerminalEmitted` guard that ensures `lifecycle:error` is emitted even if the agent throws before emitting any lifecycle events. The embedded Pi path had no such safety net. Added a try/catch wrapper around `runEmbeddedPiAgent()` that emits `lifecycle:error` on uncaught exceptions.
## Change Type
- [x] Bug fix
## Scope
- [x] Gateway / orchestration
- [x] Agent runner
## Linked Issue/PR
- Closes #16427
## User-visible / Behavior Changes
WebSocket clients now reliably receive `chat:final` events after agent runs that involve tool calls, even when session key resolution initially fails. Additionally, embedded Pi agent runs that crash will now properly emit lifecycle:error events, preventing WS clients from hanging.
## Security Impact
- New permissions/capabilities? No
- Secrets/tokens handling changed? No
- New/changed network calls? No
- Command/tool execution surface changed? No
- Data access scope changed? No
## Compatibility / Migration
- Backward compatible? Yes
- Config/env changes? No
- Migration needed? No
## Risks and Mitigations
- Risk: Duplicate lifecycle:error if embedded Pi's subscribe handlers already emitted one
- Mitigation: In server-chat.ts, the handler clears context on first lifecycle:end/error. A duplicate event will find no context and be a no-op.
- Risk: If `resolveSessionKeyForRun` has side effects when called twice, double invocation could cause issues
- Mitigation: The function is a pure lookup (Map.get), no side effects
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR fixes a bug where WebSocket clients (voice assistants, TUI) would never receive `chat:final` or `lifecycle:end` events when `resolveSessionKeyForRun()` returned `undefined` during agent runs with tool calls, causing clients to hang indefinitely.
- **Lifecycle handling restructured** (`server-chat.ts`): Moved lifecycle end/error handling out of the `if (sessionKey)` guard with a three-tier fallback: `chatLink` -> `sessionKey` -> warning log. This ensures `chat:final` is delivered even when sessionKey is unresolved, as long as a `chatLink` was registered earlier.
- **Buffer population fix** (`server-chat.ts`): Moved `chatRunState.buffers.set()` for assistant text outside the `if (sessionKey)` block so `emitChatFinal` has buffered content even when sessionKey resolves late.
- **Dead fallback replaced** (`server-chat.ts`): Replaced the redundant `resolveSessionKeyForRun` re-call with `defaultRuntime.error()` logging to surface unresolved sessionKey failures.
- **Embedded Pi safety net** (`agent-runner-execution.ts`): Added try/catch IIFE wrapper around `runEmbeddedPiAgent()` to emit `lifecycle:error` on uncaught exceptions, mirroring the existing CLI path's `lifecycleTerminalEmitted` guard.
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge — it fixes a real client-hang bug with well-reasoned restructuring and defensive safety nets.
- The changes are logically sound: lifecycle handling correctly moved outside the sessionKey guard, buffer population ensures content availability, and the embedded Pi safety net mirrors the tested CLI pattern. The duplicate lifecycle:error scenario is addressed (no-op on second event). The early return on line 468 skipping final cleanup is a pre-existing issue, not introduced here. Minor style concern with indentation in agent-runner-execution.ts. No new test coverage for the specific fixed scenarios, but existing tests continue to validate the chat event handler behavior.
- src/auto-reply/reply/agent-runner-execution.ts has inconsistent indentation that should be cleaned up.
<sub>Last reviewed commit: 80c1a2a</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#14309: fix(ui): resolve chat event session key mismatch
by justonlyforyou · 2026-02-11
81.8%
#21828: fix: acquire session write lock in delivery mirror and gateway chat...
by inkolin · 2026-02-20
80.0%
#13720: fix: emit single lifecycle start/end pair for model fallback chain
by gitsual · 2026-02-10
79.5%
#14946: fix(webchat): accumulate text across blocks in streaming buffer
by mcaxtr · 2026-02-12
79.1%
#20431: fix(sessions): add session contamination guards and self-leak lock ...
by marcomarandiz · 2026-02-18
78.1%
#8598: fix: emit agent events for CLI agents targeting webchat
by dbottme · 2026-02-04
78.1%
#22480: fix: memory leak, silent WS failures, and connection error handling
by Chase-Xuu · 2026-02-21
78.0%
#6611: fix(webchat): persist assistant messages for CLI backends
by JorgeAlan · 2026-02-01
77.9%
#17527: fix(gateway): allow WebChat to attach to main session regardless of...
by Glucksberg · 2026-02-15
77.8%
#10273: fix(agents): detect and auto-compact mid-run context overflow
by terryops · 2026-02-06
77.5%