#13720: fix: emit single lifecycle start/end pair for model fallback chain

by gitsual open 2026-02-10 23:02

commands agents stale

Cluster: Webchat Session Fixes and Enhancements

## Problem

When the primary model (e.g., Claude) fails with rate limit/auth error and falls back to a secondary model (e.g., Ollama), the TUI shows "(no output)" even though responses are generated correctly and saved to the session file.

## Root Cause

Each model attempt emitted its own `lifecycle:start/end` pair. When Claude failed, its `lifecycle:end` caused the gateway to close the `chatLink`, so subsequent Ollama responses were ignored by the TUI because `finalizedRuns.has(runId)` was already true.

## Solution

Move lifecycle event emission from the per-attempt subscription handler to the outer command layer, wrapping the entire fallback chain in a single `lifecycle:start/end` pair.

### Changes

1. **`pi-embedded-subscribe.handlers.lifecycle.ts`**: Remove `emitAgentEvent()` from `handleAgentStart`/`handleAgentEnd` (keep internal callbacks only)
2. **`agent-runner-execution.ts`**: Emit `lifecycle:start` before `runWithModelFallback` loop, `lifecycle:end/error` after completion
3. **`agent.ts`, `followup-runner.ts`, `agent-runner-memory.ts`, `isolated-agent/run.ts`**: Same pattern for other entry points

## Visual Explanation

```
BEFORE (broken):
┌─ Claude attempt ─────┐
│ start → fail → end   │ ← Gateway closes here
└──────────────────────┘
┌─ Ollama attempt ─────┐
│ start → ok → end     │ ← TUI ignores this
└──────────────────────┘

AFTER (working):
┌─────────────────────────────────┐
│ start                           │
│ ┌─ Claude ─┐                    │
│ │ fail     │ (no events)        │
│ └──────────┘                    │
│ ┌─ Ollama ─┐                    │
│ │ ok       │ (no events)        │
│ └──────────┘                    │
│ end                             │ ← Gateway closes here
└─────────────────────────────────┘
```

## Testing

Tested by temporarily invalidating the Anthropic API key to force auth error, confirmed Ollama fallback responds correctly without "(no output)".
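
As a rough illustration of the approach described under Solution, the sketch below wraps a whole fallback chain in one start/end pair. It is not the code from this PR: `runChainWithSingleLifecycle`, the `LifecycleEvent` shape, the `Attempt` type, and the `emit` callback are hypothetical stand-ins for `runWithModelFallback` and `emitAgentEvent()`, whose real signatures are not shown here.

```typescript
// Hypothetical event shape; the real emitAgentEvent() payload may differ.
type LifecycleEvent = {
  runId: string;
  phase: "start" | "end" | "error";
  error?: string;
};

// Hypothetical attempt descriptor: one entry per model in the fallback chain.
type Attempt<T> = { model: string; run: () => Promise<T> };

// Tries each attempt in order and emits exactly one start plus one
// terminal event (end or error) for the whole chain.
async function runChainWithSingleLifecycle<T>(
  runId: string,
  attempts: Attempt<T>[],
  emit: (event: LifecycleEvent) => void,
): Promise<T> {
  emit({ runId, phase: "start" });
  let lastError: unknown = new Error("no models configured");
  for (const attempt of attempts) {
    try {
      const result = await attempt.run();
      emit({ runId, phase: "end" });
      return result;
    } catch (err) {
      // A failed attempt emits nothing, so the run is not finalized early.
      lastError = err;
    }
  }
  // Every model failed: emit the single terminal error event.
  emit({ runId, phase: "error", error: String(lastError) });
  throw lastError;
}

// Usage: the primary model fails, the fallback succeeds.
const lifecycleLog: LifecycleEvent[] = [];
const demo = runChainWithSingleLifecycle(
  "run-1",
  [
    { model: "claude", run: async () => { throw new Error("rate limited"); } },
    { model: "ollama", run: async () => "hello from fallback" },
  ],
  (event) => { lifecycleLog.push(event); },
);
```

With this shape, a consumer that finalizes a run on the first `end` or `error` event sees that event only once the chain has either produced a result or exhausted every model.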
<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR changes lifecycle event emission so each agent run (including model fallback retries) produces a single `lifecycle:start` + terminal `lifecycle:end/error` pair, instead of emitting per-attempt lifecycle events inside the embedded subscription handler. This is intended to prevent the gateway from finalizing a run after the first failed attempt, which previously caused downstream UIs to show “(no output)” despite successful fallback responses.

The main remaining concern is in the auto-reply runner’s CLI-provider path: assistant text is emitted in a Promise `.then()` while `lifecycle:end` can be emitted immediately afterward, which can still lead to the gateway finalizing the run before the assistant text arrives (dropping output).

<h3>Confidence Score: 3/5</h3>

- This PR is close to mergeable but has one ordering bug that can still drop output for CLI providers.
- Lifecycle events are now correctly consolidated across fallback chains, matching the gateway’s finalize-on-end/error behavior. However, in `runAgentTurnWithFallback` the CLI-provider branch emits the assistant text in a Promise `.then()` while emitting `lifecycle:end` outside that continuation, allowing `lifecycle:end` to arrive first and causing the gateway to finalize before the assistant delta is buffered.
- src/auto-reply/reply/agent-runner-execution.ts
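
The ordering concern raised in the review can be reproduced in isolation. The sketch below is a minimal model of it, not the code in `runAgentTurnWithFallback`: `FakeGateway`, `racyTurn`, and `orderedTurn` are hypothetical names, and the gateway stand-in only assumes the finalize-on-end behavior described above.

```typescript
// Hypothetical gateway stand-in: text that arrives after a run has been
// finalized is silently dropped.
class FakeGateway {
  private finalizedRuns = new Set<string>();
  readonly delivered: string[] = [];

  onText(runId: string, text: string): void {
    if (!this.finalizedRuns.has(runId)) this.delivered.push(text);
  }

  onLifecycleEnd(runId: string): void {
    this.finalizedRuns.add(runId);
  }
}

// Racy: the text is emitted from a .then() continuation, which runs in a
// later microtask, while the end event is emitted synchronously.
function racyTurn(gw: FakeGateway, runId: string, reply: Promise<string>): Promise<void> {
  const pending = reply.then((text) => gw.onText(runId, text));
  gw.onLifecycleEnd(runId); // runs before the continuation above
  return pending;
}

// Ordered: the end event is emitted only after the text has been delivered.
async function orderedTurn(gw: FakeGateway, runId: string, reply: Promise<string>): Promise<void> {
  const text = await reply;
  gw.onText(runId, text);
  gw.onLifecycleEnd(runId);
}

// Usage: the same reply goes through both variants.
const racyGateway = new FakeGateway();
const orderedGateway = new FakeGateway();
const bothTurns = Promise.all([
  racyTurn(racyGateway, "run-racy", Promise.resolve("hi")),
  orderedTurn(orderedGateway, "run-ordered", Promise.resolve("hi")),
]);
```

Once `bothTurns` settles, `racyGateway.delivered` is empty and `orderedGateway.delivered` contains the reply. One possible fix along these lines is to emit `lifecycle:end` inside the same continuation as the assistant text, or to await that continuation before emitting it.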