#9092: fix: skip retry when block content already streamed to user
Labels: agents, stale
Cluster: Telegram Message Handling Fixes
## Summary
- Tracks whether block replies were emitted during an LLM attempt via a new `didEmitBlockReply` state flag
- Skips the auth profile rotation retry when content was already delivered to the user
- Prevents duplicate responses when the LLM returns an error after streaming content (e.g., Gemini returning 500 INTERNAL after a complete response)
## Problem
When Gemini (and potentially other providers) streams a complete response but then returns a 500 error on stream close, the retry logic would rotate auth profiles and re-run the prompt, causing the same response to be delivered multiple times.
## Solution
Track when block replies are emitted during streaming via `didEmitBlockReply`, and check this flag before triggering a retry. If content was already streamed to the user, skip the retry to avoid duplicate messages.
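A minimal, synchronous sketch of the guard, assuming a simplified runner: only `didEmitBlockReply` comes from this PR, while `runWithProfileRotation`, `makeStreamThenFailAttempt`, and the `AttemptResult` shape are hypothetical. It replays the Gemini scenario (complete answer streamed, then 500 INTERNAL) with and without the guard.

```typescript
// Hypothetical sketch: only `didEmitBlockReply` comes from the PR; the other
// names are illustrative, and the real runner is asynchronous.

interface AttemptResult {
  didEmitBlockReply: boolean; // a block reply was streamed during this attempt
  error?: string;             // provider error, e.g. "500 INTERNAL" on stream close
}

type Attempt = (authProfile: string) => AttemptResult;

function runWithProfileRotation(
  authProfiles: string[],
  attempt: Attempt,
  skipRetryAfterStreaming: boolean,
): AttemptResult {
  let last: AttemptResult = { didEmitBlockReply: false, error: "no auth profiles" };
  for (const profile of authProfiles) {
    last = attempt(profile);
    if (last.error === undefined) return last; // success: stop here
    // Guard added by the fix: the user already has the content, so return
    // the failed attempt as-is instead of re-running the prompt.
    if (skipRetryAfterStreaming && last.didEmitBlockReply) return last;
    // Old behavior: rotate to the next auth profile and re-run the prompt.
  }
  return last;
}

// Simulates a provider that streams a complete answer on every attempt,
// but fails on stream close for the first attempt only.
function makeStreamThenFailAttempt(delivered: string[]): Attempt {
  let calls = 0;
  return (profile) => {
    calls += 1;
    delivered.push(`answer via ${profile}`); // block reply reaches the user
    return calls === 1
      ? { didEmitBlockReply: true, error: "500 INTERNAL" }
      : { didEmitBlockReply: true };
  };
}

const withoutGuard: string[] = [];
runWithProfileRotation(["primary", "backup"], makeStreamThenFailAttempt(withoutGuard), false);
const withGuard: string[] = [];
const guardedResult = runWithProfileRotation(
  ["primary", "backup"],
  makeStreamThenFailAttempt(withGuard),
  true,
);
console.log(withoutGuard.length, withGuard.length); // 2 1
```

Without the guard, rotating to the backup profile re-runs the prompt and delivers the answer a second time; with it, the user sees the answer once. The sketch does not model what the real runner reports to the user after skipping the retry.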
## Test plan
- [x] Add unit test: "skips retry when block content was already streamed"
- [x] All existing auth profile rotation tests pass (9 tests)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adds a new per-attempt flag (`didStreamBlockReply`) derived from subscription state (`didEmitBlockReply`) to detect whether block replies were streamed during an LLM attempt. `runEmbeddedPiAgent` then uses that flag to skip auth-profile rotation retries when a provider returns an error after content has already been streamed, preventing duplicate responses (notably for Gemini errors on stream close).
The change threads through `subscribeEmbeddedPiSession` → `runEmbeddedAttempt` → `runEmbeddedPiAgent`, and includes a unit test covering the “streamed content + terminal error should not retry” scenario.
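The flag's path through the three layers can be sketched with simplified, hypothetical stand-ins (`subscribeSession`, `runAttempt`, `shouldRotateProfile`); the real `subscribeEmbeddedPiSession`, `runEmbeddedAttempt`, and `runEmbeddedPiAgent` have different, asynchronous signatures. This sketch sets the flag when `onBlockReply` is invoked, which is the behavior questioned in the confidence notes.

```typescript
// Simplified, synchronous stand-ins; the real functions differ.

interface SubscriptionState {
  didEmitBlockReply: boolean;
}

// Stand-in for subscribeEmbeddedPiSession: wraps the caller's onBlockReply
// so the subscription state records that a block reply was emitted.
function subscribeSession(onBlockReply?: (text: string) => void) {
  const state: SubscriptionState = { didEmitBlockReply: false };
  const emitBlockReply = (text: string): void => {
    if (!onBlockReply) return; // no block streaming configured
    state.didEmitBlockReply = true; // set on invocation, not on confirmed delivery
    onBlockReply(text);
  };
  return { state, emitBlockReply };
}

interface AttemptOutcome {
  didStreamBlockReply: boolean; // per-attempt copy of didEmitBlockReply
  error?: string;
}

// Stand-in for runEmbeddedAttempt: streams blocks, then reports the flag.
function runAttempt(
  blocks: string[],
  failOnClose: boolean,
  onBlockReply?: (text: string) => void,
): AttemptOutcome {
  const { state, emitBlockReply } = subscribeSession(onBlockReply);
  for (const block of blocks) emitBlockReply(block);
  return {
    didStreamBlockReply: state.didEmitBlockReply,
    ...(failOnClose ? { error: "500 INTERNAL" } : {}),
  };
}

// Stand-in for the retry decision in runEmbeddedPiAgent.
function shouldRotateProfile(outcome: AttemptOutcome): boolean {
  return outcome.error !== undefined && !outcome.didStreamBlockReply;
}

const sent: string[] = [];
const streamed = runAttempt(["Hello", "world"], true, (text) => sent.push(text));
const silent = runAttempt([], true, (text) => sent.push(text));
console.log(sent, shouldRotateProfile(streamed), shouldRotateProfile(silent));
```

An attempt that streamed blocks before failing is not rotated, while one that failed before emitting anything still is.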
<h3>Confidence Score: 3/5</h3>
- This PR is close to being safe to merge, but the new retry guard can be bypassed in real streaming configurations.
- Core logic is small and well-targeted, but `didStreamBlockReply` currently reflects `onBlockReply` invocation rather than “content already delivered”, so the guard may fail to prevent duplicates in some streaming paths. Tests in this environment couldn’t be executed due to missing dependencies, so confidence relies on static review.
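One possible direction for the concern above, assuming each delivery path can report whether a send succeeded: record delivery where it is confirmed, and route every path through the same tracker. `DeliveryTracker`, `Send`, and `wrap` are hypothetical names, not part of the codebase.

```typescript
// Hypothetical sketch: mark "content delivered" where a send is confirmed,
// not where a reply callback is invoked.

type Send = (text: string) => boolean; // true if the channel accepted the message

class DeliveryTracker {
  private deliveredCount = 0;

  // Wrap every path that can put content in front of the user
  // (block replies, flushed buffers, final replies) with the same tracker.
  wrap(send: Send): Send {
    return (text: string): boolean => {
      const ok = send(text);
      if (ok) this.deliveredCount += 1;
      return ok;
    };
  }

  // What a retry guard could consult instead of the invocation-based flag.
  didDeliverContent(): boolean {
    return this.deliveredCount > 0;
  }
}

const tracker = new DeliveryTracker();
const blockReply = tracker.wrap(() => false); // invoked, but the send failed
blockReply("partial answer");
const afterFailedSend = tracker.didDeliverContent();
const flushBuffered = tracker.wrap(() => true); // e.g. a coalesced buffer flush
flushBuffered("full answer");
console.log(afterFailedSend, tracker.didDeliverContent()); // false true
```

Because the count only changes after a confirmed send, a reply that was invoked but never delivered does not suppress the retry, and content sent through another wrapped path still counts.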
- src/agents/pi-embedded-subscribe.ts, src/agents/pi-embedded-runner/run.ts
<!-- greptile_other_comments_section -->
**Context used:**
- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
- Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13))
<!-- /greptile_comment -->
Most Similar PRs
- #8205: fix: flush followup messages incrementally · by hanxiao · 2026-02-03 · 77.7% similarity
- #17953: fix(telegram): prevent silent message loss and duplicate messages i... · by zuyan9 · 2026-02-16 · 74.6% similarity
- #14977: fix(telegram): remove ack reaction after block-streamed replies · by Diaspar4u · 2026-02-12 · 73.4% similarity
- #17265: fix: abort streaming runs after 90s of inactivity · by jg-noncelogic · 2026-02-15 · 73.1% similarity
- #18072: fix(Telegram): usage footer not sent to Telegram when blockStreamin... · by yinghaosang · 2026-02-16 · 73.1% similarity
- #4495: Fix: emit final assistant event when reply tags hide stream · by ukeate · 2026-01-30 · 72.1% similarity
- #21462: fix(agents): hold back partial NO_REPLY token in pi-embedded streaming · by algal · 2026-02-20 · 72.0% similarity
- #5080: fix(reply): fix duplicate block replies by unblocking coalesced pay... · by yassine20011 · 2026-01-31 · 72.0% similarity
- #10612: fix: trim leading blank lines on first emitted chunk only (#5530) · by 1kuna · 2026-02-06 · 71.8% similarity
- #12180: fix: merge multi-block assistant texts into single reply · by 1960697431 · 2026-02-08 · 71.8% similarity