#18468: fix(agents): prevent infinite retry loops in sub-agent completion announcements (#18150)
channel: mattermost
agents
size: XS
Cluster: Subagent Enhancements and Features
## Summary
Fixes #18150. Prevents 70+ duplicate sub-agent completion announcements from replaying into parent sessions by removing the code in `retryDeferredCompletedAnnounces()` that bypassed the duplicate-resumption check.
## Root Cause
When a sub-agent completion announcement is deferred (e.g., while waiting for the child session to fully settle or for active descendants to complete), the completion flow is designed to retry later. However, `retryDeferredCompletedAnnounces()` deleted the `runId` from `resumedRuns` before calling `resumeSubagentRun()`, which bypassed the duplicate prevention check.
This created an infinite loop:
1. Sub-agent A completes, `cleanupHandled = true`
2. Announce starts (async) but is deferred (child not settled)
3. `finalizeSubagentCleanup` resets `cleanupHandled = false` to allow retry
4. Sub-agent B completes, triggers `retryDeferredCompletedAnnounces`
5. For run A: `resumedRuns.delete(runId)` + `resumeSubagentRun(runId)`
6. `resumeSubagentRun` bypasses check (runId not in Set)
7. Starts new announce, defers again, resets `cleanupHandled = false`
8. Next run completion triggers retry cascade again...
With cron jobs executing periodically, each execution triggered another retry cascade. The result was 70+ duplicate announcements in the parent session, burning tokens on duplicate `NO_REPLY` processing.
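The loop above can be reproduced in a toy model. This is a minimal sketch, not the code in `subagent-registry.ts`: the function names mirror the ones in this description, but their bodies, the `RegistryModel` shape, the `deleteBeforeResume` switch, and `simulate` are assumptions made for illustration. It also assumes that a deferred announce leaves `resumedRuns` untouched, which this description does not confirm.

```typescript
// Toy model only: names mirror the PR description, but the bodies are
// simplified assumptions, not the real subagent-registry.ts implementation.
interface RunRecord {
  cleanupHandled: boolean;
  announceAttempts: number;
}

interface RegistryModel {
  runs: Map<string, RunRecord>;
  resumedRuns: Set<string>;
  // true = old behavior (delete before resume), false = behavior after this PR
  deleteBeforeResume: boolean;
}

function createModel(deleteBeforeResume: boolean): RegistryModel {
  return {
    runs: new Map<string, RunRecord>(),
    resumedRuns: new Set<string>(),
    deleteBeforeResume,
  };
}

function resumeSubagentRun(model: RegistryModel, runId: string): void {
  // Duplicate prevention check; step 6 gets past it when the entry was deleted.
  if (model.resumedRuns.has(runId)) return;
  model.resumedRuns.add(runId);

  const run = model.runs.get(runId);
  if (!run) return;
  run.cleanupHandled = true; // step 1
  run.announceAttempts += 1; // step 2: an announce attempt starts
  run.cleanupHandled = false; // step 3: it is deferred, so the flag is reset
}

function retryDeferredCompletedAnnounces(model: RegistryModel): void {
  model.runs.forEach((run, runId) => {
    if (run.cleanupHandled) return;
    if (model.deleteBeforeResume) {
      model.resumedRuns.delete(runId); // the line this PR removes
    }
    resumeSubagentRun(model, runId);
  });
}

// Run A completes once; then `laterCompletions` other runs complete,
// each triggering one retry pass (steps 4 to 8).
function simulate(deleteBeforeResume: boolean, laterCompletions: number): number {
  const model = createModel(deleteBeforeResume);
  model.runs.set("run-A", { cleanupHandled: false, announceAttempts: 0 });
  resumeSubagentRun(model, "run-A");
  for (let i = 0; i < laterCompletions; i++) {
    retryDeferredCompletedAnnounces(model);
  }
  const runA = model.runs.get("run-A");
  return runA ? runA.announceAttempts : 0;
}

console.log(simulate(true, 70)); // 71
console.log(simulate(false, 70)); // 1
```

In this model, every later completion adds one more announce attempt for run A while the delete is in place, and the count stays at one once the delete is removed.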
## Changes
Removed `resumedRuns.delete(runId)` from `retryDeferredCompletedAnnounces()` at line 290 of `subagent-registry.ts`. The `resumedRuns` Set should only be cleared by `finalizeSubagentCleanup()` when cleanup genuinely needs to be retried (`!didAnnounce`), not by the retry checker itself.
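The ownership rule can be illustrated with a short sketch. Only `resumedRuns` and `didAnnounce` come from this description; `resumeOnce`, `finalizeCleanupSketch`, and `retryPassSketch` are hypothetical stand-ins, and which outcomes the real code reports as `!didAnnounce` is not shown here.

```typescript
// Hypothetical helper names; only `resumedRuns` and `didAnnounce`
// appear in the PR text.
const resumedRuns = new Set<string>();

// Returns true if the run was resumed, false if the guard blocked it.
function resumeOnce(runId: string): boolean {
  if (resumedRuns.has(runId)) return false;
  resumedRuns.add(runId);
  return true;
}

// Stand-in for the cleanup step: the only place that releases the guard.
function finalizeCleanupSketch(runId: string, didAnnounce: boolean): void {
  if (!didAnnounce) resumedRuns.delete(runId);
}

// Stand-in for the retry pass after this PR: it never touches the guard,
// so it resumes only runs whose guard was released by cleanup.
function retryPassSketch(runIds: string[]): string[] {
  return runIds.filter((runId) => resumeOnce(runId));
}

resumeOnce("run-A"); // first resumption takes the guard
console.log(retryPassSketch(["run-A"])); // [] (guard still held)
finalizeCleanupSketch("run-A", false); // announce did not happen: release
console.log(retryPassSketch(["run-A"])); // ["run-A"] (legitimate retry)
finalizeCleanupSketch("run-A", true); // announce happened: keep the guard
console.log(retryPassSketch(["run-A"])); // [] (no duplicate)
```

Keeping the release in a single place means a retry pass can run as often as it likes without multiplying announce attempts.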
## Behavior
**Before**: Deferred announces were re-resumed on every later run completion without bound, causing 70+ duplicate announcements
**After**: Deferred announces still retry when conditions change, but duplicate resumption is prevented
## Testing
- All 8 subagent-registry tests pass (nested and steer-restart scenarios)
- Fix preserves legitimate retry behavior when announces are genuinely deferred
- Prevents the duplicate resumption that caused infinite loops
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR fixes a critical infinite retry loop bug in sub-agent completion announcement handling. The root cause was in `retryDeferredCompletedAnnounces()` at line 290, which was deleting entries from the `resumedRuns` Set before calling `resumeSubagentRun()`. This bypassed the duplicate prevention check in `resumeSubagentRun()` (line 82), allowing the same run to be resumed repeatedly when announces were deferred.
The fix removes the problematic `resumedRuns.delete(runId)` line, ensuring that:
- The duplicate prevention check in `resumeSubagentRun()` remains effective
- Only `finalizeSubagentCleanup()` clears `resumedRuns` entries when cleanup genuinely needs retry (`!didAnnounce`)
- Deferred announces can still retry legitimately when conditions change, but without infinite loops
Changes:
- Removed `resumedRuns.delete(runId)` from `retryDeferredCompletedAnnounces()` in `src/agents/subagent-registry.ts:290`
- Added detailed comment explaining the fix and referencing issue #18150
- Updated CHANGELOG.md with appropriate fix description
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with no risk - it's a minimal, well-understood bug fix that removes problematic code causing infinite loops
- The fix is a surgical one-line deletion that addresses a clear logic error. The removed line was bypassing duplicate prevention logic, causing 70+ duplicate announcements. The fix preserves all legitimate retry behavior while preventing the infinite loop. Existing tests cover both the nested agent scenarios and the steer-restart retry scenarios, and the PR description explicitly confirms all 8 subagent-registry tests pass.
- No files require special attention
<sub>Last reviewed commit: 8a7aef9</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#13105: fix: debounce subagent lifecycle events to prevent premature announ...
by mcaxtr · 2026-02-10
84.8%
#23166: fix(agents): restore subagent announce chain from #22223
by tyler6204 · 2026-02-22
81.6%
#22719: fix(agents): make subagent announce timeout configurable (restore 6...
by Valadon · 2026-02-21
81.0%
#18205: fix (agents): add periodic retry timer for failed subagent announces
by MegaPhoenix92 · 2026-02-16
79.5%
#20328: fix(agents): Add retry with exponential backoff for subagent announ...
by tiny-ship-it · 2026-02-18
79.4%
#22407: fix: allow agent turn after subagent completion message delivery
by noodleprincss-ai · 2026-02-21
79.1%
#18432: fix(agents): clear active run state immediately on embedded timeout
by BinHPdev · 2026-02-16
79.0%
#17721: fix: abort child run on subagent timeout + retry with backoff + sta...
by IrriVisionTechnologies · 2026-02-16
78.6%
#19243: fix(announce-queue): cap per-item send retries to prevent infinite ...
by taw0002 · 2026-02-17
78.1%
#17001: fix: retry sub-agent announcements with backoff instead of silently...
by luisecab · 2026-02-15
78.1%