#17001: fix: retry sub-agent announcements with backoff instead of silently dropping on timeout
agents
size: S
Cluster: Subagent Enhancements and Features
## Summary
Sub-agent announcement delivery could be silently dropped on transient gateway failures (timeouts or closed connections). This PR rebases the retry-with-backoff fix onto the latest `main` while preserving its behavior.
Closes #17000
## What this PR adds (unique value)
- Retry announce delivery with exponential backoff (2s → 4s → 8s)
- Retry only retryable errors (timeout / gateway closed)
- Fail immediately on non-retryable errors (no retry, no backoff delay)
- Keep configurable announce timeout via `agents.defaults.subagents.announceTimeoutMs` (5s–300s, default 30s)
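The retry policy listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `retryWithBackoff`, `backoffDelayMs`, the option names, and the message-based error check are all hypothetical, and the cap of three retries is inferred from the 2s → 4s → 8s schedule.

```typescript
// Hypothetical sketch of retry-with-exponential-backoff; names are illustrative only.
type RetryOptions = {
  maxRetries?: number; // retries allowed after the first attempt (assumed: 3)
  baseDelayMs?: number; // delay before the first retry (assumed: 2000)
  isRetryable?: (err: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
};

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumed classification: only timeouts and closed-gateway errors are retried.
const defaultIsRetryable = (err: unknown): boolean =>
  /timeout|gateway closed/i.test(err instanceof Error ? err.message : String(err));

// Delay doubles with each retry: index 0 -> 2s, 1 -> 4s, 2 -> 8s.
function backoffDelayMs(retryIndex: number, baseDelayMs = 2000): number {
  return baseDelayMs * 2 ** retryIndex;
}

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = {},
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 2000,
    isRetryable = defaultIsRetryable,
    sleep = defaultSleep,
  } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-retryable errors and exhausted retries surface immediately.
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt, baseDelayMs));
    }
  }
}
```

Injecting `sleep` keeps the schedule testable without real delays; a caller would wrap the gateway call, for example `retryWithBackoff(() => sendAnnounce(payload))`, where `sendAnnounce` is likewise a placeholder.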
## Rebase alignment with main
- Reused `src/agents/announce-idempotency.ts` (no duplicate idempotency implementation)
- Kept deterministic announce idempotency keys for both queue and direct paths
- Clarified `expectFinal` handling comment in direct announce path (left unset so retries confirm accept/dedupe instead of waiting for terminal run completion)
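To illustrate why deterministic keys make retries safe, here is a small sketch. The key format, `buildAnnounceKey`, and the in-memory `FakeGateway` are hypothetical stand-ins; the real helper lives in `src/agents/announce-idempotency.ts` and may derive keys differently.

```typescript
// Hypothetical key builder: identical inputs always produce the identical key,
// so a retried announce carries the same key as the original attempt.
function buildAnnounceKey(childSessionKey: string, runId: string): string {
  return `announce:${childSessionKey}:${runId}`;
}

type SendResult = "accepted" | "deduped";

// Hypothetical gateway stand-in that remembers keys it has already accepted.
class FakeGateway {
  private readonly seen = new Set<string>();

  send(idempotencyKey: string): SendResult {
    if (this.seen.has(idempotencyKey)) return "deduped";
    this.seen.add(idempotencyKey);
    return "accepted";
  }
}

const gateway = new FakeGateway();
const announceKey = buildAnnounceKey("child-1", "run-42");
const firstResult = gateway.send(announceKey); // first delivery is accepted
const retryResult = gateway.send(announceKey); // retry with the same key is deduplicated
```

Because a retry only needs to learn that the gateway accepted or deduplicated the request, it does not have to wait for the run to finish, which is the reasoning behind leaving `expectFinal` unset.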
## Files changed
- `src/agents/subagent-announce.ts`
- `src/config/zod-schema.agent-defaults.ts`
- `src/config/types.agent-defaults.ts`
- `src/agents/subagent-announce-queue.ts`
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR adds retry-with-exponential-backoff for sub-agent announcement delivery to handle transient gateway failures (timeouts, abnormal WebSocket closures, connection resets) instead of silently dropping announcements.
- Introduces `callGatewayWithRetry` wrapper with up to 3 retries and exponential backoff (2s → 4s → 8s), applied to both queued and direct announce delivery paths
- Adds configurable `announceTimeoutMs` setting (5s–300s, default 30s) via `agents.defaults.subagents.announceTimeoutMs`, replacing the previous hardcoded 15s timeout
- Narrows retry classification to exclude normal WebSocket closures (code 1000) using negative lookahead regex
- Removes `expectFinal: true` from the direct announce path so retries only confirm accept/dedupe rather than waiting for terminal run completion
- Correctly reuses deterministic idempotency keys across retries, ensuring gateway-level deduplication works as intended
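The negative-lookahead classification mentioned above could look something like the sketch below. The error message format (`gateway closed (<code>)`), the pattern itself, and the name `isRetryableGatewayError` are assumptions made for illustration; the PR's actual regex is not reproduced here.

```typescript
// Hypothetical pattern: retry timeouts, connection resets, and gateway closures
// with any close code except 1000 (normal closure).
const RETRYABLE_GATEWAY_ERROR =
  /timeout|ECONNRESET|gateway closed \((?!1000\b)\d+\)/i;

function isRetryableGatewayError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return RETRYABLE_GATEWAY_ERROR.test(message);
}

// Abnormal closure (1006) is treated as transient and retried.
const abnormal = isRetryableGatewayError(new Error("gateway closed (1006): abnormal closure"));
// Normal closure (1000) is excluded by the (?!1000\b) lookahead.
const normal = isRetryableGatewayError(new Error("gateway closed (1000): normal closure"));
```

The `\b` inside the lookahead keeps the exclusion exact, so a longer code that merely starts with `1000` would still match.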
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge: the retry logic is well-bounded, idempotent, and only targets transient failures.
- Score of 4 reflects: clean retry implementation with proper bounds and exponential backoff, correct idempotency key reuse across retries, appropriate error classification with the narrowed regex, consistent type/schema additions. The only minor concern is the lack of unit tests for the new retry logic, though the existing integration test coverage and the defensive coding style mitigate risk. The timeout increase from 15s to 30s default is intentional and documented.
- No files require special attention. The core logic in `src/agents/subagent-announce.ts` was reviewed thoroughly and the retry wrapper is straightforward.
<sub>Last reviewed commit: 69897a0</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #17028: fix(subagent): retry announce on timeout (by Limitless2023 · 2026-02-15 · 88.3%)
- #16944: fix: retry transient WebSocket 1006 closures in callGateway + annou... (by sudobot99 · 2026-02-15 · 87.0%)
- #20328: fix(agents): Add retry with exponential backoff for subagent announ... (by tiny-ship-it · 2026-02-18 · 86.4%)
- #18205: fix (agents): add periodic retry timer for failed subagent announces (by MegaPhoenix92 · 2026-02-16 · 83.3%)
- #22719: fix(agents): make subagent announce timeout configurable (restore 6... (by Valadon · 2026-02-21 · 83.3%)
- #17721: fix: abort child run on subagent timeout + retry with backoff + sta... (by IrriVisionTechnologies · 2026-02-16 · 80.3%)
- #13105: fix: debounce subagent lifecycle events to prevent premature announ... (by mcaxtr · 2026-02-10 · 80.2%)
- #16239: fix: retry on transient API errors (overloaded, rate-limit, timeout) (by zerone0x · 2026-02-14 · 78.5%)
- #18468: fix(agents): prevent infinite retry loops in sub-agent completion a... (by BinHPdev · 2026-02-16 · 78.1%)
- #8677: fix: add retry logic to OAuth token refresh (by skyblue-will · 2026-02-04 · 77.5%)