#21072: feat: add compaction retry config (maxAttempts, retryDelayMs)
agents
size: S
trusted-contributor
Cluster: Compaction Safeguards and Summaries
## Summary
- **Problem:** Pre-emptive compaction hardcoded `maxAttempts=1`, causing session resets on transient provider errors (OAuth 401, timeouts)
- **Why it matters:** A single transient failure kills an active conversation; the overflow path already allowed 3 attempts, but the pre-emptive path allowed only 1
- **What changed:** Added `compaction.maxAttempts` (default: 3) and `compaction.retryDelayMs` (default: 1000ms) config options; both compaction paths now use config values
- **What did NOT change:** Overflow compaction still defaults to 3 attempts; no breaking changes to existing sessions. (The only default that changes is pre-emptive compaction, which goes from 1 to 3 attempts.)
## Change Type (select all)
- [x] Feature
## Scope (select all touched areas)
- [x] Gateway / orchestration
## Linked Issue/PR
- Closes #20873
## User-visible / Behavior Changes
New optional config fields:
```json
{
"agents": {
"defaults": {
"compaction": {
"maxAttempts": 3,
"retryDelayMs": 1000
}
}
}
}
```
- Pre-emptive compaction now makes up to 3 attempts by default (was 1)
- Overflow compaction uses configured `maxAttempts` (was hardcoded 3)
- Retry attempts logged with `[compaction-retry]` prefix
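As a rough illustration of how the defaults above could be applied when the fields are omitted, here is a minimal sketch. Only `maxAttempts`, `retryDelayMs`, and their default values come from this PR; the type name, constant names, and `resolveCompactionRetry` are hypothetical and are not the project's actual code.

```typescript
// Hypothetical shape of the optional compaction retry settings.
interface CompactionRetryConfig {
  maxAttempts?: number;
  retryDelayMs?: number;
}

// Defaults as stated in this PR description.
const DEFAULT_MAX_ATTEMPTS = 3;
const DEFAULT_RETRY_DELAY_MS = 1000;

// Fill in any omitted field with its default; explicit values win.
function resolveCompactionRetry(
  cfg: CompactionRetryConfig = {},
): Required<CompactionRetryConfig> {
  return {
    maxAttempts: cfg.maxAttempts ?? DEFAULT_MAX_ATTEMPTS,
    retryDelayMs: cfg.retryDelayMs ?? DEFAULT_RETRY_DELAY_MS,
  };
}
```

Using `??` rather than `||` means only `undefined` or `null` falls back to the default, so an explicitly configured value is never silently replaced.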
## Security Impact (required)
- New permissions/capabilities? **No**
- Secrets/tokens handling changed? **No**
- New/changed network calls? **No** (just retries existing calls)
- Command/tool execution surface changed? **No**
- Data access scope changed? **No**
## Repro + Verification
### Environment
- OS: Any
- Runtime: Node 22+
- Provider: OAuth-based (e.g., google-antigravity) or any with transient failures
- Config: `compaction.mode: "safeguard"`
### Steps
1. Configure compaction with new options (or rely on defaults)
2. Trigger pre-emptive compaction (reach `maxHistoryShare` threshold)
3. Simulate transient provider error (OAuth 401, timeout)
### Expected
- Compaction is attempted up to `maxAttempts` times, waiting `retryDelayMs` between attempts
- Session continues if retry succeeds
### Actual
- ✓ Retry logic executes with configured attempts
- ✓ Warnings logged on retry
- ✓ Defaults applied when fields omitted
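The repro steps above can be approximated without a live provider. The sketch below is a stand-in for the behavior described, not the project's `retryAsync` utility (whose signature is not shown in this PR): `retryWithFixedDelay`, `makeFlakyCompaction`, the injectable `sleep` and `warn` parameters, and the exact log wording after the `[compaction-retry]` prefix are all assumptions made for illustration.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `task` up to `maxAttempts` times, pausing `retryDelayMs` between attempts.
async function retryWithFixedDelay<T>(
  task: () => Promise<T>,
  maxAttempts: number,
  retryDelayMs: number,
  sleep: Sleep = realSleep,
  warn: (msg: string) => void = console.warn,
): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Only warn and wait if another attempt remains.
      if (attempt < maxAttempts) {
        warn(
          `[compaction-retry] attempt ${attempt}/${maxAttempts} failed; ` +
            `retrying in ${retryDelayMs}ms`,
        );
        await sleep(retryDelayMs);
      }
    }
  }
  throw lastError;
}

// Simulated compaction call: fails `failures` times, then succeeds.
function makeFlakyCompaction(failures: number): () => Promise<string> {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls <= failures) {
      throw new Error("401 Unauthorized (simulated transient error)");
    }
    return `summary after ${calls} calls`;
  };
}
```

With `maxAttempts` set to 3, a task that fails twice succeeds on the third call, while a task that fails three times rethrows the last error, which corresponds to the session-reset case the PR is trying to make rarer.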
## Evidence
- [x] Config schema updated with JSDoc and Zod validation
- [x] Both compaction paths (pre-emptive and overflow) use config values
- [x] `retryAsync` utility handles retry logic
## Human Verification (required)
- Verified scenarios:
  - Config schema accepts new fields
  - Defaults applied when omitted
  - TypeScript compilation clean
- Edge cases checked:
  - Zero/negative values rejected by Zod
  - Existing sessions unaffected
- What you did **not** verify:
  - Live OAuth 401 retry behavior (requires provider setup)
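The zero/negative edge case can be pictured with a dependency-free check. This is a plain TypeScript stand-in, not the PR's Zod schema: `validateCompactionRetry` is a hypothetical name, and the integer requirement is an assumption (the PR only states that zero and negative values are rejected).

```typescript
// Return a list of validation errors; an empty list means the config is accepted.
function validateCompactionRetry(cfg: {
  maxAttempts?: unknown;
  retryDelayMs?: unknown;
}): string[] {
  const errors: string[] = [];
  for (const key of ["maxAttempts", "retryDelayMs"] as const) {
    const value = cfg[key];
    // Omitted fields are valid; defaults are applied elsewhere.
    if (value === undefined) continue;
    // Assumption: both fields must be positive integers.
    if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) {
      errors.push(`${key} must be a positive integer`);
    }
  }
  return errors;
}
```

For example, `validateCompactionRetry({ maxAttempts: 0 })` reports one error, while an empty object reports none because both fields are optional.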
## Compatibility / Migration
- Backward compatible? **Yes**
- Config/env changes required? **No** (the new fields are optional)
- Migration needed? **No**
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Added configurable retry logic for compaction operations to handle transient provider failures (OAuth 401, timeouts). Pre-emptive compaction now defaults to 3 attempts (was hardcoded to 1), and overflow compaction uses the same configurable value.
**Key changes:**
- New config fields: `compaction.maxAttempts` (default: 3) and `compaction.retryDelayMs` (default: 1000ms)
- Both pre-emptive and overflow compaction paths now use `retryAsync` with configured values
- Retry warnings logged with `[compaction-retry]` prefix
- TypeScript types and Zod schema validation added for new fields
**Minor optimization opportunity:**
- Setting `jitter: 0` in the retry config would clarify intent for fixed-delay retries (currently gets clamped anyway)
<h3>Confidence Score: 5/5</h3>
- Safe to merge - backward compatible config change with sensible defaults
- Well-structured change with proper config validation, default values maintain backward compatibility (3 attempts is reasonable), and the retry logic integrates cleanly with existing infrastructure. The only issue is a minor style optimization for retry config clarity.
- No files require special attention
<sub>Last reviewed commit: 7024b81</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #20038: (fix): Compaction: preserve recent context and sync session memory ... (by rodrigouroz · 2026-02-18 · 79.8% similar)
- #18663: feat: progressive compaction escalation and mechanical flush fallback (by Adamya05 · 2026-02-16 · 78.9% similar)
- #19593: feat(compaction): proactive handover before context overflow (by qualiobra · 2026-02-18 · 78.0% similar)
- #15239: fix(compact): add execution-time fallback + transient retry for /co... (by VintLin · 2026-02-13 · 77.0% similar)
- #10505: feat(compaction): add timeout, model override, and diagnostic logging (by thebtf · 2026-02-06 · 77.0% similar)
- #6268: fix: add timeout to compaction retry to prevent session lockout (by batumilove · 2026-02-01 · 76.1% similar)
- #14021: feat(compaction): optional memory flush before manual /compact (by phenomenoner · 2026-02-11 · 76.1% similar)
- #15322: feat: post-compaction target token trimming + fallback strategy (by echoVic · 2026-02-13 · 75.8% similar)
- #20713: fix(compaction): trigger memory flush after missed compaction cycles (by zerone0x · 2026-02-19 · 75.6% similar)
- #19923: feat: track held messages during compaction gate and split verifica... (by PrivacySmurf · 2026-02-18 · 75.5% similar)