#22385: fix: improve delivery recovery logging with entry age and deferral reasons
size: L
Cluster:
Chrome Extension Enhancements
## Problem
Delivery recovery logs are ambiguous — `"Recovery time budget exceeded"` fires for both backoff-exceeds-budget and genuine timeouts, with no entry age info. Makes it impossible to tell if entries were stale or just had high retry backoffs.
Fixes #22384
## Changes
- **`formatAge()`** helper — human-readable durations (`45s`, `3h 12m`, `2d 5h`)
- **Initial log** now includes oldest/newest entry age
- **Backoff deferral** gets its own distinct message showing backoff vs remaining budget
- **Per-entry logs** include age and retry count
- **Summary** includes deferred count when > 0
## Before
```
Found 6 pending delivery entries — starting recovery
Recovery time budget exceeded — 6 entries deferred to next restart
Delivery recovery complete: 0 recovered, 0 failed, 0 skipped (max retries)
```
## After
```
Found 6 pending delivery entries — starting recovery (oldest: 3h 12m, newest: 45s)
Recovery deferred — backoff 25000ms exceeds remaining budget 58000ms; 6 entries deferred (entry abc-123, age 3h 12m, retry 2)
Delivery recovery complete: 0 recovered, 0 failed, 0 skipped (max retries), 6 deferred
```
## Notes
- No behavior change — only log messages improved
- `formatAge` is a pure function, easy to unit test
- Backward compatible — no API/interface changes
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR contains two distinct sets of changes:
1. **`delivery-queue.ts`**: Logging-only improvements — adds a `formatAge()` helper for human-readable durations, enriches recovery log messages with entry age and retry info, differentiates backoff-deferral from genuine timeout, and includes deferred count in the summary. No behavior or API changes.
2. **`background.js`** (chrome extension): A substantial behavioral overhaul that goes well beyond the PR description ("only log messages improved"). Changes include:
- Replaces single-tab click-to-toggle with an **auto-attach-all-tabs** mode
- Adds **WebSocket auto-reconnection** with exponential backoff (up to 10 attempts)
- Adds **MV3 state persistence** via `chrome.storage.local` for service worker restarts
- Adds **keepalive alarms** to prevent service worker suspension
- Adds **request timeouts** (30s) to prevent memory leaks in the `pending` map
- Adds **child session detach events** for proper CDP session cleanup
- Adds **navigation re-attach** when `target_closed` is the detach reason
- Adds **operation locks** (`tabOperationLocks`) to prevent double-attach races
- The `manifest.json` is **missing the `"alarms"` permission** required by the new `chrome.alarms.create` call, which will crash the service worker on load.
- The fire-and-forget `detachTab` in `onDebuggerDetach` has a potential race with the 500ms re-attach timer.
<h3>Confidence Score: 2/5</h3>
- The delivery-queue.ts changes are safe, but the chrome extension changes have a blocking manifest bug and a race condition that need fixing before merge.
- Score of 2 reflects: (1) a confirmed runtime-breaking bug — missing `alarms` permission in manifest.json will crash the MV3 service worker on load, and (2) a race condition in `onDebuggerDetach` where the re-attach timer may fire before the async detach completes, silently losing the tab. The delivery-queue.ts changes are clean and correct.
- Pay close attention to `assets/chrome-extension/manifest.json` (missing alarms permission) and `assets/chrome-extension/background.js` (onDebuggerDetach race condition, overall significant behavior change vs. PR description).
<sub>Last reviewed commit: 2a7ecfa</sub>
<!-- greptile_other_comments_section -->
<sub>(3/5) Reply to the agent's comments like "Can you suggest a fix for this @greptileai?" or ask follow-up questions!</sub>
<!-- /greptile_comment -->
Most Similar PRs
#15817: fix(chrome-relay): auto-reconnect, MV3 persistence, and keepalive
by derrickburns · 2026-02-13
79.7%
#20329: Fix cron.run WS blocking and harden delivery recovery
by guirguispierre · 2026-02-18
78.8%
#17588: fix(relay): survive WS disconnects and MV3 worker restarts
by Unayung · 2026-02-15
78.6%
#19353: fix(diagnostics-otel): fix cross-chunk module isolation breaking even…
by nez · 2026-02-17
76.2%
#19766: fix: Chrome relay extension auto-reattach after SPA navigation
by nishantkabra77 · 2026-02-18
75.8%
#16733: fix(ui): avoid injected newlines when tool output is hidden
by jp117 · 2026-02-15
75.7%
#11874: fix: handle fetch rejections in provider usage withTimeout
by Zjianru · 2026-02-08
75.6%
#23672: fix(resilience): guard JSON.parse of external process output with t...
by kevinWangSheng · 2026-02-22
75.6%
#22993: fix(delivery): guard JSON.parse in failDelivery to prevent silent i...
by adhitShet · 2026-02-21
75.5%
#14719: UI: fix debug event log layout and health history toggle
by detecti1 · 2026-02-12
75.3%