
#22385: fix: improve delivery recovery logging with entry age and deferral reasons

by derrickburns open 2026-02-21 03:31
size: L
## Problem

Delivery recovery logs are ambiguous. `"Recovery time budget exceeded"` fires both when a retry backoff exceeds the remaining budget and on a genuine timeout, and neither case reports entry age. That makes it impossible to tell whether entries were stale or just had high retry backoffs.

Fixes #22384

## Changes

- **`formatAge()`** helper: human-readable durations (`45s`, `3h 12m`, `2d 5h`)
- **Initial log** now includes the oldest and newest entry age
- **Backoff deferral** gets its own distinct message showing the backoff vs. the remaining budget
- **Per-entry logs** include age and retry count
- **Summary** includes the deferred count when it is greater than 0

## Before

```
Found 6 pending delivery entries — starting recovery
Recovery time budget exceeded — 6 entries deferred to next restart
Delivery recovery complete: 0 recovered, 0 failed, 0 skipped (max retries)
```

## After

```
Found 6 pending delivery entries — starting recovery (oldest: 3h 12m, newest: 45s)
Recovery deferred — backoff 58000ms exceeds remaining budget 25000ms; 6 entries deferred (entry abc-123, age 3h 12m, retry 2)
Delivery recovery complete: 0 recovered, 0 failed, 0 skipped (max retries), 6 deferred
```

## Notes

- No behavior change: only log messages are improved
- `formatAge` is a pure function, so it is easy to unit test
- Backward compatible: no API or interface changes

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

This PR contains two distinct sets of changes:

1. **`delivery-queue.ts`**: Logging-only improvements. Adds a `formatAge()` helper for human-readable durations, enriches recovery log messages with entry age and retry info, distinguishes backoff deferral from a genuine timeout, and includes the deferred count in the summary. No behavior or API changes.
2. **`background.js`** (Chrome extension): A substantial behavioral overhaul that goes well beyond the PR description ("only log messages improved").
   Changes include:
   - Replaces single-tab click-to-toggle with an **auto-attach-all-tabs** mode
   - Adds **WebSocket auto-reconnection** with exponential backoff (up to 10 attempts)
   - Adds **MV3 state persistence** via `chrome.storage.local` so state survives service worker restarts
   - Adds **keepalive alarms** to prevent service worker suspension
   - Adds **request timeouts** (30s) to prevent memory leaks in the `pending` map
   - Adds **child session detach events** for proper CDP session cleanup
   - Adds **navigation re-attach** when `target_closed` is the detach reason
   - Adds **operation locks** (`tabOperationLocks`) to prevent double-attach races

Issues found:

- The `manifest.json` is **missing the `"alarms"` permission** required by the new `chrome.alarms.create` call, which will crash the service worker on load.
- The fire-and-forget `detachTab` in `onDebuggerDetach` can race with the 500ms re-attach timer.

<h3>Confidence Score: 2/5</h3>

- The `delivery-queue.ts` changes are safe, but the Chrome extension changes have a blocking manifest bug and a race condition that need fixing before merge.
- The score of 2 reflects: (1) a confirmed runtime-breaking bug: the missing `alarms` permission in `manifest.json` will crash the MV3 service worker on load; and (2) a race condition in `onDebuggerDetach`, where the re-attach timer may fire before the async detach completes, silently losing the tab. The `delivery-queue.ts` changes are clean and correct.
- Pay close attention to `assets/chrome-extension/manifest.json` (missing `alarms` permission) and `assets/chrome-extension/background.js` (the `onDebuggerDetach` race condition, and an overall behavior change far larger than the PR description states).

<sub>Last reviewed commit: 2a7ecfa</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
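The Changes section describes `formatAge()` only through its sample outputs (`45s`, `3h 12m`, `2d 5h`). The sketch below reproduces those samples under two assumptions that are not taken from the diff: the input is an age in milliseconds, and at most the two most significant units are shown. It is an illustration, not the PR's actual implementation.

```typescript
// Hypothetical sketch of formatAge(); the real helper in delivery-queue.ts may differ.
// Assumption: `ms` is an entry age in milliseconds.
function formatAge(ms: number): string {
  // Clamp negative ages (e.g. from clock skew) to zero and truncate to whole seconds.
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const days = Math.floor(totalSeconds / 86_400);
  const hours = Math.floor((totalSeconds % 86_400) / 3_600);
  const minutes = Math.floor((totalSeconds % 3_600) / 60);
  const seconds = totalSeconds % 60;

  // Show at most the two most significant non-zero units, as in "2d 5h" and "3h 12m".
  if (days > 0) return hours > 0 ? `${days}d ${hours}h` : `${days}d`;
  if (hours > 0) return minutes > 0 ? `${hours}h ${minutes}m` : `${hours}h`;
  if (minutes > 0) return seconds > 0 ? `${minutes}m ${seconds}s` : `${minutes}m`;
  return `${seconds}s`;
}
```

Because the function is pure, the sample outputs from the PR double as unit-test cases: for example, `formatAge((3 * 3600 + 12 * 60) * 1000)` yields `"3h 12m"`.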

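The PR separates two ways recovery can stop early (a retry backoff longer than the remaining budget vs. a budget that has already run out) and appends the deferred count to the summary only when it is non-zero. A minimal sketch of that decision and message formatting follows. The message strings are copied from the Before/After samples; the type and function names, the elapsed/budget inputs, and the assumption that a genuine timeout keeps the old message are hypothetical.

```typescript
// Hypothetical names and inputs; not taken from delivery-queue.ts.
type DeferralReason =
  | { kind: "backoff"; backoffMs: number; remainingMs: number }
  | { kind: "timeout" };

// Decide why the next entry cannot be attempted within the time budget.
// Returns null when the entry can be attempted now.
function deferralReason(elapsedMs: number, budgetMs: number, backoffMs: number): DeferralReason | null {
  const remainingMs = budgetMs - elapsedMs;
  if (remainingMs <= 0) return { kind: "timeout" }; // budget genuinely exhausted
  if (backoffMs > remainingMs) return { kind: "backoff", backoffMs, remainingMs };
  return null;
}

function deferralMessage(reason: DeferralReason, deferred: number): string {
  switch (reason.kind) {
    case "backoff":
      return `Recovery deferred — backoff ${reason.backoffMs}ms exceeds remaining budget ${reason.remainingMs}ms; ${deferred} entries deferred`;
    case "timeout":
      // Assumption: the genuine-timeout path keeps the pre-PR message.
      return `Recovery time budget exceeded — ${deferred} entries deferred to next restart`;
  }
}

interface RecoverySummary {
  recovered: number;
  failed: number;
  skipped: number;
  deferred: number;
}

// Append the deferred count only when it is greater than 0, as the PR describes.
function summaryMessage(s: RecoverySummary): string {
  const base = `Delivery recovery complete: ${s.recovered} recovered, ${s.failed} failed, ${s.skipped} skipped (max retries)`;
  return s.deferred > 0 ? `${base}, ${s.deferred} deferred` : base;
}
```

Returning a discriminated union instead of a boolean keeps the backoff and remaining-budget values available to the log line, which is what makes the two cases distinguishable.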
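The `onDebuggerDetach` race flagged in the review comes down to ordering: the 500ms re-attach timer starts before the fire-and-forget `detachTab` has finished. One possible fix, sketched below, awaits the detach before starting the timer and routes both operations through a per-tab promise chain. The signatures of `detachTab`/`attachTab`, the `withTabLock` helper, and the shape of `tabOperationLocks` are all assumptions for illustration (the real `chrome.debugger.onDetach` listener receives a `source` object rather than a bare tab ID); this is not the code in `background.js`.

```typescript
// Hypothetical sketch; signatures are assumed, not taken from background.js.
type TabOp = (tabId: number) => Promise<void>;

// Per-tab chain of pending operations (assumed shape for tabOperationLocks).
const tabOperationLocks = new Map<number, Promise<void>>();

// Run `op` only after any earlier operation on the same tab has settled.
function withTabLock(tabId: number, op: () => Promise<void>): Promise<void> {
  const prev = tabOperationLocks.get(tabId) ?? Promise.resolve();
  const next = prev.catch(() => {}).then(op);
  tabOperationLocks.set(tabId, next);
  // Drop the entry once settled, unless a later operation has replaced it.
  next
    .finally(() => {
      if (tabOperationLocks.get(tabId) === next) tabOperationLocks.delete(tabId);
    })
    .catch(() => {});
  return next;
}

async function onDebuggerDetach(
  tabId: number,
  reason: string,
  detachTab: TabOp,
  attachTab: TabOp,
  reattachDelayMs = 500,
): Promise<void> {
  // Await cleanup instead of firing and forgetting it.
  await withTabLock(tabId, () => detachTab(tabId));
  if (reason !== "target_closed") return;
  // The timer starts only after detach has completed, so it cannot overtake it.
  await new Promise<void>((resolve) => setTimeout(resolve, reattachDelayMs));
  await withTabLock(tabId, () => attachTab(tabId));
}
```

Either change alone (awaiting the detach, or serializing through the lock) closes this particular window; the lock additionally guards against other concurrent attach or detach calls on the same tab.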