
#10259: fix(sessions): clean up orphaned .jsonl.lock files on startup (#10170)

by nu-gui open 2026-02-06 08:24
gateway agents stale
## Summary

- Stale `.lock` files from crashed gateway processes cause "request ended without getting any chunks" errors and permanently stuck sessions
- Add `cleanupOrphanedLocks()` to scan lock files, check if the owning PID is still alive, and remove orphaned locks
- Call cleanup on gateway startup in `startGatewaySidecars()` before other services initialize

Fixes #10170

## Test plan

- [x] 5 new tests for `cleanupOrphanedLocks()`: dead PID removal, corrupted payload, alive PID preservation, non-lock file filtering, non-existent directory
- [x] All 13 tests pass (8 original + 5 new)
- [x] `pnpm check` passes (0 warnings, 0 errors)

## Greptile Overview

### Greptile Summary

- Adds `cleanupOrphanedLocks()` to scan a sessions directory for `*.lock` files, validate their JSON payload, and remove locks deemed orphaned.
- Hooks the cleanup into gateway startup (`startGatewaySidecars`) before other sidecar services initialize to avoid stuck sessions after crashes.
- Extends `session-write-lock` tests with new cases covering dead/alive PIDs, corrupted payloads, non-lock files, and missing directories.

### Confidence Score: 3/5

- This PR is mostly safe but has a real risk of deleting valid locks in some environments.
- The new startup cleanup improves resilience to crashes, but the liveness check uses `process.kill(pid, 0)` and treats any error as “dead”; in environments where the gateway user lacks permission to signal another user’s process, this can misclassify live processes and remove their locks, potentially allowing concurrent writers.
- src/agents/session-write-lock.ts
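The permission concern raised in the review can be illustrated with a minimal sketch. This is not the code in `src/agents/session-write-lock.ts`; the names `isPidAlive`, `cleanupOrphanedLocksSketch`, and the `LockPayload` shape are hypothetical, and the sketch assumes only that each lock file holds a JSON object with a numeric `pid`. The idea is that `process.kill(pid, 0)` fails with `ESRCH` when no such process exists but with `EPERM` when the process exists and the caller may not signal it, so only `ESRCH` should count as dead. Treating a corrupted payload as orphaned is also an assumption here.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical payload shape: the PR only states that the lock payload is JSON
// and identifies the owning PID.
interface LockPayload {
  pid: number;
}

// Returns false only when the OS positively reports that the process is gone.
function isPidAlive(pid: number): boolean {
  // Guard against 0 and negative values, which address process groups.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // signal 0 checks existence and permission, sends nothing
    return true;
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code;
    // ESRCH: no such process, so the owner is dead.
    // EPERM (or anything else): the process may exist, so keep the lock.
    return code !== "ESRCH";
  }
}

// Scans `dir` for *.lock files and removes those whose owner is gone.
// Returns the paths that were removed. A missing directory is a no-op.
function cleanupOrphanedLocksSketch(dir: string): string[] {
  let names: string[];
  try {
    names = fs.readdirSync(dir);
  } catch {
    return [];
  }
  const removed: string[] = [];
  for (const name of names) {
    if (!name.endsWith(".lock")) continue; // leave session files untouched
    const file = path.join(dir, name);
    let pid: unknown;
    try {
      pid = (JSON.parse(fs.readFileSync(file, "utf8")) as Partial<LockPayload>).pid;
    } catch {
      pid = undefined; // unreadable or corrupted payload (assumed orphaned)
    }
    if (typeof pid === "number" && isPidAlive(pid)) continue;
    fs.rmSync(file, { force: true });
    removed.push(file);
  }
  return removed;
}
```

With this shape, a lock owned by a live process under another user would be preserved rather than deleted. The trade-off is that a lock whose owner cannot be probed at all stays in place until it is removed some other way.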
