#15628: fix: resolve session write lock race condition
agents
stale
size: S
Cluster: Session Lock Improvements
Fixes #15623
## Problem
In `src/agents/session-write-lock.ts`, the final `release()` path deleted the in-process `HELD_LOCKS` entry **before** async cleanup (`handle.close()` + `fs.rm(lockPath)`). During that async gap, concurrent acquires in the same process could see no `HELD_LOCKS` entry but still observe the lock file on disk and spin on the filesystem retry loop until the 10s timeout.
## Solution
- Add `releasing?: Promise<void>` to the held lock entry.
- On final release, set `held.releasing` **before** any `await` so new acquires can detect teardown-in-progress.
- If an acquire observes `held.releasing`, it waits for the promise (bounded by the overall acquire timeout) rather than spinning on the filesystem lock file.
- Catch errors from `handle.close()` so lock file removal still runs even if the close fails.
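
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only `HELD_LOCKS`, `releasing`, and `releaseHeldLock()` are names taken from the description, while the entry shape (`count`, `handle`, `lockPath`), the `acquire()` signature, the 50 ms retry delay, and the `waitBounded()` helper are assumptions made for the example.

```typescript
import * as fs from "node:fs/promises";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical entry shape; only `releasing` is named in the PR description.
type HeldLock = {
  count: number; // re-entrant hold count (assumed)
  handle: fs.FileHandle;
  lockPath: string;
  releasing?: Promise<void>; // set while the final teardown is in flight
};

const HELD_LOCKS = new Map<string, HeldLock>();

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Wait for a teardown promise, but never longer than `ms`.
// A failed teardown resolves too, so the caller simply retries.
function waitBounded(p: Promise<void>, ms: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error("timed out waiting for lock release")),
      Math.max(0, ms),
    );
    const done = () => {
      clearTimeout(timer);
      resolve();
    };
    p.then(done, done);
  });
}

async function releaseHeldLock(key: string, held: HeldLock): Promise<void> {
  held.count -= 1;
  if (held.count > 0) return; // nested holders remain
  if (held.releasing) return held.releasing; // teardown already started
  // Assigned synchronously, before control returns to the event loop,
  // so a concurrent acquire can see that teardown is in progress.
  held.releasing = (async () => {
    try {
      await held.handle.close().catch(() => undefined); // best-effort close
      await fs.rm(held.lockPath, { force: true });
    } finally {
      // The map entry is dropped only after the lock file is gone.
      if (HELD_LOCKS.get(key) === held) HELD_LOCKS.delete(key);
    }
  })();
  return held.releasing;
}

async function acquire(lockPath: string, timeoutMs = 10_000): Promise<() => Promise<void>> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const held = HELD_LOCKS.get(lockPath);
    if (held?.releasing) {
      // In-process teardown: wait on the promise instead of polling the file.
      await waitBounded(held.releasing, deadline - Date.now());
      continue;
    }
    if (held) {
      held.count += 1; // nested acquire in the same process
      return () => releaseHeldLock(lockPath, held);
    }
    try {
      const handle = await fs.open(lockPath, "wx"); // fails with EEXIST if present
      const entry: HeldLock = { count: 1, handle, lockPath };
      HELD_LOCKS.set(lockPath, entry);
      return () => releaseHeldLock(lockPath, entry);
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
      if (Date.now() >= deadline) throw new Error("timed out acquiring lock");
      await sleep(50); // filesystem retry loop for locks held elsewhere
    }
  }
}
```

The key design point in this sketch is ordering: the map entry stays visible, marked with `releasing`, for the whole teardown, and is deleted only after `fs.rm()` has run. An acquire that arrives mid-teardown therefore never sees "no entry, but lock file still on disk".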
## Behavioral change
Acquires that race with an in-process release now wait for the release to complete instead of polling the filesystem lock file, preventing intermittent 10s timeouts under high concurrency.
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR updates `src/agents/session-write-lock.ts` to prevent an in-process acquire/release race: the held-lock entry now tracks a `releasing` promise so that new acquires can wait for teardown-in-progress instead of falling back to filesystem polling on the `.lock` file.
The implementation centralizes release logic in `releaseHeldLock()`, sets `held.releasing` before awaiting `handle.close()` / `fs.rm()`, and makes teardown best-effort by swallowing `close()` errors so lock-file removal still runs. This aligns with how the lock is used by embedded runner flows that acquire once and release in a `finally` block.
I didn’t find any must-fix issues introduced by this change.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk.
- Change is localized to session write-lock acquisition/release, addresses a clear race window, preserves existing semantics for nested acquires, and callers in the codebase release exactly once in finally blocks. No new behavior outside the lock coordination path was found.
- src/agents/session-write-lock.ts
<sub>Last reviewed commit: aa99290</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #10283: fix(agents): close TOCTOU race in session write lock acquisition (by programming-pupil · 2026-02-06 · 84.6%)
- #21828: fix: acquire session write lock in delivery mirror and gateway chat... (by inkolin · 2026-02-20 · 82.4%)
- #4044: fix: release session locks on SIGUSR1 restart + instance nonce for ... (by seanb4t · 2026-01-29 · 82.3%)
- #15882: fix: move session entry computation inside store lock to prevent ra... (by cloorus · 2026-02-14 · 80.3%)
- #20431: fix(sessions): add session contamination guards and self-leak lock ... (by marcomarandiz · 2026-02-18 · 78.0%)
- #5014: fix(agents): detect PID reuse in session write lock (by shayan919293 · 2026-01-30 · 77.9%)
- #4664: fix: per-session metadata files to eliminate lock contention (by tsukhani · 2026-01-30 · 77.4%)
- #10725: fix: re-read session store inside lock in updateSessionStoreEntry (by Yida-Dev · 2026-02-06 · 77.0%)
- #16609: fix: resolve session store race condition and contextTokens updates (by battman21 · 2026-02-14 · 76.7%)
- #20770: fix: prevent stale session-entry overwrite during reset-model persi... (by coygeek · 2026-02-19 · 75.6%)