#21828: fix: acquire session write lock in delivery mirror and gateway chat paths
app: web-ui
gateway
size: S
Cluster:
Session Lock Improvements
## Summary
Fixes #21813 — Session permanently broken by concurrent agent runs writing consecutive assistant messages.
**Root cause:** Two code paths write to session JSONL files without acquiring `acquireSessionWriteLock()`:
1. **`src/config/sessions/transcript.ts`** — `appendAssistantMessageToSessionTranscript()` calls `SessionManager.open().appendMessage()` directly, without the file-level lock used by the main agent run path.
2. **`src/gateway/server-methods/chat.ts`** — Same pattern: direct `SessionManager.open().appendMessage()` without lock acquisition.
Meanwhile, the main agent run in `src/agents/pi-embedded-runner/run/attempt.ts` correctly acquires `acquireSessionWriteLock()` before writing. This creates a race window where the delivery mirror write interleaves with the next agent run, producing consecutive `assistant` role entries that permanently corrupt the session.
## Fix
Wrap both write paths with `acquireSessionWriteLock()` / release, matching the pattern in `attempt.ts`. This ensures serialized access to session transcript files across all write paths.
### Changes
**`src/config/sessions/transcript.ts`:**
- Import `acquireSessionWriteLock` from `../../agents/session-write-lock.js`
- Wrap `SessionManager.open().appendMessage()` in `appendAssistantMessageToSessionTranscript()` with lock acquire/try/finally release (`maxHoldMs: 10_000`)
**`src/gateway/server-methods/chat.ts`:**
- Import `acquireSessionWriteLock` from `../../agents/session-write-lock.js`
- Make `appendAssistantTranscriptMessage()` async, wrap `SessionManager.open().appendMessage()` with lock acquire/try/finally release (`maxHoldMs: 10_000`)
- Make `persistAbortedPartials()` async, `await` the now-async append call
- Add `void` to fire-and-forget `persistAbortedPartials()` calls (abort handlers — best-effort persist, return value unused)
- Add `await` at remaining call sites (`.then()` callback in `chat.send`, `chat.inject` handler)
## Test plan
- [ ] Verify lock acquisition prevents concurrent writes to the same session JSONL
- [ ] Confirm delivery mirror path correctly serializes with agent run writes
- [ ] Verify gateway chat.ts path acquires lock before appending
- [ ] No regression: single-writer paths still work without deadlock
- [ ] Lock is reentrant (acquireSessionWriteLock supports `allowReentrant: true` by default), so nested calls from the same process do not deadlock
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR fixes a critical race condition that permanently corrupts session JSONL files when concurrent agent runs write consecutive assistant messages. The fix wraps two previously unprotected write paths (`src/config/sessions/transcript.ts` and `src/gateway/server-methods/chat.ts`) with `acquireSessionWriteLock()`, matching the pattern already used in the main agent run path (`src/agents/pi-embedded-runner/run/attempt.ts`).
**Key changes:**
- `src/config/sessions/transcript.ts`: Wrapped `SessionManager.open().appendMessage()` in `appendAssistantMessageToSessionTranscript()` with lock acquire/try/finally release
- `src/gateway/server-methods/chat.ts`: Made `appendAssistantTranscriptMessage()` and `persistAbortedPartials()` async, added lock acquisition, and properly awaited calls (using `void` for fire-and-forget abort handlers, `await` for all other call sites)
The lock is reentrant by default (`allowReentrant: true`), preventing deadlocks from nested calls within the same process. The `maxHoldMs: 10_000` setting (10 seconds) is consistent with typical session write operations and includes watchdog protection to prevent indefinite lock holds.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk - it addresses a critical data corruption bug using established patterns
- The fix uses the exact same lock acquisition pattern (`acquireSessionWriteLock` with `maxHoldMs: 10_000`) that's already proven in production across multiple code paths (`attempt.ts`, `compact.ts`). All async/await changes are correct, the lock is properly released in finally blocks, and the reentrant lock design prevents deadlocks. The changes are minimal, focused, and directly address the root cause described in the linked issue.
- No files require special attention
<sub>Last reviewed commit: 62c2530</sub>
<!-- greptile_other_comments_section -->
<sub>(2/5) Greptile learns from your feedback when you react with thumbs up/down!</sub>
<!-- /greptile_comment -->
Most Similar PRs
#20431: fix(sessions): add session contamination guards and self-leak lock ...
by marcomarandiz · 2026-02-18
84.0%
#15882: fix: move session entry computation inside store lock to prevent ra...
by cloorus · 2026-02-14
83.9%
#15628: fix: resolve session write lock race condition
by 1kuna · 2026-02-13
82.4%
#17743: fix(agents): disable orphaned user message deletion that causes ses...
by clawrl3000 · 2026-02-16
81.0%
#23583: fix(agents): catch session JSONL write failures instead of crashing
by mohandshamada · 2026-02-22
80.8%
#4664: fix: per-session metadata files to eliminate lock contention
by tsukhani · 2026-01-30
80.3%
#16949: fix(gateway): deliver chat:final even when sessionKey is unresolved (…
by ekleziast · 2026-02-15
80.0%
#15050: fix: transcript corruption resilience — strip aborted tool_use bloc...
by yashchitneni · 2026-02-12
79.3%
#13104: fix: persist user command message in chat transcript
by mcaxtr · 2026-02-10
79.1%
#19328: Fix: preserve modelOverride in agent handler (#5369)
by CodeReclaimers · 2026-02-17
78.9%