#7760: fix(agents): resolve message ordering conflict during tool execution
agents
stale
Cluster: Error Handling in Agent Tools
## Problem
When a user sends a new message while the agent is actively executing tool calls, a "Message ordering conflict" error is triggered. The bot appears to work, but the user's message is lost, so they must resend it after the agent finishes.
Error message:
```
Message ordering conflict - please try again
```
## Root Cause
`clearActiveEmbeddedRun()` signals "run ended" before `flushPendingToolResults()` writes tool results to the session file, so waiters resume too early:
```
clearActiveEmbeddedRun() → waiters proceed → read incomplete session → role ordering error
```
The incoming message therefore sees an incomplete transcript (missing tool results). This produces consecutive user messages, which violates role alternation rules.
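To make the role alternation rule concrete, here is a minimal sketch of a check that flags consecutive user messages. The `TranscriptMessage` shape and the `firstConsecutiveUserIndex` helper are hypothetical and exist only for illustration; the real validation and session format are not shown in this PR.

```typescript
// Hypothetical, simplified transcript model (an assumption, not the real session format).
type Role = "user" | "assistant" | "toolResult";

interface TranscriptMessage {
  role: Role;
  text: string;
}

// Returns the index of the first "user" message that directly follows
// another "user" message, or -1 if the transcript has no such pair.
function firstConsecutiveUserIndex(transcript: TranscriptMessage[]): number {
  for (let i = 1; i < transcript.length; i++) {
    if (transcript[i].role === "user" && transcript[i - 1].role === "user") {
      return i;
    }
  }
  return -1;
}

const complete: TranscriptMessage[] = [
  { role: "user", text: "run the tool" },
  { role: "assistant", text: "calling tool" },
  { role: "toolResult", text: "tool output" },
  { role: "user", text: "follow-up" },
];

const conflicting: TranscriptMessage[] = [
  { role: "user", text: "first" },
  { role: "user", text: "second" },
];

firstConsecutiveUserIndex(complete); // -1
firstConsecutiveUserIndex(conflicting); // 1
```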
## Fix
Move `clearActiveEmbeddedRun()` to after `flushPendingToolResults()` in the outer finally block. This ensures tool results are persisted before any waiting messages proceed.
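The effect of the reordering can be sketched with a small synchronous simulation. Everything here is a hypothetical stand-in: `FakeSession`, `onRunEnded`, `queueToolResult`, and `runAttempt` are invented for illustration, and the real `flushPendingToolResults()` and `clearActiveEmbeddedRun()` may be asynchronous and have different signatures. The sketch only shows what a waiter observes under each ordering.

```typescript
// Hypothetical stand-ins; not the code in attempt.ts.
type Entry = { role: "user" | "assistant" | "toolResult"; text: string };

class FakeSession {
  readonly persisted: Entry[] = [];
  private pending: Entry[] = [];
  private waiters: Array<() => void> = [];

  // Buffer a tool result that has not yet been written to the session.
  queueToolResult(text: string): void {
    this.pending.push({ role: "toolResult", text });
  }

  // Stand-in for flushPendingToolResults(): persist buffered tool results.
  flushPendingToolResults(): void {
    this.persisted.push(...this.pending);
    this.pending = [];
  }

  // Register a callback that runs once the run is marked as ended.
  onRunEnded(waiter: () => void): void {
    this.waiters.push(waiter);
  }

  // Stand-in for clearActiveEmbeddedRun(): signal "run ended" to all waiters.
  clearActiveEmbeddedRun(): void {
    const waiters = this.waiters;
    this.waiters = [];
    for (const waiter of waiters) waiter();
  }
}

// Runs one simulated attempt and returns the roles the waiter saw in the transcript.
function runAttempt(flushFirst: boolean): Entry["role"][] {
  const session = new FakeSession();
  let seenByWaiter: Entry["role"][] = [];

  session.persisted.push({ role: "user", text: "run the tool" });
  session.persisted.push({ role: "assistant", text: "calling tool" });

  // A new user message arrives mid-run and waits for the run to end.
  session.onRunEnded(() => {
    seenByWaiter = session.persisted.map((entry) => entry.role);
  });

  try {
    session.queueToolResult("tool output");
  } finally {
    if (flushFirst) {
      // Ordering after the fix: persist, then release waiters.
      session.flushPendingToolResults();
      session.clearActiveEmbeddedRun();
    } else {
      // Ordering before the fix: waiters read before the flush.
      session.clearActiveEmbeddedRun();
      session.flushPendingToolResults();
    }
  }
  return seenByWaiter;
}
```

With `flushFirst` set to `false`, the waiter sees only the user and assistant entries; with `true`, it also sees the tool result.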
## Impact
**Before:** User messages sent during tool execution trigger ordering conflicts, and the message is lost.
**After:** Tool results are flushed first, then waiters proceed. No conflict.
Fixes #7694
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adjusts the embedded Pi runner teardown sequence in `src/agents/pi-embedded-runner/run/attempt.ts` to prevent “message ordering conflict” errors when a new user message arrives while tool calls are still being flushed. The main change is moving `clearActiveEmbeddedRun()` out of the inner cleanup and into the outer `finally`, after `sessionManager.flushPendingToolResults()`, so that any waiter that proceeds sees a complete transcript (including synthetic tool results) before starting a follow-up run.
The change fits into the existing concurrency model where (1) the session write lock protects session file writes and (2) the active-run registry (`ACTIVE_EMBEDDED_RUNS`) gates whether messages can be queued / whether callers wait for a run to finish.
<h3>Confidence Score: 4/5</h3>
- This PR is likely safe to merge and addresses a real race, with low residual risk.
- The change is localized and aligns the active-run signaling with persistence of tool results, which should prevent incomplete transcripts from being read by concurrent follow-up messages. Main risk is around ensuring all execution paths that register an active run also clear it; the new `queueHandle` guard makes that relationship a bit more implicit and worth double-checking.
- src/agents/pi-embedded-runner/run/attempt.ts
<!-- greptile_other_comments_section -->
**Context used:**
- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
- Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13))
<!-- /greptile_comment -->
Most Similar PRs
- #9171: Fix: Route tool result deliveries through BlockReplyPipeline for pr... · by vishaltandale00 · 2026-02-04 · 81.2%
- #9861: fix(agents): re-run tool_use/tool_result repair after limitHistoryT... · by CyberSinister · 2026-02-05 · 81.0%
- #15996: fix(agents): messages arrive out of order — tool output beats narra... · by yinghaosang · 2026-02-14 · 80.3%
- #13282: fix(agents): instruct agent not to retry lost tool results · by thebtf · 2026-02-10 · 80.1%
- #2541: fix(agents): add error handling to orphaned message cleanup · by Episkey-G · 2026-01-27 · 80.1%
- #4009: fix(agent): sanitize messages after orphan user repair · by drag88 · 2026-01-29 · 79.8%
- #4922: fix(agents): ensure parallel tool results have correct parentId · by jduartedj · 2026-01-30 · 79.4%
- #17743: fix(agents): disable orphaned user message deletion that causes ses... · by clawrl3000 · 2026-02-16 · 79.3%
- #21195: fix: suppress orphaned tool_use/tool_result errors after session co... · by ruslansychov-git · 2026-02-19 · 78.6%
- #5593: fix(pi-embedded-runner): force-clear stuck embedded runs after timeout · by grassX1998 · 2026-01-31 · 78.5%