#15996: fix(agents): messages arrive out of order — tool output beats narration (#15968)
agents
stale
size: M
Cluster:
Tool Result Handling Improvements
## Summary
Block reply flush (`onBlockReplyFlush`) wasn't awaited before tool execution, so narration text could arrive after the tool's message send — inverted ordering on channels like BlueBubbles/iMessage.
Closes #15968
lobster-biscuit
## Root Cause
Two fire-and-forget calls in the event pipeline:
1. `handleToolExecutionStart` in `pi-embedded-subscribe.handlers.tools.ts` called `void ctx.params.onBlockReplyFlush()` — the flush kicked off but nobody waited for it.
2. `createEmbeddedPiSessionEventHandler` in `pi-embedded-subscribe.handlers.ts` called `handleToolExecutionStart(...).catch(...)` without `await`, so the tool execution proceeded in parallel with the flush.
The result: the tool's HTTP call (e.g. message send) often completed before the flush's HTTP call (narration delivery), flipping the message order.
## Changes
- Before: `onBlockReplyFlush` is fire-and-forget; tool execution races the flush
- After: events are serialized through a promise chain when a flush is in-flight, so the flush completes before subsequent events are processed
The promise chain only activates when `onBlockReplyFlush` is present and a `tool_execution_start` fires. All other events take the direct (synchronous) path — zero overhead for the common case.
## Reviewer Notes
- The duplicated `switch` block in `handlers.ts` (direct path vs chain path) is intentional — it avoids wrapping every event in a microtask when no flush is needed. The direct path preserves the original synchronous behavior exactly.
- `pendingCount` tracks in-flight chain items. When it drops to 0, subsequent events skip the chain entirely. This means the chain overhead is scoped to the flush window only.
- The `try/catch` around `await ctx.params.onBlockReplyFlush()` in `handleToolExecutionStart` is best-effort — a failed flush shouldn't block tool execution. The pipeline already has its own timeout (`BLOCK_REPLY_SEND_TIMEOUT_MS`).
- Existing tests that asserted synchronous flush behavior now `await flushMicrotasks()` after `tool_execution_start` events, since the handler processes them on the promise chain.
- I left `handleToolExecutionEnd` as fire-and-forget in the direct path (no chain needed there) — after-tool hooks don't need to block subsequent events.
## Tests
- `pi-embedded-subscribe.block-flush-ordering.integration.test.ts` — 3 new tests: async flush blocks subsequent events, flush completes before tool proceeds (HTTP race), rejected flush doesn't break the chain. All fail before fix, pass after.
- Updated `calls-onblockreplyflush-before-tool-execution-start-preserve.e2e.test.ts` — added `await flushMicrotasks()` for async chain behavior.
- All 67 pi-embedded-subscribe e2e tests pass (only pre-existing `https-proxy-agent` dep failure in unrelated `tools.e2e.test.ts`).
- `pnpm build && pnpm check` pass (pre-existing `https-proxy-agent` issue in `discord/monitor/gateway-plugin.ts` only).
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR fixes a race condition where narration text could arrive after tool execution messages on channels like iMessage/BlueBubbles due to `onBlockReplyFlush` not being awaited. The fix introduces a promise chain that serializes events when a flush is in-flight, ensuring proper message ordering.
**Key Changes:**
- `handlers.tools.ts`: Changed `void ctx.params.onBlockReplyFlush()` to `await ctx.params.onBlockReplyFlush()` with try/catch for best-effort error handling
- `handlers.ts`: Added promise chain mechanism that activates only when `onBlockReplyFlush` is present and `tool_execution_start` fires, preserving zero overhead for the common case
- Added comprehensive integration tests covering async flush, HTTP race conditions, and rejected flush scenarios
**Implementation:**
The solution uses a conditional dual-path approach: events take the direct (synchronous) path when no flush is needed, or go through a promise chain when `pendingCount > 0` or when a `tool_execution_start` event occurs with `onBlockReplyFlush` defined. The `pendingCount` tracker ensures the chain only operates during the flush window.
<h3>Confidence Score: 4/5</h3>
- Safe to merge with minimal risk - the fix correctly addresses the race condition while preserving existing behavior through a well-tested dual-path approach
- Score reflects solid implementation with comprehensive test coverage (3 new integration tests plus updated existing tests), clear root cause analysis, and thoughtful design that minimizes overhead. The duplicated switch block adds maintenance burden but is intentional and documented. Error handling is appropriate (best-effort flush shouldn't block execution). All 1311 tests pass.
- No files require special attention - the implementation is straightforward and well-documented
<sub>Last reviewed commit: ebc7178</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#15920: fix: await coalescer flush before tool execution
by Bridgerz · 2026-02-14
92.1%
#22886: fix: await onBlockReplyFlush before tool execution to preserve mess...
by botverse · 2026-02-21
87.6%
#9171: Fix: Route tool result deliveries through BlockReplyPipeline for pr...
by vishaltandale00 · 2026-02-04
81.1%
#7760: fix(agents): resolve message ordering conflict during tool execution
by aryan877 · 2026-02-03
80.3%
#19632: fix: suppressToolErrors now suppresses exec tool failure notifications
by Gitjay11 · 2026-02-18
80.3%
#18187: fix: tool summaries silently dropped when reasoningLevel is stream
by ayanesakura · 2026-02-16
80.1%
#17552: fix(agents): suppress tool error warnings when assistant already re...
by AytuncYildizli · 2026-02-15
79.5%
#19518: fix: await block reply flush before tool execution
by katalabut · 2026-02-17
79.4%
#19235: fix(telegram): tool error warnings no longer overwrite streamed rep...
by gatewaybuddy · 2026-02-17
79.3%
#14328: fix: strip incomplete tool_use blocks from errored/aborted messages...
by Kropiunig · 2026-02-12
78.4%