#15402: fix(agent.wait): return deterministic final assistant text to avoid history races
gateway
agents
stale
size: M
Cluster:
Subagent Enhancements and Features
## Background
In the current flow, `sessions_send` and the subagent step logic still call `chat.history` after `agent.wait` returns `ok` to read the last assistant message. Under high concurrency or unlucky event ordering, that read can return an older message, which causes:
1. A completed run is misjudged as having produced no result (stale read)
2. The announce/reply step reads the previous round's content
3. From the caller's side, the run appears stuck or continuation looks unstable
## What changed
1. `agent.wait` gains an optional `finalAssistantText` return field
2. `waitForAgentJob` listens for and caches the latest `assistant` text for the same `runId`, and writes it into the snapshot when the lifecycle ends
3. The main `sessions_send` path prefers `agent.wait.finalAssistantText` and falls back to `chat.history` only when the field is missing
4. The A2A announce flow and `runAgentStep` likewise prefer `finalAssistantText`
5. Added unit tests covering the deterministic return value and the fallback logic
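The capture step in item 2 can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: the `AgentEvent` and `RunSnapshot` shapes, the `stream` and `phase` field names, and the `AssistantTextCache` class are all hypothetical names introduced for the example.

```typescript
// Hypothetical event and snapshot shapes; the real gateway types may differ.
type AgentEvent =
  | { runId: string; stream: "assistant"; data: { text?: string } }
  | { runId: string; stream: "lifecycle"; data: { phase: "start" | "end" | "error" } };

interface RunSnapshot {
  status: "ok" | "error";
  finalAssistantText?: string;
}

class AssistantTextCache {
  // Latest assistant text seen so far, keyed by runId.
  private latestText = new Map<string, string>();
  // Snapshots written once a run's lifecycle has ended.
  private snapshots = new Map<string, RunSnapshot>();

  handle(evt: AgentEvent): void {
    if (evt.stream === "assistant") {
      const text = evt.data.text;
      if (typeof text === "string" && text.length > 0) {
        this.latestText.set(evt.runId, text);
      }
      return;
    }
    // Lifecycle event: snapshot only when the run has finished.
    if (evt.data.phase === "end" || evt.data.phase === "error") {
      const snapshot: RunSnapshot = {
        status: evt.data.phase === "end" ? "ok" : "error",
      };
      const text = this.latestText.get(evt.runId);
      if (text !== undefined) snapshot.finalAssistantText = text;
      this.snapshots.set(evt.runId, snapshot);
      this.latestText.delete(evt.runId); // the per-run cache is no longer needed
    }
  }

  snapshotFor(runId: string): RunSnapshot | undefined {
    return this.snapshots.get(runId);
  }
}
```

Because the text is keyed by `runId` and frozen at lifecycle end, a later wait call can return it without consulting the transcript, which is what removes the ordering dependency on `chat.history`.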
## Compatibility
- The new field is optional; callers that do not use it are unaffected.
- The original `chat.history` fallback is preserved, which covers runs that emit no assistant events.
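On the caller side, the prefer-then-fall-back behavior could look like the sketch below. The `AgentWaitResult` shape, the `resolveReplyText` helper, and the injected `readLatestFromHistory` callback are assumptions made for illustration; they stand in for whatever `sessions_send`, the A2A announce flow, and `runAgentStep` actually use.

```typescript
// Hypothetical result shape; only `finalAssistantText` is named in this PR.
interface AgentWaitResult {
  status: "ok" | "timeout" | "error";
  finalAssistantText?: string;
}

// Stand-in for a `chat.history` lookup of the last assistant message.
type ReadLatestFromHistory = () => Promise<string | undefined>;

async function resolveReplyText(
  wait: AgentWaitResult,
  readLatestFromHistory: ReadLatestFromHistory,
): Promise<string | undefined> {
  // No reply to resolve unless the run completed successfully.
  if (wait.status !== "ok") return undefined;

  // Prefer the deterministic value captured for this run.
  const text = wait.finalAssistantText;
  if (typeof text === "string" && text.trim() !== "") {
    return text;
  }

  // Field absent (older gateway, or no assistant events): fall back to history.
  return readLatestFromHistory();
}
```

Callers that never read the new field keep their existing behavior, and history is consulted only when the field is missing.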
## Tests
- `pnpm vitest src/gateway/server-methods/agent-job.test.ts`
- `pnpm vitest src/agents/tools/agent-step.test.ts`
- `pnpm vitest src/agents/openclaw-tools.sessions.test.ts`
- `pnpm vitest run --config vitest.e2e.config.ts src/gateway/server.sessions-send.e2e.test.ts`
All passed locally.
## Link
- Related: #14046
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adds an optional `finalAssistantText` field to `agent.wait` responses and updates session-send / A2A / `runAgentStep` to prefer this deterministic value over reading the latest assistant message from `chat.history`, reducing stale-history races after `agent.wait` returns `ok`.
The value is populated in the gateway by listening for `assistant` agent events for a given `runId`, caching the latest text seen, and snapshotting it into the run’s lifecycle-end record so later `agent.wait` calls can return it without touching history.
<h3>Confidence Score: 4/5</h3>
- Mostly safe to merge, but the new deterministic reply field won’t work for delta-only assistant event streams.
- Core change is additive and keeps history fallback, so it won’t break existing callers. However, the gateway-side capture only looks at `evt.data.text`, while the codebase also emits assistant events as `delta` chunks in streaming paths; those runs will never populate `finalAssistantText`, reducing the intended determinism for common streaming scenarios.
- `src/gateway/server-methods/agent-job.ts`
<sub>Last reviewed commit: 2e8a838</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
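The review note about delta-only streams suggests the capture would need to handle both full-text and incremental assistant events. One possible way to do that is sketched below; the `AssistantEventData` shape and the `nextAssistantText` helper are hypothetical, and the assumption that a `text` payload carries the full reply while `delta` carries only the newest chunk is not confirmed by this PR.

```typescript
// Hypothetical payload: either the full `text` so far, or an incremental `delta`.
interface AssistantEventData {
  text?: string;
  delta?: string;
}

// Returns the updated cached text for a run after one assistant event.
function nextAssistantText(
  previous: string | undefined,
  data: AssistantEventData,
): string | undefined {
  // A full-text payload replaces whatever was cached (assumed to be authoritative).
  if (typeof data.text === "string") return data.text;
  // A delta-only payload is appended to the text accumulated so far.
  if (typeof data.delta === "string") return (previous ?? "") + data.delta;
  // Neither field present: keep the cache unchanged.
  return previous;
}

// Usage: fold a delta-only stream into the final text.
const events: AssistantEventData[] = [{ delta: "Hel" }, { delta: "lo" }];
const finalText = events.reduce<string | undefined>(nextAssistantText, undefined);
```

With this kind of accumulation, a run that streams only `delta` chunks would still end with a populated value instead of silently relying on the history fallback.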
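## Relation to #15383
This PR is the infrastructure-level, deterministic fix for the same problem.
- This PR (#15402) makes `agent.wait` return an optional `finalAssistantText`, so callers can prefer a deterministic result and reduce exposure to `chat.history` ordering races.
- The complementary PR (#15383) adds an upper-layer guard before announce delivery, acting as a fast mitigation layer.
### Why split into two PRs
1. Different layers: #15402 changes the gateway's wait-result contract; #15383 changes the A2A announce guard in the business logic.
2. Different review focus: one is about API and concurrency semantics, the other about debouncing business behavior.
3. Easier for maintainers to choose: either the high-level or the low-level fix can be accepted on its own, or both can be merged together.
### Outcome
Both PRs target #14046. Merging either one alone improves the situation; merging both gives the best result.
Related: #14046, #15383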