#13235: feat: stream reasoning_content via /v1/chat/completions SSE

by mode80 open 2026-02-10 07:09

gateway agents stale

## Summary

Emit assistant thinking/reasoning content through the `/v1/chat/completions` SSE streaming endpoint as `delta.reasoning_content`, enabling custom clients to surface model reasoning in their UI.

## Problem

The chatCompletions endpoint currently only forwards `delta.content` (assistant text) and lifecycle events. Thinking/reasoning content from models like Claude Opus (extended thinking) is extracted internally but never exposed to SSE consumers. Custom clients that implement reasoning sidebars (similar to the Anthropic Console or ChatGPT) have no way to access this data.

## Changes

**`src/agents/pi-embedded-subscribe.handlers.messages.ts`**

- At `handleMessageEnd`, unconditionally extract thinking content and emit it to the agent event bus as a `"reasoning"` stream event
- This fires regardless of whether channel-level reasoning mode (`includeReasoning` / `streamReasoning`) is enabled; it is purely bus data for HTTP/WS consumers
- Reuses the existing `extractAssistantThinking` / `extractThinkingFromTaggedText` helpers (no new extraction logic)

**`src/gateway/openai-http.ts`**

- Listen for `"reasoning"` events on the agent event bus
- Emit SSE chunks with `delta: { reasoning_content: text }` (matches the DeepSeek/OpenAI-compatible convention)

## Behavior

- Reasoning content is emitted once per assistant turn (at message_end), containing the full thinking text for that turn
- This is NOT token-by-token streaming of reasoning deltas; that would require deeper changes to the subscriber/provider layer and is left as a follow-up
- Existing behavior is unchanged: channel-level reasoning modes (`on`, `stream`, `include`) continue to work as before
- No new config required; reasoning flows automatically when the model produces thinking blocks

## Testing

- All 18 existing reasoning/thinking-related tests pass
- Type-checks clean (`tsgo --noEmit`)
- 2 files changed, ~45 lines added

## Follow-up

Token-by-token streaming of reasoning deltas (as they arrive from the model, before message_end) would require intercepting thinking block deltas in the pi-agent subscriber layer. Tracked separately.

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds a new agent-bus stream event (`stream: "reasoning"`) emitted at assistant `message_end` containing the full extracted thinking text for the turn, and wires the OpenAI-compatible `/v1/chat/completions` SSE gateway to forward those events as `choices[].delta.reasoning_content` chunks.

The main integration points are `src/agents/pi-embedded-subscribe.handlers.messages.ts` (extracting thinking via existing helpers and emitting it on the event bus) and `src/gateway/openai-http.ts` (subscribing to bus events and serializing them into SSE `chat.completion.chunk` payloads alongside normal `delta.content`).

<h3>Confidence Score: 3/5</h3>

- This PR is mergeable only if always-on reasoning exposure is intended.
- The implementation is small and localized, but it materially changes what data is emitted to the agent event bus and therefore to `/v1/chat/completions` SSE consumers: reasoning/thinking is now sent even when channel-level reasoning is disabled. If that opt-in/opt-out contract matters, this is a correctness/privacy behavior change that should be gated or explicitly documented/controlled.
- src/agents/pi-embedded-subscribe.handlers.messages.ts; src/gateway/openai-http.ts
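
For reviewers who want to picture the wire format, the `delta.reasoning_content` chunk described under Changes could look roughly like the sketch below. This is an illustration only, not the code in `src/gateway/openai-http.ts`: the names `ReasoningChunk`, `buildReasoningChunk`, and `toSseFrame` are hypothetical, and the envelope fields (`id`, `created`, `model`, `finish_reason`) are assumed to follow the usual OpenAI-style `chat.completion.chunk` shape.

```typescript
// Hypothetical sketch: these names are NOT taken from the PR's source files.
// Assumes an OpenAI-style `chat.completion.chunk` envelope around the delta.

interface ReasoningChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: Array<{
    index: number;
    delta: { reasoning_content: string };
    finish_reason: null;
  }>;
}

// Build one chunk carrying the full thinking text for an assistant turn,
// mirroring the "emitted once per turn at message_end" behavior in the PR.
function buildReasoningChunk(
  id: string,
  model: string,
  text: string,
  created: number,
): ReasoningChunk {
  return {
    id,
    object: "chat.completion.chunk",
    created,
    model,
    choices: [
      { index: 0, delta: { reasoning_content: text }, finish_reason: null },
    ],
  };
}

// Serialize a chunk as one SSE frame: a single `data:` line plus a blank line.
// JSON.stringify escapes newlines in the text, so the payload stays on one line.
function toSseFrame(chunk: ReasoningChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}
```

Keeping the reasoning text in its own delta field, rather than mixing it into `delta.content`, means clients that ignore unknown fields should see no change in their rendered assistant text.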
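
On the consuming side, a custom client with a reasoning sidebar would need to route `reasoning_content` and `content` to different places. A minimal sketch follows, assuming each SSE frame is a single `data:` line and that the stream ends with the OpenAI-style `data: [DONE]` sentinel (this PR does not confirm that detail). The names `consumeSse` and `Accumulated` are hypothetical.

```typescript
// Hypothetical client-side helper; not part of this PR.
// Assumes one JSON payload per `data:` line and an OpenAI-style `[DONE]` sentinel.

interface Delta {
  content?: string;
  reasoning_content?: string;
}

interface Accumulated {
  reasoning: string; // text for the reasoning sidebar
  content: string;   // normal assistant text
  done: boolean;     // true once the `[DONE]` sentinel was seen
}

function consumeSse(raw: string): Accumulated {
  const out: Accumulated = { reasoning: "", content: "", done: false };
  for (const line of raw.split("\n")) {
    // Skip blank separator lines and any non-data SSE fields.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "") continue;
    if (payload === "[DONE]") {
      out.done = true;
      break;
    }
    const chunk = JSON.parse(payload) as { choices?: Array<{ delta?: Delta }> };
    const delta = chunk.choices?.[0]?.delta;
    // Route each field independently; a chunk may carry either one.
    if (delta?.reasoning_content) out.reasoning += delta.reasoning_content;
    if (delta?.content) out.content += delta.content;
  }
  return out;
}
```

Because reasoning is emitted at message_end, the reasoning chunk may arrive after that turn's content deltas, so the sketch accumulates both fields without assuming any ordering.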