#11210: Harden Tool-Call Streaming Parser for OpenAI-Completions Backends (gpt-oss-120b + LiteLLM + LMDeploy)

by ga-it open 2026-02-07 15:04

stale

# PR: Harden Tool-Call Streaming Parser for OpenAI-Completions Backends (LiteLLM + LMDeploy)

## Summary

This PR adds a defensive tool-call parsing guard for streamed `openai-completions` responses used by OpenClaw via `@mariozechner/pi-ai`.

It is required because some backends (notably `litellm -> lmdeploy -> gpt-oss-120b`) can emit tool-call deltas in variants that cause malformed/partial argument assembly, resulting in repeated empty tool arguments (`{}`), validation loops, and degraded agent behavior.

## Why This Is Needed

OpenClaw’s tool execution depends on complete, valid structured tool-call payloads. When tool-call argument reconstruction fails during streaming:

- `read` is called with `{}` instead of `{ "path": ... }`
- `exec` is called with `{}` instead of `{ "command": ... }`
- agent loops on retries and may produce gibberish / fallback replies

This PR prevents malformed tool blocks from propagating to execution and broadens compatibility with real-world streaming variants.

## Related Issues

- `openclaw/openclaw#9916` - https://github.com/openclaw/openclaw/issues/9916
- `openclaw/openclaw#10507` - https://github.com/openclaw/openclaw/issues/10507
- `openclaw/openclaw#9956` - https://github.com/openclaw/openclaw/issues/9956
- `openclaw/openclaw#7867` - https://github.com/openclaw/openclaw/issues/7867
- `badlogic/pi-mono#952` - https://github.com/badlogic/pi-mono/issues/952

## Affected Runtime Path

- File patched at runtime package level:
  - `dist/providers/openai-completions.js` (inside `@mariozechner/pi-ai`)
- Patch file in this repo:
  - `patches/pi-ai-0.52.6-toolcall-json-guard.patch`

## Root Cause (Observed)

Some streamed responses include tool/function deltas in forms not fully handled by the original parser path:

- alternate argument fields (`arguments`, `parsed_arguments`, `args`, `input`)
- `response.function_call_arguments.delta` and `...done` event types
- tool-call `id` updates/churn mid-stream
- final blocks where `name` is missing while partial args exist

Without guards, these edge cases leave `partialArgs` unparseable or unbound to the right tool block.

## What This Patch Changes

### 1) Adds robust argument extraction and parsing helpers

- `extractRawToolArgs(toolCall)`
- `recoverFirstJsonObject(raw)`
- `parseToolArgsSafely(raw, fallbackArgs)`
- `applyRawToolArgs(block, rawToolArgs)`
- `findToolCallBlock(blocks, callId)`

### 2) Handles additional streaming event variants

Adds handling for:

- `response.function_call_arguments.delta`
- `response.function_call_arguments.done`

This allows tool-call argument state to be updated from backend variants outside the basic `choice.delta.tool_calls` path.

### 3) Improves tool-call block continuity

- Avoids unnecessary new tool block creation when only `id` changes and no new tool name is present.
- Accepts late-arriving `choice.message.tool_calls` hydration into existing blocks.

### 4) Prevents malformed tool blocks from execution

- On tool block finalization: drop block if parse fails or tool `name` is blank.
- Final safety-net pass before `done`: strip any nameless `toolCall` blocks.

This ensures invalid tool calls do not reach OpenClaw’s tool validator/executor.
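To make the helper roles in section 1 concrete, here is a minimal sketch of how three of them might behave. It is not the code in `patches/pi-ai-0.52.6-toolcall-json-guard.patch`: the function names and the four argument fields come from this description, but the bodies, the `{ ok, args }` return shape, and the lookup order (`toolCall.function` first, then the top level) are assumptions made for illustration.

```javascript
// Hypothetical sketch; the real patch may implement these helpers differently.

// Fields this PR lists as possible carriers of tool-call arguments.
const ARG_FIELDS = ["arguments", "parsed_arguments", "args", "input"];

// Return the raw argument payload as a string, or "" when none is present.
// Assumption: look inside `toolCall.function` first, then on the tool call itself.
function extractRawToolArgs(toolCall) {
  for (const source of [toolCall?.function, toolCall]) {
    if (!source || typeof source !== "object") continue;
    for (const field of ARG_FIELDS) {
      const value = source[field];
      if (typeof value === "string" && value.length > 0) return value;
      if (value && typeof value === "object") return JSON.stringify(value);
    }
  }
  return "";
}

// Scan for the first balanced {...} span, ignoring braces inside JSON strings.
// Returns null when no object closes (the payload is still partial).
function recoverFirstJsonObject(raw) {
  if (typeof raw !== "string") return null;
  const start = raw.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < raw.length; i++) {
    const ch = raw[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return raw.slice(start, i + 1);
    }
  }
  return null;
}

// Try a direct parse, then the recovered object, then give up and
// return the fallback. `ok` tells the caller whether to keep the block.
function parseToolArgsSafely(raw, fallbackArgs = {}) {
  for (const candidate of [raw, recoverFirstJsonObject(raw)]) {
    if (typeof candidate !== "string" || candidate.trim() === "") continue;
    try {
      const parsed = JSON.parse(candidate);
      if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
        return { ok: true, args: parsed };
      }
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  return { ok: false, args: fallbackArgs };
}
```

In this sketch a caller would drop the tool block when `ok` is `false` rather than forwarding `fallbackArgs`, which matches the intent of section 4: a failed parse should not turn into an executed `{}` call.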
## Key Logic (High Level)

```mermaid
flowchart TD
    A[Stream chunk received] --> B{Chunk type}
    B -->|tool delta variants| C[Extract raw args from known fields]
    B -->|choice.delta.tool_calls| D[Update/create tool block]
    B -->|choice.message.tool_calls| E[Hydrate existing block by id/name]
    C --> F[Append/replace partialArgs]
    D --> F
    E --> F
    F --> G[Parse args safely]
    G -->|ok| H[Emit toolcall_delta/end]
    G -->|bad or nameless| I[Drop malformed block]
    H --> J[Continue stream]
    I --> J
    J --> K[Final safety-net: remove nameless tool blocks]
    K --> L[Emit done]
```

## Before vs After Behavior

```mermaid
sequenceDiagram
    participant U as User
    participant A as OpenClaw Agent
    participant P as Parser (@mariozechner/pi-ai)
    participant T as Tool Runtime
    U->>A: "Use weather skill"
    A->>P: Stream chat completion
    P-->>A: toolCall(read, arguments={}) (before)
    A->>T: read {}
    T-->>A: validation error (missing path)
    A->>P: retry
    P-->>A: toolCall(read, arguments={})
    Note over A,P: loop / degraded response
    rect rgb(220,245,220)
        U->>A: "Use weather skill"
        A->>P: Stream chat completion
        P-->>A: toolCall(read, arguments={"path":"/app/skills/weather/SKILL.md"}) (after)
        A->>T: read {path: ...}
        T-->>A: success
        A->>T: exec {command: ...}
        T-->>A: success
    end
```

## Validation Results (Container Runtime)

Environment validated:

- Container: `Openclaw-1`
- Provider/API/Model:
  - provider: `litellm`
  - api: `openai-completions`
  - model: `ga3/gpt-oss-120b`

Patch markers present in runtime file:

- `toolNameDelta = toolCall.function?.name || ""`
- ``Final safety net: strip malformed tool calls before sending `done`.``

Latest session analyzed:

- `/home/node/.openclaw/agents/main/sessions/59214c76-2d15-447d-87b7-9ac6dd2fd90b.jsonl`

Metrics:

- Total tool calls: `59`
- Empty-args tool calls: `32`
- Validation errors:
  - `read.path` missing: `26`
  - `exec.command` missing: `6`

Stabilization split (same session):

- Pre-reset segment: `44` tool calls, `32` empty (`72.7%` empty)
- Post-reset segment: `15` tool calls, `0` empty (`0%` empty)

Observed successful post-stabilization flow:

- `read /app/skills/weather/SKILL.md` (valid args)
- `exec curl ... wttr.in ...` (valid args)
- `read /home/node/common-skills/gitea/SKILL.md` (valid args)
- `exec tea repos list` + login/env-driven gitea flow

## Scope and Safety

### Scope

- Parsing/assembly logic for tool-call deltas in streaming path only.
- No change to business logic of OpenClaw tools themselves.

### Safety controls in patch

- Defensive parsing with fallback recovery.
- Drop invalid or nameless tool blocks before execution.
- Preserve existing valid behavior for standard OpenAI-compatible streams.

## Risk Assessment

Potential risks:

- Over-filtering a valid but unusual tool block if name is absent too long.
- Behavioral differences for providers that rely on nonstandard delayed naming.

Mitigations:

- Name is updated from multiple event paths (`delta`, `message.tool_calls`).
- Filtering is intentionally final-stage and targeted at malformed blocks.

## Test / Repro Guidance

### Repro prompt set

1. `What is weather in Johannesburg, South Africa today? use your weather skill`
2. `use your gitea skill to list repos you can see`

### Expected with fix

- No repeated `{}` arguments in tool calls.
- `read` includes `path`.
- `exec` includes `command`.
- Tool failures, if any, should be setup/auth errors, not parser-shape errors.

## Rollout Recommendation

1. Merge parser hardening patch.
2. Keep session reset (`/reset`) in troubleshooting SOP when validating new deployments.
3. Add regression tests at parser layer for:
   - function_call delta variants
   - id churn
   - partial JSON recovery
   - nameless-tool block stripping

## Notes

- This patch is a pragmatic compatibility hardening for mixed provider chains.
- Long term, ideal resolution is in upstream `pi-ai` source + release, then consume via dependency bump.
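Two of the regression cases listed under Rollout Recommendation, id churn and nameless-tool block stripping, can be sketched at the parser layer as below. This is an illustrative model, not the patched parser: `applyToolCallDelta` and `stripNamelessToolCalls` are hypothetical names, the block shape `{ type, id, name, partialArgs }` is assumed, and the sketch tracks only the most recent tool block (it ignores parallel tool calls).

```javascript
// Hypothetical sketch of block continuity; names and block shape are assumed.

// Apply one streamed tool-call delta to the list of tool blocks.
function applyToolCallDelta(blocks, delta) {
  const nameDelta = delta.function?.name || "";
  const argsDelta = delta.function?.arguments || "";
  let current = blocks[blocks.length - 1];

  // Start a new block only when nothing is open yet, or when a tool name
  // arrives together with a different id. An id change on its own is
  // treated as churn and keeps the current block.
  const idChanged = Boolean(delta.id) && current && delta.id !== current.id;
  if (!current || (nameDelta && idChanged)) {
    current = { type: "toolCall", id: delta.id || "", name: "", partialArgs: "" };
    blocks.push(current);
  }

  if (delta.id) current.id = delta.id;
  if (nameDelta) current.name = nameDelta;
  current.partialArgs += argsDelta;
  return blocks;
}

// Final safety net before `done`: keep non-tool blocks, drop tool calls
// whose name is missing or blank.
function stripNamelessToolCalls(blocks) {
  return blocks.filter(
    (b) => b.type !== "toolCall" || (typeof b.name === "string" && b.name.trim() !== "")
  );
}

// Example stream: the id changes mid-call without a new name, then a second
// named call starts.
const blocks = [];
applyToolCallDelta(blocks, { id: "call_1", function: { name: "read", arguments: '{"pa' } });
applyToolCallDelta(blocks, { id: "call_1b", function: { arguments: 'th":"/x"}' } });
applyToolCallDelta(blocks, { id: "call_2", function: { name: "exec", arguments: '{"command":"ls"}' } });
```

A regression test built on this model would assert that the first two deltas end up in one block with complete arguments, and that a nameless block appended to the list does not survive the final filter.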
<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds a patch applied to `@mariozechner/pi-ai`’s `dist/providers/openai-completions.js` to harden streamed tool-call parsing for OpenAI-completions backends.

Key changes include:

- extracting tool-call arguments from multiple possible fields
- safer JSON parsing/recovery for assembled argument strings
- handling additional streaming event types (`response.function_call_arguments.delta/done`)
- improving continuity when tool-call IDs churn
- dropping malformed/nameless tool-call blocks before emitting `toolcall_end` and before the final `done` event

These changes are confined to the streaming parser path and are intended to prevent empty `{}` tool arguments from reaching OpenClaw’s tool validator/executor.

<h3>Confidence Score: 4/5</h3>

- This PR looks safe to merge and is narrowly scoped to defensive parsing in the streaming tool-call path.
- The change is contained to a runtime patch file and adds guardrails around tool-call argument assembly and finalization, including explicit dropping of malformed/nameless tool calls. I didn’t find a definitive logic error in the introduced conditions from the patch alone, but the patch touches subtle streaming state, so it merits careful runtime validation across providers.
- patches/pi-ai-0.52.6-toolcall-json-guard.patch