#19326: Agents: improve z.ai GLM-5 integration and failover
Labels: `commands`, `agents`, `size: L` · Cluster: Wizard Enhancements and Config Fixes
## Summary
Describe the problem and fix in 2–5 bullets:
- Problem: the z.AI GLM integration had multiple rough edges compared to Opus-grade behavior: endpoint probing fallback, thinking-level mismatch handling, tool-loop resilience, model capability metadata, and compaction/auth retry ergonomics.
- Why it matters: these issues increased failure rates and avoidable retries, especially in long tool-heavy sessions where users expect stable behavior from lower-cost GLM models.
- What changed: improved GLM endpoint/model handling, provider-aware thinking/tool-loop/auth cooldown behavior, GLM-focused skills prompt compaction, and additional safety/observability tests (including live z.AI probes).
- What did NOT change (scope boundary): no dependency patches, no broad architecture refactor, no unrelated channel behavior changes.
## Change Type (select all)
- [x] Bug fix
- [x] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [x] Skills / tool execution
- [x] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Closes #
- Related #
## User-visible / Behavior Changes
- `zai-endpoint-detect` now probes `glm-5` on coding endpoints before downgrading to `glm-4.7`.
- GLM-5 model compatibility now consistently forces `supportsDeveloperRole=false` and marks GLM-5 as image-capable where catalog metadata lags.
- Provider-aware thinking normalization now maps non-`off` thinking levels to `low` for z.AI to avoid unsupported-level retries.
- GLM runs get tighter default skills prompt limits and model-aware prompt rebuild from stored snapshots.
- GLM tool sessions get safer default loop-detection thresholds unless explicitly overridden.
- z.AI rate-limit/timeout auth-profile cooldown backoff is tuned to recover faster while preserving default behavior for non-rate-limit failures.
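A minimal sketch of the probe-then-downgrade flow described in the first bullet above. The helper and callback names (`pickGlmModel`, `probe`) are hypothetical illustrations, not the real `zai-endpoint-detect` internals:

```typescript
// Probe callback: returns true when the endpoint accepts the given model.
type Probe = (model: string) => boolean;

function pickGlmModel(probe: Probe): string {
  // Prefer glm-5 on coding endpoints; downgrade to glm-4.7 only if the probe fails.
  const candidates = ["glm-5", "glm-4.7"];
  for (const model of candidates) {
    if (probe(model)) return model;
  }
  // Last resort: keep the legacy default rather than erroring out.
  return "glm-4.7";
}
```

The key property is that the downgrade happens only after a failed probe, not preemptively, so endpoints that support `glm-5` never take the slower path.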
## Security Impact (required)
- New permissions/capabilities? (`No`)
- Secrets/tokens handling changed? (`No`)
- New/changed network calls? (`No`)
- Command/tool execution surface changed? (`No`)
- Data access scope changed? (`No`)
- If any `Yes`, explain risk + mitigation:
## Repro + Verification
### Environment
- OS: macOS (arm64)
- Runtime/container: Node 25.x, pnpm, Vitest
- Model/provider: z.AI (`glm-5`, `glm-4.7`)
- Integration/channel (if any): embedded agent runtime + gateway model probes
- Relevant config (redacted): local `~/.openclaw/openclaw.json` with z.AI credentials
### Steps
1. Run targeted unit/e2e suites for changed GLM/auth/skills/tool-loop paths.
2. Run live z.AI test suite (`src/agents/zai.live.test.ts`).
3. Validate endpoint detect fallback behavior and overflow compaction regression coverage.
### Expected
- GLM-5 path succeeds without avoidable downgrade/retry loops.
- Tool loops are guarded with sane defaults for GLM.
- Skills prompt remains compact and stable for GLM runs.
- Auth rotation recovers from z.AI rate limits more smoothly.
### Actual
- All targeted unit/e2e suites passed.
- Live z.AI probes passed (`glm-5` text, `glm-5` tool call, `glm-4.7` text).
## Evidence
Attach at least one:
- [x] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)
Evidence snippets from local runs:
- `pnpm vitest run --config vitest.e2e.config.ts ...` → 11 files / 86 tests passed
- `pnpm vitest run --config vitest.unit.config.ts ...` → 4 files / 43 tests passed
- `ZAI_LIVE_TEST=1 pnpm vitest run --config vitest.live.config.ts src/agents/zai.live.test.ts` → 3 passed / 1 skipped
## Human Verification (required)
What you personally verified (not just CI), and how:
- Verified scenarios:
- endpoint detection prefers `glm-5` where supported
- z.AI thinking normalization avoids unsupported-level retry churn
- GLM snapshot skills prompt rebuild + compact limits
- GLM loop-detection defaults and config merge behavior
- z.AI rate-limit cooldown tuning
- live `glm-5` tool-call path
- Edge cases checked:
  - an explicit compat `false` override still preserves GLM-5 image capability
  - compaction path resilience when the compact helper returns an empty result
- mock coverage updated for new helper exports
- What you did **not** verify:
- full repository test matrix / full CI runtime
- production traffic behavior across all providers/channels
## Compatibility / Migration
- Backward compatible? (`Yes`)
- Config/env changes? (`No`)
- Migration needed? (`No`)
- If yes, exact upgrade steps:
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly:
- revert this PR commit
- temporarily pin to non-GLM provider/model in agent defaults
- override loop-detection/auth cooldown values in config if needed
- Files/config to restore:
- `src/agents/pi-tools.ts`
- `src/agents/auth-profiles/usage.ts`
- `src/agents/skills/workspace.ts`
- `src/agents/model-compat.ts`
- Known bad symptoms reviewers should watch for:
  - unexpectedly aggressive loop blocking for custom tool flows
- provider mismatch if non-z.AI models are accidentally treated as GLM
- compaction retry behavior regressions in overflow paths
## Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write `None`.
- Risk: GLM loop thresholds may be too strict for some long polling workflows.
- Mitigation: thresholds are still config-overridable; merge logic preserves explicit user config.
- Risk: provider/model heuristics (`zai` + `glm-*`) might not match future naming.
- Mitigation: scoped helper checks + tests; fallback behavior remains existing defaults when no match.
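The merge behavior named in the mitigation can be sketched as follows. The field names mirror the thresholds this PR ships; the helper itself is illustrative, assuming unset fields are `undefined`:

```typescript
interface LoopConfig {
  warningThreshold?: number;
  criticalThreshold?: number;
}

// GLM-specific defaults apply only where the user left the field unset.
const GLM_DEFAULTS: Required<LoopConfig> = {
  warningThreshold: 6,
  criticalThreshold: 10,
};

function mergeLoopConfig(user: LoopConfig): Required<LoopConfig> {
  // `??` keeps any explicit user value, including stricter or looser ones.
  return {
    warningThreshold: user.warningThreshold ?? GLM_DEFAULTS.warningThreshold,
    criticalThreshold: user.criticalThreshold ?? GLM_DEFAULTS.criticalThreshold,
  };
}
```

With this shape, a long-polling workflow that sets `warningThreshold: 20` keeps its own value while still inheriting the GLM default for the other field.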
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Improved z.AI GLM integration with better endpoint detection, provider-aware defaults, and auth retry tuning.
**Major changes:**
- Endpoint detection now probes `glm-5` on coding endpoints before falling back to `glm-4.7`
- Provider-aware thinking normalization maps non-`off` thinking levels to `low` for z.AI to avoid unsupported-level retries
- GLM models get tighter loop detection defaults (`warningThreshold: 6`, `criticalThreshold: 10`) with config merge logic that preserves explicit user overrides
- Skills prompt compaction for GLM models (80 skills/16k chars vs 150 skills/30k chars default), with snapshot rebuild logic for GLM runs
- z.AI rate-limit/timeout cooldown uses faster backoff (20s, 60s, 3m, 9m vs default geometric progression)
- Tool stream enabled by default for GLM-5 models (opt-out via `tool_stream: false` param)
- GLM-5 forward-compat fallback includes vision capability (`input: ["text", "image"]`) and correct context limits
- Defensive null check for compaction result before accessing properties
- Tool start metadata isolation by `runId::toolCallId` composite key to prevent cross-run collisions
- Media URL deduplication when committing messaging tool sends
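The tuned cooldown ladder mentioned above (20s, 60s, 3m, 9m) could look roughly like this; the schedule values come from this PR, but the function name and cap-at-last-step behavior are assumptions:

```typescript
// Fixed backoff ladder for z.AI rate-limit/timeout failures, in milliseconds.
const ZAI_RATE_COOLDOWNS_MS = [20_000, 60_000, 180_000, 540_000];

function zaiRateCooldownMs(failureCount: number): number {
  // Index by consecutive failure count (1-based), capped at the last step.
  const index = Math.min(Math.max(failureCount - 1, 0), ZAI_RATE_COOLDOWNS_MS.length - 1);
  return ZAI_RATE_COOLDOWNS_MS[index];
}
```

Compared to a geometric progression, a fixed ladder recovers faster on the first retry while still backing off hard on repeated rate limits.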
**Testing:**
Comprehensive test coverage includes endpoint detection fallback paths, loop detection config merging, skills prompt rebuilding for GLM, thinking normalization, auth cooldown tuning, tool metadata isolation, and live z.AI probes.
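The `runId::toolCallId` composite-key isolation tested above can be sketched as a keyed map; the type and function names here are illustrative, not the real exports:

```typescript
interface ToolStartMeta {
  startedAt: number;
}

// Metadata store keyed by run + tool call, so two concurrent runs that
// happen to reuse the same toolCallId cannot overwrite each other.
const toolStarts = new Map<string, ToolStartMeta>();

function toolStartKey(runId: string, toolCallId: string): string {
  return `${runId}::${toolCallId}`;
}

function recordToolStart(runId: string, toolCallId: string, startedAt: number): void {
  toolStarts.set(toolStartKey(runId, toolCallId), { startedAt });
}
```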
<h3>Confidence Score: 4/5</h3>
- Safe to merge with minor edge case considerations
- Comprehensive changes with thorough test coverage across all modified paths. The PR addresses real integration issues with z.AI GLM models and includes defensive programming patterns. Minor deduction due to the broad scope touching multiple critical paths (auth, loop detection, skills, endpoint detection) and potential for provider heuristics to need future adjustments.
- No files require special attention - changes are well-tested and follow established patterns
<sub>Last reviewed commit: 12ab0d7</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #16290: fix: add field-level validation for custom LLM provider config — superlowburn, 2026-02-14 (78.4%)
- #15327: feat:(models): discover ZAI models dynamically for GLM-5 support [A... — vincentkoc, 2026-02-13 (78.2%)
- #22941: feat: add ZHIPU provider + doctor chat check (Phase 1A) — feelega, 2026-02-21 (77.0%)
- #23816: fix(agents): model fallback skipped during session overrides and pr... — ramezgaberiel, 2026-02-22 (76.6%)
- #23226: fix(msteams): proactive messaging, EADDRINUSE fix, tool status, ada... — TarogStar, 2026-02-22 (76.4%)
- #11561: fix: respect supportsReasoningEffort compat flag for xAI/Grok reaso... — baxter-lindsaar, 2026-02-08 (76.3%)
- #21298: fix(config): extend model input schema for video/audio modalities — Alfa-ai-ccvs-tech, 2026-02-19 (76.1%)
- #23749: fix some issues — tronpis, 2026-02-22 (75.9%)
- #19020: bugfix(gateway): Handle invalid model provider API config gracefully… — funkyjonx, 2026-02-17 (75.8%)
- #23286: fix: use configured model in llm-slug-generator instead of hardcoded… — wsman, 2026-02-22 (75.2%)