
#16677: feat(routing): intelligent model routing via pre-route hook

by gonesurfing · opened 2026-02-15 00:44
Labels: docs · app: web-ui · gateway · agents · size: L
AI Assistance used in this PR. Human testing performed on both local and cloud routing endpoints.

## Summary

- **Problem:** Users pay premium model prices for simple messages ("hey", "ok", "read that file") because there is no automatic complexity-based routing. Manual `/model` switching is tedious and error-prone.
- **Why it matters:** Simple messages can be 90%+ of volume in some workflows. Routing them to cheap/fast models saves significant cost without degrading quality on complex tasks.
- **What changed:** Added a pre-route hook that classifies each message with a lightweight LLM call (~300 tokens) and routes to a configured model tier before the agent runs. Supports local Ollama and any OpenAI-compatible API. The UI shows the routed model as a badge. The classification prompt is user-editable at `~/.openclaw/router/ROUTER.md`.
- **What did NOT change (scope boundary):** No changes to the agent execution pipeline itself, model provider resolution, the auth system, or any existing routing logic. The router is a pure pre-hook that optionally overrides the model ref before the agent starts. Disabled by default.

## Change Type (select all)

- [ ] Bug fix
- [x] Feature
- [ ] Refactor
- [x] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [x] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #9402
- Related: #4658, #10969, #15033

## User-visible / Behavior Changes

- New `router` config section (disabled by default — no behavior change unless explicitly enabled)
- When enabled, messages are classified and routed to tier-mapped models automatically
- Routed model shown as a badge in the webchat UI (e.g. `🔀 anthropic/claude-haiku-4.5`)
- `/status` command shows router configuration when enabled
- New external file `~/.openclaw/router/ROUTER.md` created on first use for classification prompt tuning
- `router.apiKey` is redacted in config logs/exports via `.register(sensitive)`

## Security Impact (required)

- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `Yes`
- New/changed network calls? `Yes`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`
- If any `Yes`, explain risk + mitigation:
  - **router.apiKey**: New optional secret for the classifier API. Marked `.register(sensitive)` in the Zod schema so it is redacted in config dumps. Supports `env:VAR_NAME` syntax to avoid plaintext in config files.
  - **Network calls**: The router makes an HTTP POST to either local Ollama (`http://localhost:11434`) or a configured OpenAI-compatible endpoint. Only the user's message text + classification prompt are sent (~300 tokens). No session data, auth tokens, or PII beyond the message content. Calls are bounded by `timeoutMs` (default 10s).

## Repro + Verification

### Environment

- OS: Linux (Jetson / ARM64) + macOS (dev)
- Runtime/container: Docker (openclaw-gateway)
- Model/provider: Ollama (qwen3:4b) + OpenRouter (qwen/qwen3-4b)
- Integration/channel: Webchat
- Relevant config (redacted):

```json
{
  "router": {
    "enabled": true,
    "provider": "openai-compatible",
    "baseUrl": "https://openrouter.ai/api/v1",
    "apiKey": "env:OPENROUTER_API_KEY",
    "model": "qwen/qwen3-4b",
    "tiers": {
      "1": "openrouter/minimax/minimax-m2.1",
      "2": "openrouter/anthropic/claude-haiku-4.5",
      "3": "openrouter/anthropic/claude-opus-4.6"
    },
    "defaultTier": "1"
  }
}
```

### Steps

1. Enable the router in config with tiers mapped to models
2. Send messages of varying complexity: "hey", "fix this TypeError", "design a microservices architecture"
3. Observe the routed model badge in webchat and in gateway logs

### Expected

- "hey" → tier 1 (cheap model), badge shows the tier 1 model
- "fix this TypeError" → tier 2 (code model)
- "design a microservices architecture" → tier 3 (premium model)
- Fallback to `defaultTier` on classifier error

### Actual

- All tiers route correctly with ~200-700ms classifier latency
- Badge renders and persists on completed messages
- Fallback works on timeout/malformed response

### Evidence

- Failing test/log before + passing after
- Trace/log snippets
- Screenshot/recording
- Perf numbers (if relevant)

Unit tests in `src/hooks/pre-route.test.ts` cover: config resolution, tier parsing (1/2/3), fallback on error/timeout, model ref parsing, Ollama + OpenAI-compatible payloads, and conversation context injection.

## Human Verification (required)

- Verified scenarios: Tier 1/2/3 classification with live Ollama and OpenRouter endpoints; fallback on Ollama timeout; badge rendering in webchat; badge persistence on completed messages; `/status` output; ROUTER.md hot-reload; heartbeat/session-reset bypass
- Edge cases checked: Classifier returns unexpected text (falls back); classifier returns a tier not in config (falls back); ROUTER.md missing (uses built-in default); apiKey with `env:` prefix resolves from the environment
- What you did not verify: Concurrent routing under high load; interaction with agent-level model overrides; non-webchat channels (Telegram, WhatsApp, etc.)

## Compatibility / Migration

- Backward compatible? Yes — the router is disabled by default; no behavior change without opt-in
- Config/env changes? Yes — new optional `router` section in config
- Migration needed? No

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly: Set `"router": { "enabled": false }` in config, or remove the `router` section entirely. No restart is needed for ROUTER.md changes; a config change requires a restart.
- Files/config to restore: Only the `router` config key. No database or state changes.
- Known bad symptoms reviewers should watch for: Elevated latency on every message (~200-700ms from the classifier call); classifier endpoint errors flooding the logs (mitigated: errors are caught and fall back silently)

## Risks and Mitigations

- Risk: Classifier adds latency to every message when enabled
  - Mitigation: Bounded by `timeoutMs` (default 10s; 3-5s recommended). Falls back to `defaultTier` on timeout. Latency is amortized into agent startup.
- Risk: User message content is sent to the classifier endpoint (privacy)
  - Mitigation: Only the message text is sent, no metadata/auth. Local Ollama mode keeps data on-device. Documented so users can make informed choices.
- Risk: Router model naming inconsistency (bare API names vs. OpenClaw refs)
  - Mitigation: Documented as a known limitation. Future work will support standard model refs with auto-credential resolution.

<!-- greptile_comment -->
<h3>Greptile Summary</h3>

This PR adds intelligent pre-route model selection that classifies user messages with a lightweight LLM classifier before the main agent runs. Messages are classified into tiers (casual/code/complex) and routed to appropriate models to optimize cost and latency. The feature is disabled by default and well-encapsulated.

**Key changes:**

- New `src/hooks/pre-route.ts` with routing logic supporting Ollama and OpenAI-compatible APIs
- Configuration schema extended with an optional `router` section (validated with refinement)
- Integration at the agent execution entry point, with context extraction from session history
- UI badge display showing the routed model in webchat
- `/status` command shows router configuration when enabled
- Comprehensive test coverage for the routing logic

**Previous concerns addressed:**

- `baseUrl` guard now returns early before calling `callOpenAICompatible` (lines 293-301)
- Config schema uses optional fields + `.superRefine` to require `tiers`/`defaultTier` only when enabled

**Architecture:**

- Graceful fallback on all error paths (timeout, network, parse failures)
- Bypass for heartbeats and system prompts
- No changes to the core agent pipeline or provider resolution
- Context truncation prevents token bloat (500 chars message, 200 chars context)

**Security:**

- `apiKey` marked `.register(sensitive)` for redaction
- Supports `env:VAR_NAME` syntax to avoid plaintext secrets
- Only the user message text is sent to the classifier (no session metadata or PII beyond message content)
- Bounded by a configurable timeout

<h3>Confidence Score: 4/5</h3>

- Safe to merge, with minor considerations around classification accuracy and latency impact in production
- Well-architected feature with comprehensive error handling, graceful fallbacks, and good test coverage. Both previous thread concerns have been addressed. The implementation is properly scoped as a pre-hook with no changes to core agent logic. The feature is disabled by default, minimizing deployment risk. Deducting one point for: (1) classification accuracy depends on external LLM quality and prompt tuning, (2) 200-700ms of added latency on every message when enabled, and (3) limited verification on non-webchat channels per the PR description
- No files require special attention — previous concerns in `src/hooks/pre-route.ts` and `src/config/zod-schema.ts` have been addressed

<sub>Last reviewed commit: 561d207</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
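The tier-parsing and fallback behavior described in the PR (unexpected classifier output, or a tier absent from config, falls back to `defaultTier`) can be sketched roughly as follows. All names here (`RouterConfig`, `resolveTier`, `routeModel`) are illustrative, not taken from the actual `src/hooks/pre-route.ts`:

```typescript
// Illustrative tier-routing sketch; hypothetical names, not the real implementation.
interface RouterConfig {
  tiers: Record<string, string>; // tier id -> model ref
  defaultTier: string;
}

// Pull a leading tier digit out of raw classifier output. Anything
// unexpected, or a tier missing from config, falls back to defaultTier.
function resolveTier(raw: string | null, cfg: RouterConfig): string {
  const match = raw?.trim().match(/^[123]/);
  const tier = match ? match[0] : cfg.defaultTier;
  return tier in cfg.tiers ? tier : cfg.defaultTier;
}

function routeModel(raw: string | null, cfg: RouterConfig): string {
  return cfg.tiers[resolveTier(raw, cfg)];
}

const cfg: RouterConfig = {
  tiers: {
    "1": "openrouter/minimax/minimax-m2.1",
    "2": "openrouter/anthropic/claude-haiku-4.5",
    "3": "openrouter/anthropic/claude-opus-4.6",
  },
  defaultTier: "1",
};

console.log(routeModel("2", cfg));       // → "openrouter/anthropic/claude-haiku-4.5"
console.log(routeModel("garbage", cfg)); // fallback → "openrouter/minimax/minimax-m2.1"
console.log(routeModel(null, cfg));      // classifier error → "openrouter/minimax/minimax-m2.1"
```

Keeping the fallback in pure parsing code like this is what makes the "falls back silently on malformed response" behavior cheap to unit-test without a live classifier endpoint.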

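The `env:VAR_NAME` apiKey syntax called out in the Security Impact section could be implemented along these lines; `resolveSecret` is a hypothetical helper, and the real OpenClaw config loader may differ:

```typescript
// Hypothetical helper for the `env:VAR_NAME` secret convention described in
// the PR; the actual OpenClaw config loader may implement this differently.
function resolveSecret(value: string): string | undefined {
  if (value.startsWith("env:")) {
    const name = value.slice("env:".length);
    return process.env[name]; // undefined if the variable is unset
  }
  return value; // plaintext values pass through (discouraged for real keys)
}

process.env.OPENROUTER_API_KEY = "sk-demo"; // demo value only
console.log(resolveSecret("env:OPENROUTER_API_KEY")); // → "sk-demo"
console.log(resolveSecret("plaintext-key"));          // → "plaintext-key"
```

Resolving at load time like this keeps the secret out of the config file on disk, which is why the PR pairs it with `.register(sensitive)` redaction for config dumps.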