#16677: feat(routing): intelligent model routing via pre-route hook
Labels: docs, app: web-ui, gateway, agents, size: L
Cluster: Model Management Enhancements
AI Assistance used in this PR. Human testing performed on both local and cloud routing endpoints.
## Summary
- **Problem:** Users pay premium model prices for simple messages ("hey", "ok", "read that file") because there's no automatic complexity-based routing. Manual `/model` switching is tedious and error-prone.
- **Why it matters:** Simple messages can be 90%+ of volume in some workflows. Routing them to cheap/fast models saves significant cost without degrading quality on complex tasks.
- **What changed:** Added a pre-route hook that classifies each message with a lightweight LLM (~300 tokens) and routes to a configured model tier before the agent runs. Supports local Ollama and any OpenAI-compatible API. UI shows routed model as a badge. Classification prompt is user-editable at `~/.openclaw/router/ROUTER.md`.
- **What did NOT change (scope boundary):** No changes to the agent execution pipeline itself, model provider resolution, auth system, or any existing routing logic. The router is a pure pre-hook that optionally overrides the model ref before the agent starts. Disabled by default.
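The tier-to-model decision described above can be sketched as a small pure function. This is a sketch only: `RouterConfig` and `resolveModelRef` are illustrative names, not the actual API in `src/hooks/pre-route.ts`.

```typescript
// Illustrative sketch of tier-to-model resolution; names are hypothetical.
type Tier = "1" | "2" | "3";

interface RouterConfig {
  enabled: boolean;
  tiers: Partial<Record<Tier, string>>; // tier number -> model ref
  defaultTier: Tier;
}

// Pure routing decision: map a classifier verdict to a model ref,
// falling back to defaultTier when the verdict is not a configured tier.
function resolveModelRef(
  cfg: RouterConfig,
  classified: string | null,
): string | undefined {
  const tier = (classified ?? "").trim() as Tier;
  return cfg.tiers[tier] ?? cfg.tiers[cfg.defaultTier];
}
```

With tiers configured as in the Repro section, a verdict of `"2"` picks the tier-2 model, while any unexpected classifier output falls back to the `defaultTier` model.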
## Change Type (select all)
- [ ] Bug fix
- [x] Feature
- [ ] Refactor
- [x] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [x] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Closes #9402
- Related #4658, #10969, #15033
## User-visible / Behavior Changes
- New `router` config section (disabled by default — no behavior change unless explicitly enabled)
- When enabled, messages are classified and routed to tier-mapped models automatically
- Routed model shown as a badge in webchat UI (e.g. `🔀 anthropic/claude-haiku-4.5`)
- `/status` command shows router configuration when enabled
- New external file `~/.openclaw/router/ROUTER.md` created on first use for classification prompt tuning
- `router.apiKey` is redacted in config logs/exports via `.register(sensitive)`
## Security Impact (required)
- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `Yes`
- New/changed network calls? `Yes`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`
- If any `Yes`, explain risk + mitigation:
- **router.apiKey**: New optional secret for the classifier API. Marked `.register(sensitive)` in the Zod schema so it is redacted in config dumps. Supports `env:VAR_NAME` syntax to avoid plaintext in config files.
- **Network calls**: Router makes HTTP POST to either local Ollama (`http://localhost:11434`) or a configured OpenAI-compatible endpoint. Only the user's message text + classification prompt are sent (~300 tokens). No session data, auth tokens, or PII beyond the message content. Calls are bounded by `timeoutMs` (default 10s).
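The `env:VAR_NAME` indirection can be sketched roughly as follows; `resolveSecret` is an illustrative helper name, not necessarily the one used in the implementation.

```typescript
// Sketch of the env: secret indirection; helper name is hypothetical.
function resolveSecret(value: string | undefined): string | undefined {
  if (value === undefined) return undefined;
  if (value.startsWith("env:")) {
    // Look up the named environment variable instead of treating
    // the config value as a literal key.
    return process.env[value.slice("env:".length)];
  }
  return value; // plaintext key in config (discouraged)
}
```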
## Repro + Verification
### Environment
- OS: Linux (Jetson / ARM64) + macOS (dev)
- Runtime/container: Docker (openclaw-gateway)
- Model/provider: Ollama (qwen3:4b) + OpenRouter (qwen/qwen3-4b)
- Integration/channel: Webchat
- Relevant config (redacted):
```json
{
  "router": {
    "enabled": true,
    "provider": "openai-compatible",
    "baseUrl": "https://openrouter.ai/api/v1",
    "apiKey": "env:OPENROUTER_API_KEY",
    "model": "qwen/qwen3-4b",
    "tiers": {
      "1": "openrouter/minimax/minimax-m2.1",
      "2": "openrouter/anthropic/claude-haiku-4.5",
      "3": "openrouter/anthropic/claude-opus-4.6"
    },
    "defaultTier": "1"
  }
}
```
### Steps
1. Enable router in config with tiers mapped to models
2. Send messages of varying complexity: "hey", "fix this TypeError", "design a microservices architecture"
3. Observe routed model badge in webchat and gateway logs
### Expected
- "hey" → tier 1 (cheap model), badge shows tier 1 model
- "fix this TypeError" → tier 2 (code model)
- "design a microservices architecture" → tier 3 (premium model)
- Fallback to defaultTier on classifier error
### Actual
- All tiers route correctly with ~200-700ms classifier latency
- Badge renders and persists on completed messages
- Fallback works on timeout/malformed response
### Evidence
Unit tests in `src/hooks/pre-route.test.ts` cover: config resolution, tier parsing (1/2/3), fallback on error/timeout, model ref parsing, Ollama + OpenAI-compatible payloads, and conversation context injection.
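For orientation, the two backend payload shapes those tests exercise might look roughly like this. This is an assumption based on the public Ollama `/api/chat` and OpenAI-compatible `/v1/chat/completions` APIs; the actual builder in `pre-route.ts` may differ.

```typescript
// Hypothetical payload builder for the two classifier backends.
type Provider = "ollama" | "openai-compatible";

function buildClassifierPayload(
  provider: Provider,
  model: string,
  systemPrompt: string, // e.g. the contents of ROUTER.md
  message: string,
) {
  const messages = [
    { role: "system", content: systemPrompt },
    { role: "user", content: message },
  ];
  // Ollama's /api/chat streams by default, so streaming is disabled;
  // OpenAI-compatible endpoints instead cap the tiny classification reply.
  return provider === "ollama"
    ? { model, messages, stream: false }
    : { model, messages, max_tokens: 8 };
}
```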
## Human Verification (required)
- Verified scenarios: Tier 1/2/3 classification with live Ollama and OpenRouter endpoints; fallback on Ollama timeout; badge rendering in webchat; badge persistence on completed messages; `/status` output; `ROUTER.md` hot-reload; heartbeat/session-reset bypass
- Edge cases checked: classifier returns unexpected text (falls back); classifier returns a tier not in config (falls back); `ROUTER.md` missing (uses built-in default); `apiKey` with `env:` prefix resolves from the environment
- What you did not verify: concurrent routing under high load; interaction with agent-level model overrides; non-webchat channels (Telegram, WhatsApp, etc.)
## Compatibility / Migration
- Backward compatible? Yes — router is disabled by default, no behavior change without opt-in
- Config/env changes? Yes — new optional router section in config
- Migration needed? No
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly: Set `"router": { "enabled": false }` in config, or remove the `router` section entirely. No restart is needed for `ROUTER.md` changes; a config change requires a restart.
- Files/config to restore: Only the router config key. No database or state changes.
- Known bad symptoms reviewers should watch for: Elevated latency on every message (~200-700ms from classifier call); classifier endpoint returning errors flooding logs (mitigated: errors are caught and fall back silently)
## Risks and Mitigations
- Risk: Classifier adds latency to every message when enabled
  - Mitigation: Bounded by `timeoutMs` (default 10s; 3-5s recommended). Falls back to `defaultTier` on timeout. Latency is amortized into agent startup.
- Risk: User message content is sent to the classifier endpoint (privacy)
  - Mitigation: Only the message text is sent, with no metadata or auth. Local Ollama mode keeps data on-device. Documented so users can make informed choices.
- Risk: Router model naming inconsistency (bare API names vs OpenClaw refs)
  - Mitigation: Documented as a known limitation. Future work will support standard model refs with auto-credential resolution.
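The timeout-and-fallback behavior in the first mitigation can be sketched as below. This is a hedged sketch under assumed names (`classifyWithFallback` and the injected `call` are illustrative); the real hook wires the equivalent logic into the agent start path.

```typescript
// Hedged sketch: bound the classifier call by timeoutMs and swallow
// all failures so the caller can fall back to defaultTier.
async function classifyWithFallback(
  message: string,
  timeoutMs: number,
  call: (msg: string, signal: AbortSignal) => Promise<string>,
): Promise<string | null> {
  try {
    // AbortSignal.timeout (Node 17.3+) aborts the underlying request.
    return await call(message, AbortSignal.timeout(timeoutMs));
  } catch {
    // Timeout, network error, or malformed response: return null so the
    // caller uses defaultTier rather than failing the user's message.
    return null;
  }
}
```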
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR adds intelligent pre-route model selection that classifies user messages using a lightweight LLM classifier before the main agent runs. Messages are classified into tiers (casual/code/complex) and routed to appropriate models to optimize cost and latency. The feature is disabled by default and well-encapsulated.
**Key changes:**
- New `src/hooks/pre-route.ts` with routing logic supporting Ollama and OpenAI-compatible APIs
- Configuration schema extended with optional `router` section (validated with refinement)
- Integration at agent execution entry point with context extraction from session history
- UI badge display showing routed model in webchat
- `/status` command shows router configuration when enabled
- Comprehensive test coverage for routing logic
**Previous concerns addressed:**
- `baseUrl` guard now returns early before calling `callOpenAICompatible` (lines 293-301)
- Config schema uses optional fields + `.superRefine` to require `tiers`/`defaultTier` only when enabled
**Architecture:**
- Graceful fallback on all error paths (timeout, network, parse failures)
- Bypass for heartbeats and system prompts
- No changes to core agent pipeline or provider resolution
- Context truncation prevents token bloat (500 chars message, 200 chars context)
**Security:**
- `apiKey` marked `.register(sensitive)` for redaction
- Supports `env:VAR_NAME` syntax to avoid plaintext secrets
- Only user message text sent to classifier (no session metadata or PII beyond message content)
- Bounded by configurable timeout
<h3>Confidence Score: 4/5</h3>
- Safe to merge with minor considerations around classification accuracy and latency impact in production
- Well-architected feature with comprehensive error handling, graceful fallbacks, and good test coverage. Both previous thread concerns have been addressed. The implementation is properly scoped as a pre-hook with no changes to core agent logic. Feature is disabled by default, minimizing deployment risk. Deducting one point for: (1) classification accuracy is dependent on external LLM quality and prompt tuning, (2) adds 200-700ms latency to every message when enabled, and (3) limited verification on non-webchat channels per PR description
- No files require special attention - previous concerns in `src/hooks/pre-route.ts` and `src/config/zod-schema.ts` have been addressed
<sub>Last reviewed commit: 561d207</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #7770: feat(routing): Smart Router V2 - Configuration-driven model dispatc... (zzjj7000, 2026-02-03, 79.4%)
- #8258: feat: Add smart model tiering for cost optimization (revenuestack, 2026-02-03, 76.2%)
- #9123: Feat/smart router backport and custom model provider (JuliusYang3311, 2026-02-04, 76.1%)
- #16529: fix(fallback): treat OpenRouter routing errors as failover-eligible (zirubak, 2026-02-14, 75.6%)
- #15264: feat: Dynamic thinking level pre-routing based on message complexity (phani-D, 2026-02-13, 74.6%)
- #17392: Add testing infrastructure and expand gateway OAuth scopes (jordanhubbard, 2026-02-15, 71.9%)
- #17614: feat: allow before_agent_start hook to override model selection (plc, 2026-02-16, 71.4%)
- #20587: feat: add Tetrate Agent Router Service provider (RicHincapie, 2026-02-19, 70.5%)
- #14647: feat(plugins): allow before_agent_start hook to override model (#14... (lailoo, 2026-02-12, 70.1%)
- #7570: fix: allow models from providers with auth profiles configured (DonSqualo, 2026-02-03, 69.8%)