#22797: Feat/auto thinking mode

by jrthib open 2026-02-21 17:41

size: L

Cluster: Agent Thinking Defaults Enhancement

## Summary

Describe the problem and fix in 2–5 bullets:

- Problem: OpenClaw did not have adaptive intent-based thinking selection; thinking level was manual/default-only.
- Why it matters: Users want better reasoning depth automatically without constantly setting `/think`, but aggressive auto-selection can increase cost/latency if confidence is low.
- What changed: Added adaptive thinking inference + integrated it into reply flow when no explicit/session think level is set; refined heuristics and made it confidence-based (returns `undefined` when unsure so existing defaults apply).
- What did NOT change (scope boundary): No model/provider-specific hard gating added for auto mode; no config schema changes; no new commands/endpoints.

## Change Type (select all)

- [ ] Bug fix
- [x] Feature
- [x] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #N/A
- Related #N/A

## User-visible / Behavior Changes

- When no explicit `/think` override is present, OpenClaw now infers a thinking level from message intent.
- High-complexity prompts can map to `high`/`xhigh` (with existing `xhigh` compatibility guard retained).
- Lightweight prompts can map to `low`.
- Ambiguous prompts no longer force `medium`; they defer to existing model/session defaults.

## Security Impact (required)

- New permissions/capabilities? (`Yes/No`) - **No**
- Secrets/tokens handling changed? (`Yes/No`) - **No**
- New/changed network calls? (`Yes/No`) - **No**
- Command/tool execution surface changed? (`Yes/No`) - **No**
- Data access scope changed? (`Yes/No`) - **No**
- If any `Yes`, explain risk + mitigation: **N/A**

## Repro + Verification

### Environment

- OS: Linux (containerized workspace)
- Runtime/container: OpenClaw workspace runtime
- Model/provider: Covered by unit/e2e logic paths (including xhigh-compat behavior)
- Integration/channel (if any): Slack-threaded workflow context
- Relevant config (redacted): default agent/session settings, no special config additions

### Steps

1. Send prompts with different intent types (architecture/spec/tradeoff, analysis/debug, quick/brief asks, ambiguous asks) with no explicit `/think`.
2. Observe selected `thinkLevel` passed into reply run path.
3. Verify ambiguous prompts fall back to default thinking resolver rather than forced `medium`.
4. Verify explicit/session think settings still take precedence.
5. Verify existing `/think xhigh` compatibility behavior still passes.

### Expected

- Intentful prompts map to appropriate levels.
- Ambiguous prompts defer to default resolver.
- Existing precedence and xhigh guard remain intact.

### Actual

- Matches expected in unit + targeted e2e verification.

## Evidence

Attach at least one:

- [x] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

## Human Verification (required)

What you personally verified (not just CI), and how:

- Verified scenarios:
  - Adaptive selector picks xhigh/high/medium/low for representative prompts.
  - Low-confidence prompts return `undefined` and fall back to default resolver.
  - Precedence preserved when think level already resolved.
  - Existing xhigh directive behavior still valid via targeted e2e.
- Edge cases checked:
  - Greeting-prefixed substantive requests are not incorrectly downgraded to low.
  - Generic prompts like "Can you take a look at this?" do not force medium.
- What you did **not** verify:
  - Full cross-provider live-model matrix under production traffic.

## Compatibility / Migration

- Backward compatible? (`Yes/No`) - **Yes**
- Config/env changes? (`Yes/No`) - **No**
- Migration needed? (`Yes/No`) - **No**
- If yes, exact upgrade steps: **N/A**

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly:
  - Revert commits:
    - `12cf26c02`
    - `20ddbe98b`
- Files/config to restore:
  - `src/auto-reply/thinking-auto.ts`
  - `src/auto-reply/reply/get-reply-run.ts`
  - `src/auto-reply/thinking-auto.test.ts`
  - `src/auto-reply/reply/get-reply-run.media-only.test.ts`
- Known bad symptoms reviewers should watch for:
  - Unexpectedly high reasoning usage on generic prompts
  - Incorrect low selection on substantive prompts
  - Defaults no longer respected on ambiguous input

## Risks and Mitigations

- Risk: Heuristic misclassification of intent.
  - Mitigation: Conservative confidence fallback (`undefined` -> existing defaults), plus expanded tests for ambiguous/greeting edge cases.
- Risk: Regression in thinking-level precedence behavior.
  - Mitigation: Added run-path tests ensuring adaptive selection is skipped when think level is already resolved.

<h3>Greptile Summary</h3>

Added adaptive thinking level selection based on user prompt intent. When no explicit `/think` level is set, the system now infers an appropriate thinking level (`xhigh`, `high`, `medium`, `low`) from pattern matching against the user's message text, with conservative fallback to existing defaults when confidence is low.

<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge with low risk
- The implementation is clean, well-tested, and follows conservative patterns with proper fallback behavior. The regex patterns are straightforward and the integration point correctly respects existing precedence rules. Minor concerns exist around pattern overlap and edge cases, but these are mitigated by the fallback-to-undefined approach.
- No files require special attention

<sub>Last reviewed commit: 12cf26c</sub>
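
For reviewers who want a concrete picture of the behavior described in this PR (infer a level from message intent, return `undefined` when unsure, never override an explicit or session level), here is a minimal sketch. It is not the code in `src/auto-reply/thinking-auto.ts`: the names `inferThinkLevel`, `resolveThinkLevel`, `INTENT_RULES`, and `GREETING_ONLY`, the regex patterns, and the rule order are all assumptions made for illustration, and the `xhigh` compatibility guard is not modeled.

```typescript
// Hypothetical sketch only: names and patterns are illustrative assumptions,
// not the actual contents of src/auto-reply/thinking-auto.ts.
type ThinkLevel = "low" | "medium" | "high" | "xhigh";

// Rules are checked from deepest to shallowest; the first match wins.
const INTENT_RULES: Array<{ level: ThinkLevel; pattern: RegExp }> = [
  { level: "xhigh", pattern: /\b(architecture|design doc|spec|trade-?offs?)\b/i },
  { level: "high", pattern: /\b(analy[sz]e|debug|root cause|investigate)\b/i },
  { level: "medium", pattern: /\b(explain|compare|summari[sz]e)\b/i },
  { level: "low", pattern: /\b(quick(ly)?|brief(ly)?|tl;dr)\b/i },
];

// Anchored to the whole message, so "Hi! Please debug ..." does not count as a greeting.
const GREETING_ONLY = /^(hi|hello|hey|thanks|thank you)[\s!.,]*$/i;

// Returns a level only when a rule matches; undefined means "not confident".
function inferThinkLevel(text: string): ThinkLevel | undefined {
  const trimmed = text.trim();
  if (trimmed.length === 0) return undefined;
  if (GREETING_ONLY.test(trimmed)) return "low";
  for (const rule of INTENT_RULES) {
    if (rule.pattern.test(trimmed)) return rule.level;
  }
  return undefined;
}

// Precedence: explicit /think, then session level, then inference, then the
// existing default resolver (passed in here as a hypothetical callback).
function resolveThinkLevel(opts: {
  explicit?: ThinkLevel;
  session?: ThinkLevel;
  text: string;
  fallback: () => ThinkLevel;
}): ThinkLevel {
  return opts.explicit ?? opts.session ?? inferThinkLevel(opts.text) ?? opts.fallback();
}
```

In this sketch, `inferThinkLevel("Can you take a look at this?")` matches no rule and returns `undefined`, so `resolveThinkLevel` falls through to the fallback. A greeting-prefixed request such as "Hi! Please debug this crash in the gateway" is matched by the `high` rule instead of being downgraded, because the greeting check only fires when the greeting is the entire message.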