#16399: feat: auto-escalate thinking level based on context window usage
commands
stale
size: L
Cluster: Agent Thinking Defaults Enhancement
## Summary
This PR implements automatic thinking level escalation based on context window usage. As conversations grow and approach the context window limit, models with low thinking levels can become prone to confident hallucinations. This feature automatically increases the thinking level when the context window fills up, helping maintain response quality in long sessions.
## Problem
When a low thinking level (e.g., "off", "minimal", "low") is used in a session whose context keeps growing:
- The model becomes increasingly prone to confident hallucinations as context fills
- Users may not realize they need to manually increase thinking levels for long conversations
- Session quality degrades over time without clear feedback
## Solution
Adds an opt-in `thinkingEscalation` configuration that automatically raises the thinking level when context window usage reaches configured thresholds.
### Configuration Example
```yaml
agents:
  defaults:
    thinkingEscalation:
      enabled: true
      thresholds:
        - atContextPercent: 50
          thinking: low
        - atContextPercent: 75
          thinking: medium
        - atContextPercent: 90
          thinking: high
```
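For reviewers who want the config shape at a glance, here is a minimal sketch of what the YAML above might map to. The type names come from this PR, but the field optionality, the `ThinkLevel` union, and the `validateEscalationConfig` helper are assumptions for illustration; the real definitions live in `src/config/types.agent-defaults.ts` and the Zod schema, and may differ.

```typescript
// Hypothetical shape inferred from the YAML example; not copied from the PR.
type ThinkLevel = "off" | "minimal" | "low" | "medium" | "high";

interface AgentThinkingEscalationThreshold {
  atContextPercent: number; // assumed to be a percentage in [0, 100]
  thinking: ThinkLevel;
}

interface AgentThinkingEscalationConfig {
  enabled?: boolean; // assumed to default to false (opt-in)
  thresholds?: AgentThinkingEscalationThreshold[];
}

const LEVELS: readonly ThinkLevel[] = ["off", "minimal", "low", "medium", "high"];

// Hand-rolled stand-in for the Zod validation: returns a list of problems,
// where an empty list means the config looks valid.
function validateEscalationConfig(cfg: AgentThinkingEscalationConfig): string[] {
  const problems: string[] = [];
  (cfg.thresholds ?? []).forEach((t, i) => {
    const pct = t.atContextPercent;
    if (typeof pct !== "number" || !isFinite(pct) || pct < 0 || pct > 100) {
      problems.push(`thresholds[${i}].atContextPercent must be between 0 and 100`);
    }
    if (LEVELS.indexOf(t.thinking) < 0) {
      problems.push(`thresholds[${i}].thinking is not a known level`);
    }
  });
  return problems;
}
```

The sketch reports every problem rather than stopping at the first, which tends to be friendlier for config files edited by hand.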
### Key Behaviors
- **Opt-in**: Disabled by default; users must explicitly enable it
- **Only escalates**: Never downgrades the thinking level within a session
- **Threshold-based**: Multiple thresholds can be configured at different context percentages
- **Highest applicable**: When several thresholds are met, the highest thinking level among them is used
- **Persistent**: Updates are persisted to session store
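The only-escalate and highest-applicable rules above can be sketched roughly as follows. This is an illustration, not the code in `thinking-escalation.ts`: the signature, the five-level ordering, and the `>=` comparison at the threshold boundary are all assumptions.

```typescript
// Sketch only: names, signature, and level ordering are assumptions.
type ThinkLevel = "off" | "minimal" | "low" | "medium" | "high";

interface EscalationThreshold {
  atContextPercent: number;
  thinking: ThinkLevel;
}

// Assumed ordering from weakest to strongest.
const THINKING_LEVEL_ORDER: readonly ThinkLevel[] = ["off", "minimal", "low", "medium", "high"];

function rank(level: ThinkLevel): number {
  return THINKING_LEVEL_ORDER.indexOf(level);
}

function computeTargetThinkingLevel(
  current: ThinkLevel,
  contextPercent: number,
  thresholds: readonly EscalationThreshold[],
): ThinkLevel {
  // Starting from the current level guarantees the result is never lower.
  let target = current;
  for (const t of thresholds) {
    // A threshold applies once usage has reached it; keep the strongest one.
    if (contextPercent >= t.atContextPercent && rank(t.thinking) > rank(target)) {
      target = t.thinking;
    }
  }
  return target;
}
```

With the thresholds from the configuration example, a session at "off" and 60% usage would move to "low", while a session already at "high" stays at "high" regardless of usage.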
### Changes
1. **Types** (`src/config/types.agent-defaults.ts`): Already defines the `AgentThinkingEscalationConfig` and `AgentThinkingEscalationThreshold` types
2. **Zod Schema** (`src/config/zod-schema.agent-defaults.ts`): Already validates `thinkingEscalation`
3. **Escalation Logic** (`src/auto-reply/reply/thinking-escalation.ts`): New module implementing escalation evaluation
4. **Integration** (`src/auto-reply/reply/agent-runner.ts`): Integrated escalation check after context window evaluation
5. **Tests** (`src/auto-reply/reply/thinking-escalation.test.ts`): Comprehensive test coverage
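As a rough picture of how the integration step might hang together, the sketch below turns token counts into a clamped percentage, asks a level picker for the target, and persists the result on a best-effort basis. Every name here (`contextUsagePercent`, `applyEscalation`, `EscalationStep`, `pickLevel`, `persist`) is hypothetical, and the synchronous `persist` callback is a simplification; the actual wiring in `agent-runner.ts` may look quite different.

```typescript
// Sketch only: function names and parameters here are hypothetical.
type ThinkLevel = "off" | "minimal" | "low" | "medium" | "high";

// Converts token counts to a percentage clamped to [0, 100].
// Returns undefined when the inputs are missing or unusable.
function contextUsagePercent(usedTokens?: number, contextWindowTokens?: number): number | undefined {
  if (typeof usedTokens !== "number" || typeof contextWindowTokens !== "number") return undefined;
  if (!isFinite(usedTokens) || !isFinite(contextWindowTokens) || contextWindowTokens <= 0) return undefined;
  return Math.min(100, Math.max(0, (usedTokens / contextWindowTokens) * 100));
}

interface EscalationStep {
  enabled: boolean;
  current: ThinkLevel;
  usedTokens?: number;
  contextWindowTokens?: number;
  // Stand-in for the threshold evaluation.
  pickLevel: (current: ThinkLevel, percent: number) => ThinkLevel;
  // Stand-in for the session store write.
  persist: (level: ThinkLevel) => void;
}

function applyEscalation(step: EscalationStep): ThinkLevel {
  if (!step.enabled) return step.current; // opt-in: do nothing unless enabled
  const percent = contextUsagePercent(step.usedTokens, step.contextWindowTokens);
  if (percent === undefined) return step.current; // missing data: leave level alone
  const next = step.pickLevel(step.current, percent);
  if (next === step.current) return step.current; // nothing to escalate
  try {
    step.persist(next);
  } catch (_err) {
    // Best-effort: a failed write should not block the reply (assumed behavior).
  }
  return next;
}
```

Injecting `pickLevel` and `persist` keeps the sketch testable without a real session store.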
## Testing
All new code is covered by tests:
- Disabled escalation scenarios
- Missing data handling
- Escalation at various thresholds
- No-downgrade guarantee
- Multiple threshold selection
- Session persistence
```
npx vitest run src/auto-reply/reply/thinking-escalation.test.ts
✓ 10 tests passed
```
## Checklist
- [x] TypeScript compiles without errors
- [x] Tests pass
- [x] Opt-in (disabled by default)
- [x] Only escalates, never downgrades
- [x] Minimal, focused diff
- [x] No dist changes
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR adds opt-in automatic thinking level escalation based on context window usage, applied in two separate code paths: the auto-reply/messaging path (`thinking-escalation.ts` called from `agent-runner.ts`) and the CLI agent path (`session-store.ts`). The `session-store.ts` implementation includes provider/model validation via `listThinkingLevels()`, while the `thinking-escalation.ts` implementation does not (this was flagged in a prior thread). Types, Zod schema, and test coverage are included.
- The core logic is well-structured: threshold-based, only-escalate semantics, clamped percentages, and best-effort persistence.
- `THINKING_LEVEL_ORDER` is duplicated across three files — extracting it to a shared module would reduce drift risk.
- `session-store.test.ts` re-implements `computeTargetThinkingLevel` locally instead of testing the actual function, which could mask regressions if the real implementation changes.
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge — the feature is opt-in, well-guarded, and non-breaking.
- The escalation logic is correct and well-tested. The feature is opt-in (disabled by default), only escalates (never downgrades), and handles edge cases properly. Minor concerns: duplicated THINKING_LEVEL_ORDER constant across files, and test file re-implements rather than testing the actual function. The missing provider/model validation in thinking-escalation.ts was already flagged in a prior thread.
- `src/commands/agent/session-store.test.ts` re-implements the function under test locally rather than importing it, which could mask regressions.
<sub>Last reviewed commit: 4eb1db7</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #22797: Feat/auto thinking mode (by jrthib · 2026-02-21 · 80.4%)
- #16899: feat(config): per-agent and per-model thinking defaults (by jh280722 · 2026-02-15 · 79.3%)
- #21558: config: support agents.list[].thinkingDefault (by Uarmagan · 2026-02-20 · 78.3%)
- #15030: Agents: support per-agent thinking defaults (by sauerdaniel · 2026-02-12 · 77.4%)
- #10998: fix(agents): pass session thinking/reasoning levels to session_stat... (by wony2 · 2026-02-07 · 76.8%)
- #15606: LLM Task: add explicit thinking level wiring (by xadenryan · 2026-02-13 · 75.8%)
- #15264: feat: Dynamic thinking level pre-routing based on message complexity (by phani-D · 2026-02-13 · 75.0%)
- #16298: feat(xai): switch grok-4-1-fast variants by thinking level (by avirweb · 2026-02-14 · 74.4%)
- #21614: fix: warn when thinking level xhigh falls back for unsupported models (by lbo728 · 2026-02-20 · 73.8%)
- #18695: feat(agents): add per-agent thinkingDefault override (by cathrynlavery · 2026-02-17 · 73.7%)