#22477: fix(voice-call): hardcode thinkLevel to off for latency-sensitive voice responses (#22423)

by lailoo open 2026-02-21 06:42

channel: voice-call size: S experienced-contributor

Cluster: Elevated Default Configuration Fixes

## Summary

- **Bug**: Voice call responses use the global `thinkingDefault` instead of always disabling extended thinking
- **Root cause**: `generateVoiceResponse` in `response-generator.ts` called `resolveThinkingDefault()`, which returns the user's global config default (e.g. `medium`)
- **Fix**: Hardcode `thinkLevel = "off"` for voice calls since they are latency-sensitive

Fixes #22423

## Problem

Voice calls are latency-sensitive: users expect near-instant spoken responses. However, `generateVoiceResponse()` was calling `deps.resolveThinkingDefault({ cfg, provider, model })` to determine the thinking level, which returns whatever the user configured globally (e.g. `"medium"`). This caused the LLM to spend time on extended thinking during voice calls, adding unnecessary latency.

**Before fix:**

```
thinkLevel = deps.resolveThinkingDefault({ cfg, provider, model })
// Returns "medium" (or whatever global default is) → slow voice responses
```

## Changes

- `extensions/voice-call/src/response-generator.ts`: Replace `resolveThinkingDefault()` call with hardcoded `"off" as const`
- `extensions/voice-call/src/response-generator.test.ts`: Add regression test verifying thinkLevel is always `"off"`
- `CHANGELOG.md`: Add fix entry

**After fix:**

```
thinkLevel = "off" as const
// Always "off" for voice calls → fast responses
```

## Test plan

- [x] New test: verifies `thinkLevel` passed to `runEmbeddedPiAgent` is `"off"` even when global default is `"medium"`
- [x] Test passes on fix branch
- [x] Test fails on main (confirmed bug: `expected 'medium' to be 'off'`)
- [x] Format check passes
- [x] Lint passes

## Effect on User Experience

**Before:** Voice call responses could be slow because the LLM spent time on extended thinking (e.g. `medium` level), adding latency to an inherently real-time interaction.

**After:** Voice calls always use `thinkLevel: "off"`, ensuring the fastest possible response time for phone conversations.
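To make the before/after behavior and the regression check concrete, here is a minimal, self-contained sketch. It is not the code from `response-generator.ts`: the `VoiceDeps` interface, the two `generateVoiceResponse*` variants, the `recordThinkLevels` helper, and the placeholder provider/model strings are all hypothetical simplifications, and the stand-in for `runEmbeddedPiAgent` is synchronous only to keep the example short.

```typescript
// Only the two levels named in this PR; the real union may include more (assumption).
type ThinkLevel = "off" | "medium";

// Hypothetical, simplified stand-ins for the injected dependencies.
interface VoiceDeps {
  resolveThinkingDefault(args: { provider: string; model: string }): ThinkLevel;
  runEmbeddedPiAgent(args: { prompt: string; thinkLevel: ThinkLevel }): string;
}

// Pre-fix behavior: the voice path inherits the user's global default.
function generateVoiceResponseBefore(deps: VoiceDeps, prompt: string): string {
  const thinkLevel = deps.resolveThinkingDefault({
    provider: "example-provider", // placeholder value, not from the PR
    model: "example-model", // placeholder value, not from the PR
  });
  return deps.runEmbeddedPiAgent({ prompt, thinkLevel });
}

// Post-fix behavior: the voice path always disables extended thinking.
function generateVoiceResponseAfter(deps: VoiceDeps, prompt: string): string {
  const thinkLevel = "off" as const;
  return deps.runEmbeddedPiAgent({ prompt, thinkLevel });
}

// Records the thinkLevel each variant passes to the agent
// when the (mocked) global default is "medium".
function recordThinkLevels(): { before: ThinkLevel[]; after: ThinkLevel[] } {
  const makeDeps = (seen: ThinkLevel[]): VoiceDeps => ({
    resolveThinkingDefault: () => "medium",
    runEmbeddedPiAgent: ({ thinkLevel }) => {
      seen.push(thinkLevel);
      return "ok";
    },
  });
  const before: ThinkLevel[] = [];
  const after: ThinkLevel[] = [];
  generateVoiceResponseBefore(makeDeps(before), "hello");
  generateVoiceResponseAfter(makeDeps(after), "hello");
  return { before, after };
}
```

Under these assumptions, the "before" variant forwards the mocked global default and the "after" variant ignores it, which mirrors the `expected 'medium' to be 'off'` failure reported on main in the test plan.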
<h3>Greptile Summary</h3>

Hardcoded `thinkLevel` to `"off"` in voice call responses to eliminate latency from extended thinking. The fix replaces a dynamic call to `resolveThinkingDefault()` (which returned the user's global config default, such as `"medium"`) with a hardcoded constant, ensuring voice calls always use the fastest response mode. Added a regression test confirming `thinkLevel` is always `"off"` even when the global default is `"medium"`.

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk.
- The fix is a single-line change with clear intent, well-documented rationale, and comprehensive test coverage. The change correctly addresses the latency issue by hardcoding `thinkLevel` to `"off"` for voice calls. The test validates the fix and would catch any regression. No syntax errors, no security concerns, and the change aligns with the stated goal of optimizing voice call latency.
- No files require special attention.

<sub>Last reviewed commit: 7a53a5e</sub>