#9456: feat(mac): add enhanced Siri neural voice support for Talk mode

by teknomage8 open 2026-02-05 07:26

channel: nextcloud-talk app: macos app: web-ui agents stale

Cluster: Text-to-Speech Provider Enhancements

## Summary

- Add `talk.systemVoice` config option that routes Talk mode TTS through `/usr/bin/say` instead of `AVSpeechSynthesizer`
- When set to `"siri"`, uses the system's Spoken Content default voice (enhanced Siri neural voice if downloaded in System Settings > Accessibility > Spoken Content)
- When set to a specific voice name (e.g. `"Samantha"`), passes it as `-v` to the `say` command
- Falls back to existing `AVSpeechSynthesizer` path when `systemVoice` is not configured

## Motivation

The enhanced Siri neural voices available in macOS System Settings > Accessibility > Spoken Content are dramatically higher quality than the voices available through `AVSpeechSynthesizer`. These premium voices are accessible through the Carbon SpeechSynthesis framework (used by `/usr/bin/say`) but not through Apple's modern `AVSpeechSynthesizer` API. This provides a high-quality, zero-cost TTS option for Talk mode without requiring an ElevenLabs API key.

## Changes

- **New file:** `TalkSayCommandSynthesizer.swift`, an async wrapper around `/usr/bin/say` with cancellation support, token-based interrupt handling, and stderr capture
- **Modified:** `TalkModeRuntime.swift`, which routes to `TalkSayCommandSynthesizer` when `systemVoice` is configured and adds config parsing and diagnostic logging
- **Modified:** `types.gateway.ts`, which adds `systemVoice?: string` to `TalkConfig`
- **Modified:** `zod-schema.ts`, which adds `systemVoice` to the Zod validation schema

## Configuration

```json
{ "talk": { "systemVoice": "siri" } }
```

## Test plan

- [x] Verified `/usr/bin/say` produces audio with enhanced Siri neural voice from terminal
- [x] Verified Talk mode speaks with correct voice and realistic duration (189 chars = 16.7s, 223 chars = 20.2s, 376 chars = 27.5s)
- [x] Verified `say` process exits cleanly (status 0, no stderr)
- [x] Verified phase transitions: listening → thinking → speaking → listening
- [x] Verified interrupt/stop cancels the `say` process
- [ ] Verify Talk mode still works without `systemVoice` configured (AVSpeechSynthesizer path)

---

*Contributed by [@teknomage8](https://github.com/teknomage8), THE NOBLE HOUSE™ AI LAB*

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds an optional `talk.systemVoice` configuration that routes Talk mode “system voice” playback through macOS’s `/usr/bin/say` (enabling enhanced Siri neural voices) while retaining the existing `AVSpeechSynthesizer` path when not configured.

It also includes a series of unrelated changes across the repo: updated reasoning-tag stripping behavior (including a new “implicit thinking” mode for lone `</think>` tags), UI rendering tweaks for reasoning display, and Nextcloud Talk plugin fixes around HMAC signing and shutdown handling.

<h3>Confidence Score: 3/5</h3>

- This PR is not safe to merge as-is due to reasoning-tag parsing changes that can drop user-visible content on stray closing tags.
- Core macOS Talk-mode change looks self-contained, but the updated “implicit thinking” logic in both shared text utilities and Pi embedded streaming will deterministically hide text when `</think>` appears without a prior open tag, which can happen from model glitches or literal content. These are behavior changes with user-visible impact outside the macOS feature scope.
- src/shared/text/reasoning-tags.ts, src/agents/pi-embedded-utils.ts (and any other reasoning-tag consumers relying on previous behavior)

**Context used:**

- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
- Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13))
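
As a rough illustration of the `talk.systemVoice` routing described in the Summary, the decision could be sketched as below. This is not the PR's code (the actual routing lives in Swift, in `TalkModeRuntime.swift`); the names `TtsRoute` and `resolveTtsRoute` are hypothetical, and trimming whitespace, treating an empty value as "not configured", and passing the text as the final `say` argument are all assumptions made for the sketch.

```typescript
// Hypothetical sketch of the talk.systemVoice routing; names are illustrative only.
type TtsRoute =
  | { kind: "avspeech" }              // existing AVSpeechSynthesizer path
  | { kind: "say"; args: string[] };  // arguments for /usr/bin/say

function resolveTtsRoute(systemVoice: string | undefined, text: string): TtsRoute {
  // Assumption: a missing or blank value counts as "not configured".
  const voice = systemVoice?.trim();
  if (!voice) {
    return { kind: "avspeech" };
  }
  // "siri" means: use the system's Spoken Content default voice, so no -v flag.
  if (voice === "siri") {
    return { kind: "say", args: [text] };
  }
  // Any other value is passed to say as a voice name via -v.
  return { kind: "say", args: ["-v", voice, text] };
}
```

For example, `resolveTtsRoute("Samantha", "Hello")` yields the `say` arguments `["-v", "Samantha", "Hello"]`, while `resolveTtsRoute(undefined, "Hello")` selects the `AVSpeechSynthesizer` path.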