#21110: fix(tts): deliver audio via structured mediaUrl instead of MEDIA: text tokens
agents
size: S
Cluster: Voice Call and TTS Improvements
## Problem
The built-in `tts` tool generates audio to absolute paths (`/tmp/tts-xxx/voice-xxx.opus`) and returns them as `MEDIA:/tmp/...` text tokens. The media parser security policy in `splitMediaFromOutput` blocks absolute paths, so users see raw file paths as text instead of receiving voice messages.
The `/tts audio` slash command works fine because it sets `mediaUrl` directly on the reply payload, bypassing text-based parsing entirely.
Fixes #14174.
## Solution
Option A from the issue: deliver audio through structured tool result fields instead of `MEDIA:` text tokens.
### Changes
**Core fix — `tts-tool.ts`:**
- Replaced `MEDIA:${audioPath}` text token with `details.mediaUrl` and `details.audioAsVoice` fields
- Content text returns `SILENT_REPLY_TOKEN` instead of a `MEDIA:` token that gets blocked
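A minimal sketch of the result shape this change describes, not the actual code in `tts-tool.ts`. The `TtsToolResult` type, the `buildTtsResult` helper, and the value assigned to `SILENT_REPLY_TOKEN` are all assumptions made for illustration; only the `details.mediaUrl` and `details.audioAsVoice` field names come from the PR.

```typescript
// Placeholder value: the project's real SILENT_REPLY_TOKEN is not shown in this PR.
const SILENT_REPLY_TOKEN = "<silent-reply-placeholder>";

// Hypothetical result type; the real tool result type may carry more fields.
interface TtsToolResult {
  content: Array<{ type: "text"; text: string }>;
  details: {
    mediaUrl: string;
    audioAsVoice: boolean;
  };
}

// Hypothetical helper showing the "after" shape of the tool result.
function buildTtsResult(audioPath: string, audioAsVoice: boolean): TtsToolResult {
  return {
    // The text no longer contains a MEDIA: token, so the text-based
    // media parser has nothing to reject.
    content: [{ type: "text", text: SILENT_REPLY_TOKEN }],
    // The absolute path travels in a typed field instead.
    details: { mediaUrl: audioPath, audioAsVoice },
  };
}
```

The point of the design is that the absolute path never appears in text that the `MEDIA:` parser inspects.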
**Media extraction — `pi-embedded-subscribe.tools.ts`:**
- Added strategy 0 (highest priority): check `details.mediaUrl` / `details.mediaUrls` before falling back to text-based `MEDIA:` parsing
- Added `detailsOnly` option to skip text-based extraction when `emitToolOutput` already handles it (prevents duplicates)
- New `extractToolResultAudioAsVoice()` helper
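The extraction order above could look roughly like the sketch below. It is an illustration under assumptions, not the code in `pi-embedded-subscribe.tools.ts`: `extractToolResultMedia` and `parseMediaTokens` are hypothetical names, the `ToolResultLike` type is invented, and the toy parser stands in for `splitMediaFromOutput` by rejecting any path that starts with `/`.

```typescript
// Hypothetical, loosely typed view of a tool result.
interface ToolResultLike {
  content?: Array<{ type: string; text?: string }>;
  details?: { mediaUrl?: unknown; mediaUrls?: unknown; audioAsVoice?: unknown };
}

// Toy stand-in for the text-based parser: accepts MEDIA: tokens
// but rejects absolute paths, mimicking the policy described above.
function parseMediaTokens(text: string): string[] {
  const urls: string[] = [];
  for (const line of text.split("\n")) {
    const match = /^MEDIA:\s*(\S+)/.exec(line.trim());
    const url = match?.[1];
    if (url && !url.startsWith("/")) urls.push(url);
  }
  return urls;
}

function extractToolResultMedia(
  result: ToolResultLike,
  opts: { detailsOnly?: boolean } = {},
): string[] {
  // Strategy 0: structured fields from the tool take priority.
  const details = result.details;
  if (details) {
    const fromDetails: string[] = [];
    if (typeof details.mediaUrl === "string" && details.mediaUrl) {
      fromDetails.push(details.mediaUrl);
    }
    if (Array.isArray(details.mediaUrls)) {
      for (const u of details.mediaUrls) {
        if (typeof u === "string" && u) fromDetails.push(u);
      }
    }
    if (fromDetails.length > 0) return fromDetails;
  }
  // detailsOnly: the caller already handles text-based MEDIA: parsing elsewhere.
  if (opts.detailsOnly) return [];
  // Fallback: text-based parsing, still subject to the path policy.
  const text = (result.content ?? []).map((c) => c.text ?? "").join("\n");
  return parseMediaTokens(text);
}

// Assumed behavior: only an explicit `true` counts as a voice message.
function extractToolResultAudioAsVoice(result: ToolResultLike): boolean {
  return result.details?.audioAsVoice === true;
}
```

In this sketch an absolute path is accepted only when it arrives through `details`, while the same path inside a `MEDIA:` text token is still dropped.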
**Handler — `pi-embedded-subscribe.handlers.tools.ts`:**
- Media delivery runs regardless of `shouldEmitToolOutput()`, using `detailsOnly: true` when emit is on
- Extracts and passes `audioAsVoice` through the callback
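The handler behavior described above can be sketched as follows. This is a simplified illustration, not the handler in `pi-embedded-subscribe.handlers.tools.ts`: `deliverToolMedia`, `ToolResultPayload`, and the injected `extract` function are hypothetical, and passing `detailsOnly: false` when emit is off is an assumption.

```typescript
// Hypothetical payload passed to the onToolResult callback.
interface ToolResultPayload {
  mediaUrls: string[];
  audioAsVoice?: boolean;
}

// Hypothetical handler step. `extract` stands in for the media extraction
// helper; `emitToolOutput` stands in for the result of shouldEmitToolOutput().
function deliverToolMedia(
  extract: (opts: { detailsOnly: boolean }) => string[],
  audioAsVoice: boolean,
  emitToolOutput: boolean,
  onToolResult: (payload: ToolResultPayload) => void,
): void {
  // Media delivery runs whether or not tool output is emitted. When it is
  // emitted, the emit path already handles text-based MEDIA: tokens, so only
  // structured details are read here to avoid sending the same media twice.
  const mediaUrls = extract({ detailsOnly: emitToolOutput });
  if (mediaUrls.length === 0) return;
  onToolResult({ mediaUrls, audioAsVoice });
}
```

Passing `audioAsVoice` alongside the URLs lets the channel decide whether to render the audio as a voice message rather than a plain attachment.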
**Type + propagation:**
- `audioAsVoice?: boolean` added to `onToolResult` callback type
- Forwarded through `pi-embedded-subscribe.ts` and `agent-runner-execution.ts`
## Why not just allow /tmp/ in the security policy?
Punching holes in the path security policy for specific directories would weaken the sandbox model. The structured approach is cleaner: trusted built-in tools deliver media through typed fields, while untrusted LLM text output stays sandboxed.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR fixes TTS audio delivery by replacing text-based `MEDIA:` tokens with structured `details.mediaUrl` fields, bypassing security policies that block absolute paths. The changes successfully implement Option A from issue #14174.
**Key changes:**
- `tts-tool.ts`: Returns `details.mediaUrl` and `details.audioAsVoice` instead of `MEDIA:${audioPath}` text tokens
- Media extraction: Added strategy 0 (highest priority) to check `details.mediaUrl`/`details.mediaUrls` before text parsing
- `detailsOnly` option prevents duplicate extraction when `emitToolOutput` already handles text-based `MEDIA:` parsing
- `audioAsVoice` flag propagated through callback chain to preserve voice-bubble metadata
**Note:** This PR also includes an unrelated security commit (31b12562) that strips hidden content from `web_fetch` to prevent prompt injection attacks (#8027). This security fix adds comprehensive HTML sanitization and invisible Unicode stripping.
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge with minor considerations about scope
- The TTS fix is well-structured and follows a clean pattern of delivering media through typed fields instead of text tokens. The implementation correctly propagates `audioAsVoice` through the callback chain and uses the `detailsOnly` option to avoid duplicate media extraction. The web-fetch security fix is comprehensive with thorough test coverage. However, the PR combines two unrelated features (TTS fix + web-fetch security), which slightly reduces confidence as they should ideally be separate PRs.
- No files require special attention - the implementation is clean and follows existing patterns
<sub>Last reviewed commit: 663f98e</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #21513: Agents: track TTS media in duplicate filter state (by DevvGwardo · 2026-02-20 · 86.8%)
- #19439: fix(tts): pass audioAsVoice flag through tool result pipeline (by brandonwise · 2026-02-17 · 85.0%)
- #20992: fix(tts): apply TTS processing to agentCommand outbound delivery path (by mmyyfirstb · 2026-02-19 · 82.4%)
- #7400: media: allow temp-dir MEDIA paths for tool outputs (by grammakov · 2026-02-02 · 82.2%)
- #18077: fix: deduplicate TTS audio delivered via tool results (by stakeswky · 2026-02-16 · 82.1%)
- #19399: telegram: fix MEDIA false positives and partial final drop (by HOYALIM · 2026-02-17 · 81.6%)
- #14794: fix: parse inline MEDIA: tokens in agent replies (by explainanalyze · 2026-02-12 · 81.1%)
- #19868: fix: prevent media token regex from matching markdown bold text (by sanketgautam · 2026-02-18 · 79.9%)
- #18890: fix(media): parse tool-result MEDIA directives with shared parser (by teededung · 2026-02-17 · 79.6%)
- #21193: fix(tts): send voice messages as Opus bubbles on Telegram (by aris-katkova · 2026-02-19 · 79.4%)