#18077: fix: deduplicate TTS audio delivered via tool results

by stakeswky open 2026-02-16 13:17

Labels: agents, stale, size: S

Cluster: Voice Call and TTS Improvements

## Problem

When the TTS tool returns a `MEDIA:` path, the audio is delivered immediately from the tool result. However, the model often echoes the `MEDIA:` line in its follow-up assistant message (encouraged by the old "Copy the MEDIA line exactly" instruction), so the same audio file is sent to the user twice.

Closes #17991

## Root Cause

Two issues:

1. **TTS tool description** explicitly instructs the model to "Copy the MEDIA line exactly", which directly causes the duplicate.
2. **`filterMessagingToolDuplicates`** only deduplicates text content, not media paths, so when the model echoes the MEDIA path there is no safety net.

## Fix

Two-layer approach:

### Layer 1: Prevent (tool description)

Updated the TTS tool description to tell the model that the audio is delivered automatically and that it must NOT repeat the MEDIA line.

### Layer 2: Catch (media path dedup)

Added `toolResultMediaPaths` tracking through the full pipeline:

- `pi-embedded-subscribe` state tracks media paths extracted from tool results
- Paths flow through: subscribe → attempt result → run result → dedup callsites
- `filterMessagingToolDuplicates` now accepts optional `sentMediaPaths` and drops payloads whose only content is a duplicate media path
- Payloads with duplicate media but meaningful text are kept (conceptually, only the media is stripped)

### Files Changed

- `tts-tool.ts`: updated description
- `pi-embedded-subscribe.handlers.tools.ts`: track delivered media paths in both `emitToolOutput` and direct `onToolResult` paths
- `pi-embedded-subscribe.handlers.types.ts`: added `toolResultMediaPaths` to state type
- `pi-embedded-subscribe.ts`: init + reset + expose getter
- `run/attempt.ts`: destructure + pass through
- `run/types.ts`: added to attempt result type
- `run.ts`: pass through to run result (both code paths)
- `pi-embedded-runner/types.ts`: added to run result type
- `agent-runner.ts`: pass to `buildReplyPayloads`
- `agent-runner-payloads.ts`: accept + forward to dedup
- `followup-runner.ts`: forward to dedup
- `reply-payloads.ts`: enhanced `filterMessagingToolDuplicates`
- `reply-payloads.media-dedup.test.ts`: 7 new tests

## Testing

- 7 new unit tests covering media path dedup (case insensitivity, text+media combos, backward compat, whitespace-only text)
- All 389 existing tests pass with zero regressions

<h3>Greptile Summary</h3>

This PR fixes duplicate TTS audio delivery with a well-structured two-layer approach: (1) updating the TTS tool description to stop telling the model to echo `MEDIA:` lines, and (2) adding a dedup safety net that tracks media paths delivered via tool results and filters them out of follow-up assistant messages.

- **Layer 1 (Prevention):** The TTS tool description in `tts-tool.ts` is updated to tell the model the audio is delivered automatically, removing the "Copy the MEDIA line exactly" instruction that was directly causing duplicates.
- **Layer 2 (Safety net):** `toolResultMediaPaths` tracking is plumbed through the full pipeline: subscribe state → attempt result → run result → `buildReplyPayloads` → `filterMessagingToolDuplicates`. The dedup function now accepts optional `sentMediaPaths` and drops payloads whose only content is a duplicate media path (case-insensitive matching). Payloads with meaningful text alongside duplicate media are preserved.
- **Testing:** 7 new unit tests cover the key dedup scenarios. The fixture file is updated for type compliance.
- **Minor gap:** The dedup only checks `mediaUrl` (singular), not `mediaUrls` (plural). Since `mediaUrl` is always set to the first element when `mediaUrls` is present, this covers the primary TTS use case but wouldn't catch secondary media paths in multi-media payloads.

<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge with minimal risk: changes are additive and backward-compatible, with good test coverage.
- Score of 4 reflects: clean two-layer approach with both prevention and safety net, consistent plumbing through a complex pipeline, 7 new tests with zero regressions, and backward compatibility (`sentMediaPaths` is optional with default `[]`). Docked one point for the minor `mediaUrls` (plural) dedup gap, though it doesn't affect the primary TTS use case.
- `src/auto-reply/reply/reply-payloads.ts`: the core dedup logic only checks `mediaUrl` (singular), not `mediaUrls` (plural), which could matter for future multi-media tool results.

<sub>Last reviewed commit: 3bc21bb</sub>
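
For readers who want the Layer 2 rule in concrete form, here is a minimal sketch of the media-path dedup as described above. It is not the code from `reply-payloads.ts`: the `ReplyPayload` shape and the function name `filterDuplicateMedia` are hypothetical, and only the behavior stated in this PR is modeled (optional `sentMediaPaths` defaulting to `[]`, case-insensitive matching on `mediaUrl`, media-only duplicates dropped, payloads with meaningful text kept).

```typescript
// Hypothetical payload shape for illustration; the real type may differ.
interface ReplyPayload {
  text?: string;
  mediaUrl?: string;
}

// Drop payloads whose only content is a media path that was already
// delivered via a tool result. Matching is case-insensitive.
function filterDuplicateMedia(
  payloads: ReplyPayload[],
  sentMediaPaths: string[] = [],
): ReplyPayload[] {
  // Backward compatible: with no tracked paths, nothing is filtered.
  if (sentMediaPaths.length === 0) return payloads;

  const sent = new Set(sentMediaPaths.map((p) => p.toLowerCase()));

  return payloads.filter((payload) => {
    const media = payload.mediaUrl;
    // Keep payloads with no media or with media not yet delivered.
    if (!media || !sent.has(media.toLowerCase())) return true;
    // Duplicate media: keep only if there is non-whitespace text.
    return (payload.text ?? "").trim().length > 0;
  });
}

// Usage: one path was already delivered by the TTS tool result.
const filtered = filterDuplicateMedia(
  [
    { mediaUrl: "/tmp/TTS/voice.mp3" }, // dropped: duplicate, no text
    { text: "Here is your audio.", mediaUrl: "/tmp/tts/voice.mp3" }, // kept
    { text: "   ", mediaUrl: "/tmp/tts/voice.mp3" }, // dropped: whitespace only
    { mediaUrl: "/tmp/tts/other.mp3" }, // kept: not delivered yet
  ],
  ["/tmp/tts/voice.mp3"],
);
console.log(filtered.length); // 2
```

The sketch keeps a text-bearing payload unchanged instead of removing its `mediaUrl`; whether the actual implementation strips the media field is not stated in this PR. Like the logic reviewed above, it checks only `mediaUrl` (singular), so the `mediaUrls` gap noted in the review applies here as well.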