#12597: voice-call: add Asterisk ARI provider + core STT
channel: voice-call · stale · Cluster: Voice Transcription Enhancements
AI-assisted PR.
## Problem
We need a stable way to handle voice calls across multiple telephony environments (SIP endpoints, SIP trunks, GSM gateways) while keeping **one consistent OpenClaw integration**. Asterisk is the natural telecom router, but **ARI (Stasis) has no built‑in STT**, so call control alone is not enough if we want `call.speech` events with the same transcription semantics as other providers. RTP media is also fragile (codec, payload type, and NAT mismatches), and without a deterministic setup you end up with “rings but silence.”
## Solution
Add/refresh the **`asterisk-ari`** provider and split responsibilities cleanly:
- **Asterisk ARI (Stasis)** for call control (channels, bridges, events, DTMF)
- **ExternalMedia / UnicastRTP** for deterministic audio bridging
- **OpenClaw core transcription** for STT (with auto‑fallback across configured engines)
Key decision: **STT lives in core**, not inside the provider. This keeps behavior consistent across providers (events, VAD, fallback), regardless of the audio source.
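To illustrate what "auto‑fallback across configured engines" could look like from a provider's point of view, here is a minimal sketch. The `SttEngine` interface and `transcribeWithFallback` function are hypothetical names for illustration only; the actual core entry point and its signature are not shown in this description.

```typescript
// Hypothetical engine shape; the real core transcription API may differ.
interface SttEngine {
  name: string;
  transcribe(wav: Uint8Array): Promise<string>;
}

interface Transcript {
  engine: string;
  text: string;
}

// Try each configured engine in order and return the first non-empty result.
// An engine that throws or returns only whitespace is skipped.
async function transcribeWithFallback(
  engines: SttEngine[],
  wav: Uint8Array,
): Promise<Transcript | null> {
  for (const engine of engines) {
    try {
      const text = (await engine.transcribe(wav)).trim();
      if (text.length > 0) {
        return { engine: engine.name, text };
      }
    } catch {
      // Assumption: a failed engine is simply skipped; real code would log it.
    }
  }
  return null; // no engine produced a transcript
}
```

Because the fallback loop sits outside the provider, every audio source that hands over a WAV buffer gets the same ordering and the same failure behavior.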
## Functionality
**1) Call handling (outbound/inbound)**
- Outbound: originate → Stasis → mixing bridge → ExternalMedia → RTP → TTS playback
- Inbound: Stasis entry → bridge → ExternalMedia → RTP → core STT → `call.speech`
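The outbound flow above can be sketched as a sequence of ARI REST requests. The endpoint paths and query parameters follow the public ARI REST API; the helper name `planOutboundCall`, the input type, and the id scheme are assumptions for illustration and are not taken from the provider's code. In a real flow each step waits for the previous response (and the bridge step waits for `StasisStart`), which this sketch does not model.

```typescript
interface AriRequest {
  method: "POST";
  path: string;
  query: Record<string, string>;
}

// Hypothetical input; field names loosely mirror the asteriskAri config keys.
interface OutboundPlanInput {
  app: string;       // Stasis application name
  endpoint: string;  // dial target, e.g. "PJSIP/1000"
  callId: string;    // used here only to derive predictable ids
  rtpHost: string;   // where Asterisk should send RTP
  rtpPort: number;
  format: "ulaw" | "alaw"; // Asterisk format name for the chosen codec
}

function planOutboundCall(input: OutboundPlanInput): AriRequest[] {
  const channelId = `call-${input.callId}`;
  const bridgeId = `bridge-${input.callId}`;
  const mediaId = `media-${input.callId}`;
  return [
    // 1. Originate the call leg into the Stasis app.
    { method: "POST", path: "/channels", query: { endpoint: input.endpoint, app: input.app, channelId } },
    // 2. Create a mixing bridge for the call.
    { method: "POST", path: "/bridges", query: { type: "mixing", bridgeId } },
    // 3. Create the ExternalMedia channel pointing at our RTP socket.
    {
      method: "POST",
      path: "/channels/externalMedia",
      query: {
        app: input.app,
        channelId: mediaId,
        external_host: `${input.rtpHost}:${input.rtpPort}`,
        format: input.format,
        encapsulation: "rtp",
        transport: "udp",
      },
    },
    // 4. Put both channels into the bridge so audio flows between them.
    { method: "POST", path: `/bridges/${bridgeId}/addChannel`, query: { channel: `${channelId},${mediaId}` } },
  ];
}
```

Supplying the channel and bridge ids up front is one way to keep the setup deterministic, since later requests and cleanup can refer to them without parsing earlier responses.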
**2) Audio + codecs**
- Uses `asteriskAri.codec` (no `format` field in schemas)
- RTP payload type matches codec (PCMU=0 / PCMA=8)
- μ‑law ↔ A‑law conversion supported
- **Per‑call RTP sockets/ports** + deterministic media setup
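As a rough illustration of the payload-type rule above, the following sketch builds a minimal RTP packet for G.711 audio. The 12-byte header layout and the static payload types (PCMU=0, PCMA=8) come from the RTP specifications; the function and type names are hypothetical, and header extensions, CSRC lists, and padding are deliberately left out.

```typescript
type G711Codec = "PCMU" | "PCMA";

// Static RTP payload types for G.711.
const PAYLOAD_TYPE: Record<G711Codec, number> = { PCMU: 0, PCMA: 8 };

// Per-call stream state; one instance per RTP socket.
interface RtpState {
  seq: number;       // 16-bit sequence number
  timestamp: number; // 32-bit timestamp in 8 kHz sample units
  ssrc: number;      // 32-bit stream identifier
}

function buildRtpPacket(
  codec: G711Codec,
  payload: Uint8Array,
  state: RtpState,
  marker = false,
): Uint8Array {
  const packet = new Uint8Array(12 + payload.length);
  const view = new DataView(packet.buffer);
  view.setUint8(0, 0x80); // version 2, no padding, no extension, no CSRC
  view.setUint8(1, (marker ? 0x80 : 0) | PAYLOAD_TYPE[codec]);
  view.setUint16(2, state.seq & 0xffff);    // big-endian by default
  view.setUint32(4, state.timestamp >>> 0);
  view.setUint32(8, state.ssrc >>> 0);
  packet.set(payload, 12);

  // Advance state: G.711 carries one byte per sample at 8 kHz.
  state.seq = (state.seq + 1) & 0xffff;
  state.timestamp = (state.timestamp + payload.length) >>> 0;
  return packet;
}
```

With 20 ms framing, each payload is 160 bytes, so the timestamp advances by 160 per packet. A payload type that does not match the negotiated codec is one plausible cause of the "rings but silence" symptom.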
**3) STT via OpenClaw core**
- In‑memory buffering → WAV → core transcription
- VAD with dynamic noise floor + hangover + pre‑roll + backpressure
- Emits standard `call.speech` events
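The VAD terms above (dynamic noise floor, hangover, pre‑roll) can be sketched with a simple energy-based detector over PCM16 frames. This is an illustrative sketch, not the provider's implementation: the class name, option names, and default values are assumptions, and backpressure is not modeled.

```typescript
interface VadOptions {
  noiseAlpha: number;     // smoothing factor for the noise floor update
  thresholdRatio: number; // speech if RMS > noiseFloor * ratio
  minRms: number;         // absolute lower bound for the speech threshold
  hangoverFrames: number; // silent frames tolerated before ending an utterance
  preRollFrames: number;  // frames kept from before speech onset
}

// Root-mean-square amplitude of one frame.
function rms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

class EnergyVad {
  private noiseFloor = 0;
  private speaking = false;
  private silentRun = 0;
  private preRoll: Int16Array[] = [];
  private utterance: Int16Array[] = [];

  constructor(private readonly opts: VadOptions) {}

  // Feed one frame; returns the finished utterance frames, or null if still collecting.
  push(frame: Int16Array): Int16Array[] | null {
    const energy = rms(frame);
    const threshold = Math.max(this.noiseFloor * this.opts.thresholdRatio, this.opts.minRms);
    const isSpeech = energy > threshold;

    if (!this.speaking) {
      if (isSpeech) {
        // Speech onset: prepend pre-roll so the first syllable is not clipped.
        this.speaking = true;
        this.silentRun = 0;
        this.utterance = [...this.preRoll, frame];
        this.preRoll = [];
      } else {
        // Adapt the noise floor only while idle, and keep a short pre-roll window.
        this.noiseFloor += this.opts.noiseAlpha * (energy - this.noiseFloor);
        this.preRoll.push(frame);
        if (this.preRoll.length > this.opts.preRollFrames) this.preRoll.shift();
      }
      return null;
    }

    this.utterance.push(frame);
    if (isSpeech) {
      this.silentRun = 0;
      return null;
    }
    this.silentRun += 1;
    if (this.silentRun < this.opts.hangoverFrames) return null;

    // Hangover elapsed: close the utterance and reset.
    const done = this.utterance;
    this.utterance = [];
    this.speaking = false;
    this.silentRun = 0;
    return done;
  }
}
```

The frames returned by `push` would then be concatenated, wrapped in a WAV container, and handed to core transcription. The hangover keeps short pauses inside one utterance, and the pre‑roll keeps the audio just before onset.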
**4) DTMF**
- `ChannelDtmfReceived` → `call.dtmf`
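The DTMF mapping can be sketched as a small pure function. The ARI field names (`type`, `digit`, `duration_ms`, `channel.id`) follow the ARI event model; the shape of the normalized `call.dtmf` event (`providerCallId`, `digits`, `durationMs`) is an assumption for illustration, since the description does not show it.

```typescript
// Subset of an ARI websocket event relevant to DTMF.
interface AriEvent {
  type: string;
  digit?: string;
  duration_ms?: number;
  channel?: { id: string };
}

// Hypothetical normalized event; field names are illustrative.
interface CallDtmfEvent {
  type: "call.dtmf";
  providerCallId: string;
  digits: string;
  durationMs: number;
}

// Map an ARI event to a normalized DTMF event, or null if it is not usable.
function toCallDtmf(event: AriEvent): CallDtmfEvent | null {
  if (event.type !== "ChannelDtmfReceived") return null;
  if (!event.channel || typeof event.digit !== "string" || event.digit.length === 0) {
    return null;
  }
  return {
    type: "call.dtmf",
    providerCallId: event.channel.id,
    digits: event.digit,
    durationMs: event.duration_ms ?? 0,
  };
}
```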
**5) Cleanup + resilience**
- Best‑effort ExternalMedia cleanup (idempotent)
- Inbound reject: hang up by SIP channel id (best‑effort; the channel may already be gone)
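One way to get best‑effort, idempotent cleanup is to memoize the cleanup promise and swallow per-step failures. The sketch below is an assumption about the general pattern, not the provider's code; `makeIdempotentCleanup` is a hypothetical name, and the steps would be things like deleting the ExternalMedia channel, destroying the bridge, and closing the RTP socket.

```typescript
type CleanupStep = () => Promise<void>;

// Returns a function that runs all steps at most once.
// Repeated or concurrent calls share the same in-flight promise.
function makeIdempotentCleanup(steps: CleanupStep[]): () => Promise<void> {
  let inFlight: Promise<void> | null = null;
  return () => {
    if (inFlight === null) {
      inFlight = (async () => {
        for (const step of steps) {
          try {
            await step();
          } catch {
            // Best effort: the resource may already be gone (for example a 404
            // from ARI after the remote side hung up), so keep going.
          }
        }
      })();
    }
    return inFlight;
  };
}
```

Because a failing step does not stop the remaining steps, a channel that has already disappeared does not prevent the bridge or the socket from being released.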
## Testing
**Unit tests:**
- `npx -y vitest run extensions/voice-call/src/providers/asterisk-ari.test.ts`
- `npx -y vitest run extensions/voice-call/src/providers/asterisk-ari/ari-client.test.ts`
- `npx -y vitest run extensions/voice-call/src/providers/asterisk-ari/ari-media.test.ts`
**Manual checklist (summary):**
- Asterisk config is in place and the Stasis app name matches `asteriskAri.app`
- Outbound: call → TTS audible → `call.speech`
- Inbound: Stasis route → greeting → `call.speech`
- DTMF: digits → `call.dtmf`
- STT/VAD: silence vs short utterances vs noise
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR adds an `asterisk-ari` voice-call provider that uses Asterisk ARI (Stasis) for call control plus deterministic RTP media bridging (ExternalMedia / UnicastRTP). It also introduces core audio transcription wiring via `src/media-understanding/transcribe.ts` so providers can feed buffers into the existing media-understanding runner and get standard `call.speech` events with the same fallback/decision semantics as other sources.
On the voice-call side, config/schema is extended to include `asteriskAri` (baseUrl/credentials/app/rtpHost/rtpPort/codec/trunk), the runtime can instantiate and shut down the ARI provider (including websocket cleanup), and the CallManager gains an explicit `ensureInboundCall()` path to avoid inbound call record races and to support early inbound rejection by providerCallId.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk.
- Reviewed the changes around voice-call config resolution/validation, CallManager inbound creation/rejection logic, provider shutdown hooks, and the new core transcription entrypoint; the previously reported race/leak/idempotency issues appear addressed in this head SHA, and no new deterministic runtime/type failures were found.
- No files require special attention
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#19073: feat(voice-call): streaming TTS, barge-in, silence filler, hangup, ...
by odrobnik · 2026-02-17
76.1%
#4325: fix(voice-call): verify call status with provider before loading st...
by garnetlyx · 2026-01-30
74.0%
#14208: feat(media): add AssemblyAI audio transcription provider
by jmoraispk · 2026-02-11
73.5%
#5499: fix(voice-call): wait for session creation before sending config up...
by lailoo · 2026-01-31
73.3%
#10447: feat(voice-call): add Deepgram STT provider
by chrharri · 2026-02-06
73.1%
#7965: feat(tts): add Speechify as TTS provider
by chaerla · 2026-02-03
72.8%
#7652: fix(voice-call): fix Telnyx transcription (STT) not working
by tturnerdev · 2026-02-03
72.2%
#14086: feat(tts,media): add base Sarvam TTS and STT providers
by kiranjd · 2026-02-11
71.2%
#23572: feat(voice): enable voice note conversation loop for Telegram and W...
by davidrudduck · 2026-02-22
70.6%
#18852: fix: Voice-call state persistence is fire-and-forget, causing silen...
by coygeek · 2026-02-17
70.6%