How to programmatically manipulate voice in real time while dialing using Twilio?

Time:02-27

I have a small Twilio app that calls a real phone number (e.g. 3333333) whenever my Twilio number (e.g. 22222222) is called from my personal number (e.g. 1111111). I implement this with the following Twilio Function:

exports.handler = (context, event, callback) => {
  // Build a TwiML response that forwards the incoming call to the target number.
  const twiml = new Twilio.twiml.VoiceResponse();
  twiml.dial("3333333");
  return callback(null, twiml);
};

Now, when the owner of 3333333 picks up the phone, a call is established between the caller (1111111) and the target (3333333).

How can I intercept the speech in this call in real time, by running a function whenever either the caller (1111111) or the target (3333333) speaks, to do things such as changing the voice or filtering profanity?

I have tried using the <Gather> and <Say> TwiML verbs in my Twilio Function, but these only run after the call has ended or one party has hung up.

CodePudding user response:

You can't. TwiML does not offer real-time access to the audio stream.

You might be able to do this by directing the call through a SIP trunk to a server that you control, where you can process the audio.

CodePudding user response:

Twilio developer evangelist here.

You can actually achieve this with Twilio now, using the <Connect><Stream> TwiML. <Stream> lets you receive audio from the call and send audio back to it over a websocket connection in real time.

To change the audio in between, you would connect each caller only to a <Stream>, not to each other. Your server then relays the audio from one call through its websocket, applies whatever processing you want, and sends the result out through the websocket connected to the other call (and vice versa).
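The relay step could be sketched roughly as follows. This assumes each call leg has its own <Stream> websocket and that you have stored each leg's streamSid from the "start" message Twilio sends when the stream opens. The names relayMessage and silence are hypothetical helpers, and the transform here simply mutes the audio as a stand-in for real processing.

```javascript
// Example transform: replace the audio with mu-law silence (0xFF bytes).
// A real voice changer or profanity filter would go here instead.
const silence = (audio) => Buffer.alloc(audio.length, 0xff);

// Convert one inbound message from one leg into an outbound message for
// the other leg. Returns null for non-media events such as "start" or "stop".
function relayMessage(rawMessage, targetStreamSid, transform) {
  const msg = JSON.parse(rawMessage);
  if (msg.event !== "media") return null;

  // The payload is base64-encoded 8 kHz mu-law audio.
  const audio = Buffer.from(msg.media.payload, "base64");
  const processed = transform(audio);

  return JSON.stringify({
    event: "media",
    streamSid: targetStreamSid,
    media: { payload: processed.toString("base64") },
  });
}
```

In a websocket server you would call relayMessage on every message received from one leg and, when the result is not null, send it on the socket belonging to the other leg.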

I don't have more information on how to do that, as I've not seen it done. But it's possible in theory.
