start_speech_synthesis_task To .mp3 (Amazon Polly)

Time:08-30

I am trying to synthesize ~3600 characters using Amazon Polly via Python. Currently I have the following:

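from boto3 import client as polly_client

polly = polly_client('polly')

# Read the script; the with block closes the file afterwards
with open('script.txt', 'r') as f:
    script_text = f.read()

# Start an asynchronous synthesis task; Polly writes the .mp3 to S3
response = polly.start_speech_synthesis_task(Engine='neural',
                                            OutputS3KeyPrefix='InputAudio',
                                            OutputS3BucketName='s3bucketname',
                                            Text=script_text,
                                            OutputFormat='mp3',
                                            VoiceId='Amy')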

How do I turn this response into an .mp3 file on my machine? I've been struggling to do this for a while now. Thank you.

Answer:

You are correct: synthesize_speech() is limited to 3,000 billed characters per request, while start_speech_synthesis_task() accepts up to 100,000 billed characters.
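As a rough illustration of where those two limits apply, the sketch below picks an operation by input length. It is a simplification: it treats len(text) as the billed count, which assumes plain-text input (SSML tags are not billed). choose_polly_operation is a hypothetical helper, not part of boto3.

```python
# Per-request billed-character limits quoted above.
SYNC_LIMIT = 3_000      # synthesize_speech()
ASYNC_LIMIT = 100_000   # start_speech_synthesis_task()


def choose_polly_operation(text):
    """Return the name of the Polly operation that can accept `text`.

    Simplification: assumes plain text, so len(text) is used as the
    billed-character count.
    """
    billed = len(text)
    if billed <= SYNC_LIMIT:
        return 'synthesize_speech'
    if billed <= ASYNC_LIMIT:
        return 'start_speech_synthesis_task'
    raise ValueError(f'{billed} characters exceeds the {ASYNC_LIMIT}-character task limit')


if __name__ == '__main__':
    # A ~3600-character script, as in the question
    print(choose_polly_operation('a' * 3600))  # start_speech_synthesis_task
```

A script of around 3,600 characters, like the one in the question, falls on the asynchronous path.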

start_speech_synthesis_task() is asynchronous: rather than returning audio in its response, it writes the output file to an Amazon S3 bucket. From the documentation:

This operation requires all the standard information needed for speech synthesis, plus the name of an Amazon S3 bucket for the service to store the output of the synthesis task... The SpeechSynthesisTask object is available for 72 hours after starting the asynchronous synthesis task.

So, your program would need to:

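  • Call start_speech_synthesis_task() and keep the TaskId from the returned SynthesisTask
  • Create a loop that calls get_speech_synthesis_task() every few seconds until TaskStatus is completed (or failed)
  • Download the mp3 file from the S3 bucket (its location is given by the task's OutputUri)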
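A minimal sketch of those three steps with boto3 follows. It assumes the OutputUri in the finished task is a path-style S3 URL (https://s3.<region>.amazonaws.com/<bucket>/<key>), that your credentials allow s3:GetObject on the bucket, and that a 5-second poll interval with a 10-minute timeout is acceptable. The helper names and output.mp3 are hypothetical. The polling and download functions take the clients as arguments, so they can be reused or tested with stand-in objects.

```python
import time
from urllib.parse import unquote, urlparse


def s3_location_from_uri(output_uri):
    """Split a path-style S3 URL into (bucket, key).

    Assumes the form https://s3.<region>.amazonaws.com/<bucket>/<key>.
    """
    path = urlparse(output_uri).path.lstrip('/')
    bucket, _, key = path.partition('/')
    return bucket, unquote(key)


def wait_for_task(polly, task_id, poll_seconds=5, timeout_seconds=600):
    """Call get_speech_synthesis_task() every poll_seconds until the task is
    completed; raise if it fails or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = polly.get_speech_synthesis_task(TaskId=task_id)['SynthesisTask']
        status = task['TaskStatus']
        if status == 'completed':
            return task
        if status == 'failed':
            raise RuntimeError(task.get('TaskStatusReason', 'synthesis task failed'))
        if time.monotonic() >= deadline:
            raise TimeoutError(f'task {task_id} still {status} after {timeout_seconds}s')
        time.sleep(poll_seconds)


def download_task_audio(polly, s3, task_id, dest_path, **wait_kwargs):
    """Wait for the task, then copy its output object from S3 to dest_path."""
    task = wait_for_task(polly, task_id, **wait_kwargs)
    bucket, key = s3_location_from_uri(task['OutputUri'])
    s3.download_file(bucket, key, dest_path)
    return dest_path


if __name__ == '__main__':
    import boto3  # imported here so the helpers above do not require AWS

    polly = boto3.client('polly')
    s3 = boto3.client('s3')
    with open('script.txt') as f:
        text = f.read()

    response = polly.start_speech_synthesis_task(
        Engine='neural',
        OutputS3KeyPrefix='InputAudio',
        OutputS3BucketName='s3bucketname',
        Text=text,
        OutputFormat='mp3',
        VoiceId='Amy',
    )
    task_id = response['SynthesisTask']['TaskId']
    print('Saved', download_task_audio(polly, s3, task_id, 'output.mp3'))
```

Reading the bucket and key from OutputUri, rather than rebuilding the key from OutputS3KeyPrefix, avoids guessing how Polly names the object.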

See: Creating Long Audio Files - Amazon Polly
