Can gTTS speak a list(PYTHON)

Time:10-17

Is it possible to speak a list? Right now I am using:

#Minimum reproducible example
import tkinter as tk
from gtts import gTTS
from io import BytesIO
import pygame

def play():
    words = [one,boy,girl,man,woman,two]
    for i in words:
        speak(i)

def speak(text,language="en",accent="com"):
    mp3_fp = BytesIO()
    phrase = gTTS(text=text,lang=language,tld=accent)
    phrase.write_to_fp(mp3_fp)
    pygame.init()
    pygame.mixer.init()
    pygame.mixer.music.load(mp3_fp,"mp3")
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        
        pygame.time.delay(10)
        pygame.event.poll()

play()

This code works but is not ideal. If you try to pause the audio, only the current word is paused and the rest of the list keeps playing. Is there a way to speak the list, pause it, and resume it without errors? I'm fairly new to this. I am using these modules so that I don't have to save the MP3s to disk: they are held in a variable and played from there, so no extra files are created. Also, when I use the speak() function I must use threading to be able to interact with the tkinter window while the audio from Pygame Mixer is playing.

Goal: To be able to pause the list and replay

CodePudding user response:

Thanks for inviting me to answer this question, and sorry that it took me so long to notice.

Two minor admonishments:

  1. That list still isn't correct. words = [one,boy,girl,man,woman,two] is a list of variables (which have not been defined), not text. The line has to be words = ["one", "boy", "girl", "man", "woman", "two"].
  2. You've done the vast majority of the work, but it isn't really the minimum amount of code because it doesn't demonstrate your pausing problem.
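To make the first point concrete, here is a small sketch (the try/except wrapper is only there for illustration) showing that bare names are treated as variables and fail, while quoted names give the list of strings that gTTS needs:

```python
# Bare names are looked up as variables; undefined ones raise NameError.
try:
    words = [one, boy, girl]  # deliberately undefined names
except NameError as exc:
    error_message = str(exc)

# Quoted names are string literals, which is what gTTS expects as text.
words = ["one", "boy", "girl", "man", "woman", "two"]

print(error_message)
print(all(isinstance(w, str) for w in words))
```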

Having said that, I couldn't really reproduce your problem. What I did find is that there does seem to be a problem with pygame.time.delay() which seems to freeze unpredictably; at least it does on my machines (Python3 on Linux).

To solve that problem I changed pygame.time.delay() to pygame.time.wait().

The following code demonstrates that each word pauses in the middle, and the next words don't start until after the previous words have finished. It begins playing the word, then pauses and unpauses repeatedly until the word is complete. I've added the variable delay so that you can experiment with different delay lengths. 10ms didn't work well, but the delay is very pronounced at 100ms.

I also changed pygame.event.poll(), which only gets one event and doesn't do anything with it, to pygame.event.clear(), which I think is what that line was meant to do in the first place: keep the event queue empty.

You mentioned that you interacted with it using TkInter and threads. If you're still having trouble, perhaps ask again including your TkInter and threading code.

# import tkinter as tk
from gtts import gTTS
from io import BytesIO
import pygame

def play():
    words = ["one", "boy", "girl", "man", "woman", "two"]
    for i in words:
        speak(i)

def speak(text,language="en",accent="com"):
    mp3_fp = BytesIO()
    phrase = gTTS(text=text,lang=language,tld=accent)
    phrase.write_to_fp(mp3_fp)
    pygame.init()
    pygame.mixer.init()
    pygame.mixer.music.load(mp3_fp,"mp3")
    pygame.mixer.music.play()
    
    delay = 100
    while pygame.mixer.music.get_busy():
        pygame.time.wait(delay)
        pygame.mixer.music.pause()
        pygame.time.wait(delay)
        pygame.mixer.music.unpause()
        pygame.event.clear()

play()

While testing this I couldn't resist using the following as my test text. You may not recognize it. You'll want to comment out the delay/pause/unpause section to listen to it, and it takes several seconds to load the buffer before it can play.

speak("Good morning, and welcome to the Black Mesa transit system.  This automated train is provided for the security and convenience of the Black Mesa Research Facility personnel.  The time is 8:47 A M.  Current topside temperature is 93 degrees with an estimated high of 105.  The Black Mesa compound is maintained at a pleasant 68 degrees at all times.  This train is inbound from level 3 dormitories to sector C test labs and control facilities.  If your intended destination is a high security area beyond sector C, you will need to return to the central transit hub in area 9 and board a high security train.  If you have not yet submitted your identity to the retinal clearance system, you must report to Black Mesa personnel for processing before you will be permitted into the high security branch of the transit system.  Due to the high toxicity of material routinely handled in the Black Mesa compound, no smoking, eating, or drinking are permitted within the Black Mesa transit system.  Please keep your limbs inside the train at all times.  Do not attempt to open the doors until the train has come to a complete halt at the station platform.  In the event of an emergency, passengers are to remain seated and await further instruction.  If it is necessary to exit the train, disabled personnel should be evacuated first. Please, stay away from electrified rails and proceed to an emergency station until assistance arrives.")

CodePudding user response:

So I'm adding a second answer! Never thought I'd be doing that. For others reading this: This second answer is in response to the OP's comment to my first answer, based on what little I know of his implementation.

Complete code at the end.

You mentioned that you're using TkInter, so the first thing I did was to implement the TkInter code. I usually write the graphical interface first because it helps me to imagine what methods I'll need in the rest of the program. The GUI is an event-driven interface and shouldn't implement any core working code itself.

import threading
import tkinter as tk

#create a TkInter class to hold our 'play' and 'pause' buttons
#the canonical way to do this is to subclass it from a tkinter.Frame() object
class PlayWindow(tk.Frame):
    '''This is the play window to test the pause function.'''
    def __init__(self, master, talker):
        '''Initialize the PlayWindow'''
        super(PlayWindow, self).__init__(master)    #init the super Frame
        self.master = master    #sometimes we need this
        self.talker = talker    #the talker object
        
        #creates and packs the "Gordon Freeman" button
        tk.Button(master, text="Gordon Freeman", command=self.dispatchGordonFreeman).pack()
        # tk.Button(master, text="Words", command=self.dispatchWords).pack()
        
        #creates and packs the stop, pause, and play/resume buttons
        tk.Button(master, text=u'\u25A0', command=self.stop).pack(side=tk.LEFT, expand=True, fill=tk.BOTH)
        tk.Button(master, text=u'\u275A\u275A', command=self.pause).pack(side=tk.LEFT, expand=True, fill=tk.BOTH)
        tk.Button(master, text=u'\u25B6', command=self.resume).pack(side=tk.LEFT, expand=True, fill=tk.BOTH)

    def dispatchWords(self):
        '''Dispatches the 'GCIreland()' method as a thread to free up the TkInter interface.'''
        x = threading.Thread(target=self.GCIreland, daemon=True)
        x.start()
    
    def GCIreland(self):
        '''Creates a list of words then sends them one at a time to the TTS talker.'''
        words = ["one", "boy", "girl", "man", "woman", "two"]
        for i in words:
            self.talker.say(i)  #use the talker passed to __init__, not a global

    def dispatchGordonFreeman(self):
        '''Dispatches the 'gordonfreeman()' method as a thread to free up the TkInter interface.'''
        x = threading.Thread(target=self.gordonFreeman, daemon=True)
        x.start()
    
    def gordonFreeman(self):
        '''Creates a list with a well-known speech and hands it off to the Text-To-Speech talker.'''
        speech = []

        speech.append("Good morning, and welcome to the Black Mesa transit system.")
        speech.append("This automated train is provided for the security and convenience of the Black Mesa Research Facility personnel.")
        speech.append("The time is 8:47 A.M.")
        speech.append("Current topside temperature is 93 degrees with an estimated high of 105.")
        speech.append("The Black Mesa compound is maintained at a pleasant 68 degrees at all times.")
        speech.append("This train is inbound from level 3 dormitories to sector C test labs and control facilities.")
        speech.append("If your intended destination is a high security area beyond sector C, you will need to return to the central transit hub in area 9 and board a high security train.")
        speech.append("If you have not yet submitted your identity to the retinal clearance system, you must report to Black Mesa personnel for processing before you will be permitted into the high security branch of the transit system.")

        speech.append("Due to the high toxicity of material routinely handled in the Black Mesa compound, no smoking, eating, or drinking are permitted within the Black Mesa transit system.")
        speech.append("Please keep your limbs inside the train at all times.")
        speech.append("Do not attempt to open the doors until the train has come to a complete halt at the station platform.")
        speech.append("In the event of an emergency, passengers are to remain seated and await further instruction.")
        speech.append("If it is necessary to exit the train, disabled personnel should be evacuated first. Please, stay away from electrified rails and proceed to an emergency station until assistance arrives.")
        
        for i in speech:
            self.talker.say(i)  #use the talker passed to __init__, not a global
            
    def stop(self):
        '''Completely stops playback.  (Not implemented.)'''
        pass
    
    def resume(self):
        '''Hopefully this resumes playback from whence it paused.'''
        self.talker.resume()
    
    def pause(self):
        '''With any luck at all this will pause playback in a way that it can be resumed.  I dunno -- your guess is as good as mine!'''
        self.talker.pause()

It seems obvious that it's important to you to be able to feed a list of text to the "talker", so I've implemented the program to fulfill that apparent requirement. I discovered, while I was writing this program, that it is somewhat difficult to pause the playback of a single-syllable word, so what I did was to include a second set of methods to implement the "Gordon Freeman" speech instead of the list of words. The button is there to speak the list of words and all you have to do is to uncomment it.

Just a few comments on this part of the code:

Usually what you do is to create the TkInter root window in the main part of your program, then modularly create the event-driven functionality of the interface in separate classes. Each such class is usually subclassed from the tkinter.Frame() object. Because we subclass .Frame() we need to call .Frame()'s __init__() method. The best way of doing that, IMO, is to use super().

I've dispatched the button code as threads just to free up the button. This is really just for appearances so the button doesn't remain depressed for the entire time that the text is being converted into the io.BytesIO stream.
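The dispatch pattern can be tried without TkInter or gTTS. This is only a sketch: slow_job and dispatch are made-up names, and the sleep stands in for the slow conversion. The point is that the callback returns at once while the work carries on in a daemon thread:

```python
import threading
import time


def slow_job(done):
    """Hypothetical stand-in for the slow text-to-speech work."""
    time.sleep(0.5)
    done.set()


def dispatch(job, *args):
    """Start job on a daemon thread and return immediately, like a button callback."""
    worker = threading.Thread(target=job, args=args, daemon=True)
    worker.start()
    return worker


done = threading.Event()
start = time.monotonic()
worker = dispatch(slow_job, done)
returned_after = time.monotonic() - start   # the "callback" is already back
worker.join()                               # only this demo waits; a GUI would not
print(f"dispatch returned after {returned_after:.3f}s, job finished: {done.is_set()}")
```

In a real GUI you would not call join() from the callback, because that would block the interface again.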

You'll see in the button code that I've done two things: I've used Unicode in the button text to make them look like real media playback buttons, and I've not kept the button object IDs because I have no intention of manipulating those buttons in any way after they've been created. Usually you'll want to catch those IDs so that you can properly pack them, disable them when they're not needed, or implement them in other ways. I didn't need to do that in this case.

The next class I implemented was the talker. This initializes PyGame and the mixer and holds all the methods to begin speaking, pause speaking, and resume.

import pygame
import threading
from io import BytesIO
from gtts import gTTS

#create a custom TextToSpeechTalker class to convert text to speech and play it
#   through PyGame's mixer.  Strictly speaking, the text-to-speech conversion
#   should not be done in this class as it is unrelated to PyGame.
class TextToSpeechTalker():
    def __init__(self, *, language="en", accent="com"):
        '''Initializes the Google gTTS Text-To-Speech module.'''
        self.language = language    #desired language
        self.accent = accent        #desired localization accent
        self.phraseList = []        #a list to hold a bunch of text
        self.threadRunning = False  #boolean flag True when thread is running
        self.paused = False         #flag that knows if we have paused

        pygame.mixer.init()         #initialize pygame's sound mixer

    def say(self, text):
        '''Queues the text, then dispatches the talker as a thread if one is not
        already running.'''
        self.phraseList.append(text)    #append the text to the list
        if not self.threadRunning:
            self.dispatchTalkerThread() #dispatch a thread to "say" it
        
    def dispatchTalkerThread(self):
        '''Handles dispatching the talker method as a thread.'''
        #don't actually need to dispatch it if it's already running
        if not self.threadRunning:
            self.threadTalker = threading.Thread(target=self.talk, daemon=True)
            self.threadRunning = True
            self.threadTalker.start()
            self.threadTalker.join()    #wait for the thread to terminate
            self.threadRunning = False
        
    def talk(self):
        '''This plays every entry in our list of text.  It is dispatched as a thread.'''
        #this while-loop loops through all entries in the phraseList
        while self.phraseList:      #boolean way of checking that there is something in the list
            mp3_fp = BytesIO()          #stream buffer to hold mp3 data
            phrase = gTTS(text=self.phraseList.pop(0), lang=self.language, tld=self.accent) #creates the TTS object
            phrase.write_to_fp(mp3_fp)  #write binary data to the stream buffer
            pygame.mixer.music.load(mp3_fp,"mp3")   #load the stream into mixer
            pygame.mixer.music.play()               #start playing the mp3
            
            #this is here to prevent multiple mp3s being played simultaneously
            #it won't start playing the next buffer until the current one is complete
            #   it just waits for the mixer to become not busy
            while True:
                pygame.time.wait(100)   #arbitrary delay (1/10th second to be nice)
                pygame.event.clear()    #keep the event buffer clear for now
                if not self.paused and not pygame.mixer.music.get_busy():
                    break
    
    def pause(self):
        '''Pauses the mixer output.'''
        pygame.mixer.music.pause()      #pause playback
        self.paused = True              #make sure we know we're paused
    
    def resume(self):
        '''Resumes mixer output.'''
        pygame.mixer.music.unpause()
        self.paused = False

The talker handles all of the I/O and methods needed for controlling playback. You give the talker some text, it puts that into a list, then it converts each bit of text in the list as it needs it. I tried pre-converting the text to a list of io.BytesIO buffers but was having some trouble with that, and it was just quicker for me to save the text in a list instead of saving a bunch of binary buffers.
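The answer doesn't say what went wrong with the pre-converted buffers. One thing worth checking in that approach, offered only as a guess, is the stream position: after writing into a BytesIO the position sits at the end, so anything that reads from the current position gets nothing until you call seek(0). In this sketch fake_synthesize is a made-up stand-in for gTTS(...).write_to_fp(...) so that it runs without a network connection:

```python
from io import BytesIO


def fake_synthesize(text, fp):
    """Hypothetical stand-in for gTTS(text=...).write_to_fp(fp)."""
    fp.write(text.encode("utf-8"))


def preconvert(phrases):
    """Convert every phrase up front and return rewound buffers."""
    buffers = []
    for phrase in phrases:
        fp = BytesIO()
        fake_synthesize(phrase, fp)
        fp.seek(0)              # rewind so the next reader starts at the beginning
        buffers.append(fp)
    return buffers


unrewound = BytesIO()
fake_synthesize("one", unrewound)
print(unrewound.read())         # position is at the end, so nothing is read

buffers = preconvert(["one", "boy"])
print([b.read() for b in buffers])
```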

The code which does all the "talking" through the mixer has been dispatched as a thread. I did this primarily because my gut was telling me that it was the just and right way to implement it. This way I didn't have to figure out a nice way of checking for new text to convert, and the talking thread is only running when there is something to output. It also turned out that it made implementing the pause() and resume() methods a little easier.
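The wait loop in talk() only moves on when the talker is not paused and the mixer is not busy. The sketch below shows why the separate paused flag matters. FakeMusic is a made-up stand-in for pygame.mixer.music, and it assumes that get_busy() reports False while paused, which is my understanding of newer pygame versions:

```python
class FakeMusic:
    """Hypothetical stand-in for pygame.mixer.music that 'plays' for a few ticks."""

    def __init__(self, ticks):
        self.remaining = ticks
        self.is_paused = False

    def pause(self):
        self.is_paused = True

    def unpause(self):
        self.is_paused = False

    def tick(self):
        # Playback only advances while not paused.
        if not self.is_paused and self.remaining > 0:
            self.remaining -= 1

    def get_busy(self):
        # Assumption: reports False while paused, as newer pygame versions do.
        return self.remaining > 0 and not self.is_paused


def track_finished(paused, music):
    """Same test as the wait loop in talk(): done only if not paused and not busy."""
    return not paused and not music.get_busy()


music = FakeMusic(ticks=3)
music.tick()
music.pause()
# The mixer looks idle here, but the paused flag stops the loop from moving on.
print(music.get_busy(), track_finished(True, music))
music.unpause()
music.tick()
music.tick()
print(music.get_busy(), track_finished(False, music))
```

Without the flag, a paused track would look finished and the next phrase would start playing.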

Because it's a class you can rewrite this using some other text-to-speech module and all you need to make sure you do is implement .say(), .stop(), .pause(), and .resume() methods.
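One way to pin down that interface, shown only as a sketch, is an abstract base class. The names Talker and LoggingTalker are made up for illustration and are not part of the answer's code. LoggingTalker records calls instead of producing audio:

```python
from abc import ABC, abstractmethod


class Talker(ABC):
    """Hypothetical interface: the four methods the GUI expects from a talker."""

    @abstractmethod
    def say(self, text): ...

    @abstractmethod
    def stop(self): ...

    @abstractmethod
    def pause(self): ...

    @abstractmethod
    def resume(self): ...


class LoggingTalker(Talker):
    """Toy backend that records calls instead of producing audio."""

    def __init__(self):
        self.calls = []

    def say(self, text):
        self.calls.append(("say", text))

    def stop(self):
        self.calls.append(("stop",))

    def pause(self):
        self.calls.append(("pause",))

    def resume(self):
        self.calls.append(("resume",))


talker = LoggingTalker()
talker.say("one")
talker.pause()
talker.resume()
print(talker.calls)
```

A backend that forgets one of the four methods cannot be instantiated, so the mistake shows up early rather than when a button is clicked.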

Lastly, the main() method. Except I didn't actually implement a main().

if __name__ == "__main__":
    #We're going to start by initializing PyGame
    #   this should be done here and not in the talker object
    pygame.init()

    #Now initializing a talker object:
    talker = TextToSpeechTalker(language='en', accent='com')    #creates an instance of the TextToSpeechTalker class

    #Now we're going to create a TkInter window to start and control playback:
    win=tk.Tk()                     #create a TkInter window
    win.title("Play/Pause Test")    #put a title on it
    PlayWindow(win, talker) #the window needs to know about the talker

    #Everything should be ready to go at this point so we enter the TkInter mainloop
    win.mainloop()

Oh, yeah; I forgot about that. So, I thought that I initialized PyGame in the talker, but I forgot that I moved the pygame.init() down to the main part of the program. My logic was that if you're using PyGame you're probably initializing it in your main program because you're using it for other stuff and wouldn't initialize it in a module like that.

Then we instantiate the talker object, create the TkInter window, and pass the talker object to the PlayWindow class. Because we're using TkInter, control of the program is event-driven and we can jump straight into TkInter's .mainloop().

If you were writing a game which used both TkInter and PyGame, one of your PyGame classes would begin processing a PyGame event loop and probably PyGame's FPS clock, etc.

When you run the program you'll see one main button, which by default says "Gordon Freeman", and below that three control buttons for stop, pause, and resume. The stop button isn't implemented. Click the "Gordon Freeman" button to begin the speech, and you'll see that you can pause and resume playback of that list of speech at will.
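The stop button could be wired up along these lines. This is a sketch, not part of the answer's code: stop_talker is a made-up helper, FakeMusic stands in for pygame.mixer.music, and SimpleNamespace stands in for a TextToSpeechTalker with the same phraseList and paused attributes:

```python
from types import SimpleNamespace


def stop_talker(talker, music):
    """Sketch of a stop(): forget queued phrases, clear the pause flag, halt audio."""
    talker.phraseList.clear()   # nothing left for talk() to pick up next
    talker.paused = False       # otherwise the wait loop in talk() would never exit
    music.stop()                # with pygame: pygame.mixer.music.stop()


class FakeMusic:
    """Hypothetical stand-in for pygame.mixer.music; it just records the stop call."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


# SimpleNamespace stands in for a TextToSpeechTalker with the same attributes.
talker = SimpleNamespace(phraseList=["boy", "girl"], paused=True)
music = FakeMusic()
stop_talker(talker, music)
print(talker.phraseList, talker.paused, music.stopped)
```

Clearing the list before stopping the mixer means the talking thread finds nothing more to play once the current track ends.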

In conclusion, this demonstrates text-to-speech conversion with the gTTS module and shows that a list of speech can be paused and resumed during playback.

Here is the complete code listing:

#Gordon Freeman is our hero!
import tkinter as tk
import pygame
import threading
from io import BytesIO
from gtts import gTTS


#create a TkInter class to hold our 'play' and 'pause' buttons
#the canonical way to do this is to subclass it from a tkinter.Frame() object
class PlayWindow(tk.Frame):
    '''This is the play window to test the pause function.'''
    def __init__(self, master, talker):
        '''Initialize the PlayWindow'''
        super(PlayWindow, self).__init__(master)    #init the super Frame
        self.master = master    #sometimes we need this
        self.talker = talker    #the talker object
        
        #creates and packs the "Gordon Freeman" button
        tk.Button(master, text="Gordon Freeman", command=self.dispatchGordonFreeman).pack()
        # tk.Button(master, text="Words", command=self.dispatchWords).pack()
        
        #creates and packs the stop, pause, and play/resume buttons
        tk.Button(master, text=u'\u25A0', command=self.stop).pack(side=tk.LEFT, expand=True, fill=tk.BOTH)
        tk.Button(master, text=u'\u275A\u275A', command=self.pause).pack(side=tk.LEFT, expand=True, fill=tk.BOTH)
        tk.Button(master, text=u'\u25B6', command=self.resume).pack(side=tk.LEFT, expand=True, fill=tk.BOTH)

    def dispatchWords(self):
        '''Dispatches the 'GCIreland()' method as a thread to free up the TkInter interface.'''
        x = threading.Thread(target=self.GCIreland, daemon=True)
        x.start()
    
    def GCIreland(self):
        '''Creates a list of words then sends them one at a time to the TTS talker.'''
        words = ["one", "boy", "girl", "man", "woman", "two"]
        for i in words:
            self.talker.say(i)  #use the talker passed to __init__, not a global

    def dispatchGordonFreeman(self):
        '''Dispatches the 'gordonfreeman()' method as a thread to free up the TkInter interface.'''
        x = threading.Thread(target=self.gordonFreeman, daemon=True)
        x.start()
    
    def gordonFreeman(self):
        '''Creates a list with a well-known speech and hands it off to the Text-To-Speech talker.'''
        speech = []

        speech.append("Good morning, and welcome to the Black Mesa transit system.")
        speech.append("This automated train is provided for the security and convenience of the Black Mesa Research Facility personnel.")
        speech.append("The time is 8:47 A.M.")
        speech.append("Current topside temperature is 93 degrees with an estimated high of 105.")
        speech.append("The Black Mesa compound is maintained at a pleasant 68 degrees at all times.")
        speech.append("This train is inbound from level 3 dormitories to sector C test labs and control facilities.")
        speech.append("If your intended destination is a high security area beyond sector C, you will need to return to the central transit hub in area 9 and board a high security train.")
        speech.append("If you have not yet submitted your identity to the retinal clearance system, you must report to Black Mesa personnel for processing before you will be permitted into the high security branch of the transit system.")

        speech.append("Due to the high toxicity of material routinely handled in the Black Mesa compound, no smoking, eating, or drinking are permitted within the Black Mesa transit system.")
        speech.append("Please keep your limbs inside the train at all times.")
        speech.append("Do not attempt to open the doors until the train has come to a complete halt at the station platform.")
        speech.append("In the event of an emergency, passengers are to remain seated and await further instruction.")
        speech.append("If it is necessary to exit the train, disabled personnel should be evacuated first. Please, stay away from electrified rails and proceed to an emergency station until assistance arrives.")
        
        for i in speech:
            self.talker.say(i)  #use the talker passed to __init__, not a global
            
    def stop(self):
        '''Completely stops playback.  (Not implemented.)'''
        pass
    
    def resume(self):
        '''Hopefully this resumes playback from whence it paused.'''
        self.talker.resume()
    
    def pause(self):
        '''With any luck at all this will pause playback in a way that it can be resumed.  I dunno -- your guess is as good as mine!'''
        self.talker.pause()



#create a custom TextToSpeechTalker class to convert text to speech and play it
#   through PyGame's mixer.  Strictly speaking, the text-to-speech conversion
#   should not be done in this class as it is unrelated to PyGame.
class TextToSpeechTalker():
    def __init__(self, *, language="en", accent="com"):
        '''Initializes the Google gTTS Text-To-Speech module.'''
        self.language = language    #desired language
        self.accent = accent        #desired localization accent
        self.phraseList = []        #a list to hold a bunch of text
        self.threadRunning = False  #boolean flag True when thread is running
        self.paused = False         #flag that knows if we have paused

        pygame.mixer.init()         #initialize pygame's sound mixer

    def say(self, text):
        '''Queues the text, then dispatches the talker as a thread if one is not
        already running.'''
        self.phraseList.append(text)    #append the text to the list
        if not self.threadRunning:
            self.dispatchTalkerThread() #dispatch a thread to "say" it
        
    def dispatchTalkerThread(self):
        '''Handles dispatching the talker method as a thread.'''
        #don't actually need to dispatch it if it's already running
        if not self.threadRunning:
            self.threadTalker = threading.Thread(target=self.talk, daemon=True)
            self.threadRunning = True
            self.threadTalker.start()
            self.threadTalker.join()    #wait for the thread to terminate
            self.threadRunning = False
        
    def talk(self):
        '''This plays every entry in our list of text.  It is dispatched as a thread.'''
        #this while-loop loops through all entries in the phraseList
        while self.phraseList:      #boolean way of checking that there is something in the list
            mp3_fp = BytesIO()          #stream buffer to hold mp3 data
            phrase = gTTS(text=self.phraseList.pop(0), lang=self.language, tld=self.accent) #creates the TTS object
            phrase.write_to_fp(mp3_fp)  #write binary data to the stream buffer
            pygame.mixer.music.load(mp3_fp,"mp3")   #load the stream into mixer
            pygame.mixer.music.play()               #start playing the mp3
            
            #this is here to prevent multiple mp3s being played simultaneously
            #it won't start playing the next buffer until the current one is complete
            #   it just waits for the mixer to become not busy
            while True:
                pygame.time.wait(100)   #arbitrary delay (1/10th second to be nice)
                pygame.event.clear()    #keep the event buffer clear for now
                if not self.paused and not pygame.mixer.music.get_busy():
                    break
    
    def pause(self):
        '''Pauses the mixer output.'''
        pygame.mixer.music.pause()      #pause playback
        self.paused = True              #make sure we know we're paused
    
    def resume(self):
        '''Resumes mixer output.'''
        pygame.mixer.music.unpause()
        self.paused = False

if __name__ == "__main__":
    #We're going to start by initializing PyGame
    #   this should be done here and not in the talker object
    pygame.init()

    #Now initializing a talker object:
    talker = TextToSpeechTalker(language='en', accent='com')    #creates an instance of the TextToSpeechTalker class

    #Now we're going to create a TkInter window to start and control playback:
    win=tk.Tk()                     #create a TkInter window
    win.title("Play/Pause Test")    #put a title on it
    PlayWindow(win, talker) #the window needs to know about the talker

    #Everything should be ready to go at this point so we enter the TkInter mainloop
    win.mainloop()