I'm trying out speech recognition, using the recognized speech as input for some statements and having the program "speak" back to me through the playsound and gTTS modules. I have run into an issue that I can't find a solution for; I tried the most common fixes, but with no luck.
The program uses the playsound, speech_recognition, and gTTS modules and two functions: speak() lets the program speak back to the user using Google's text-to-speech service, and get_audio() receives input from the user's microphone using speech_recognition's Recognizer and Microphone classes.
import os
import time

import playsound
import speech_recognition as sr
from gtts import gTTS

run = True


def speak(text):
    # Convert the text to speech, save it as an MP3, then play it back.
    tts = gTTS(text=text, lang="en")
    filename = "voice.mp3"
    tts.save(filename)
    playsound.playsound(filename)


def get_audio():
    # Record from the default microphone and transcribe the recording.
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
        said = ""
        try:
            said = r.recognize_google(audio)
            print(said)
        except Exception as e:
            print("Exception: " + str(e))
    return said


while run:
    text = get_audio()
    if "hello" in text:
        speak("Hello, how are you?")
    if "what are you" in text:
        print("")
        speak("I am a speech recognition program")
    if "goodbye" in text:
        # Note: this `or` chain always evaluates to "Talk to you later".
        speak("Talk to you later" or "Bye" or "Cya")
        run = False
I have the program set up with a while loop so a conversation can play out, and it only breaks once the user says "Goodbye". The problem seems to be that the .mp3 file (voice.mp3, which the speak() function uses to store the audio for the program to play back) can't be accessed after its creation. Both the Python file and the .mp3 file are stored in the same folder.
Here is the error message in full:
who are you
hello
Traceback (most recent call last):
  File "c:\Users\User\OneDrive\Documents\VS Code Projects\Speech Recognition\main_va.py", line 34, in <module>
    speak("Hello, how are you?")
  File "c:\Users\User\OneDrive\Documents\VS Code Projects\Speech Recognition\main_va.py", line 12, in speak
    tts.save(filename)
  File "C:\Python\Python310\lib\site-packages\gtts\tts.py", line 328, in save
    with open(str(savefile), "wb") as f:
PermissionError: [Errno 13] Permission denied: 'voice.mp3'
PS C:\Users\User\OneDrive\Documents\VS Code Projects\Speech Recognition>
I received a response on the first call ("who are you"), but then the error message popped up after the second call ("hello").
Specs:
- Python 3.10.4
- playsound 1.2.2
- Everything else is up to date
CodePudding user response:
I found a solution that seems to work just fine: I delete the .mp3 file each time after I use it. At the end of the speak() function I call os.remove(filename), and the next time the program wants to say something, a new file is created.
I found some other solutions saying that you should use a different filename every time you create a file, but that would make too much clutter for me.
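For anyone who does want to try the different-filename approach, here is a minimal sketch of how the clutter could be kept out of the project folder. The helper name new_clip_path is made up for illustration; it only builds a path and does not touch gTTS or playsound. It assumes that writing clips to the system temporary directory is acceptable.

```python
import os
import tempfile
import uuid


def new_clip_path(directory=None):
    """Return a path for a new .mp3 clip that earlier calls have not used.

    `new_clip_path` is a hypothetical helper, not part of gTTS or playsound.
    By default the path points into the system temporary directory, so the
    clips do not pile up next to the script.
    """
    if directory is None:
        directory = tempfile.gettempdir()
    # uuid4 gives a random name, so two calls practically never collide.
    return os.path.join(directory, f"voice_{uuid.uuid4().hex}.mp3")
```

Inside speak(), `filename = new_clip_path()` would then replace the fixed "voice.mp3". The files still accumulate in the temporary directory until something deletes them, so this trades the clutter for a different location rather than removing it.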
Here is the change that I made to my code; it was just a single line within the speak() function:
def speak(text):
    tts = gTTS(text=text, lang="en")
    filename = "voice.mp3"
    tts.save(filename)
    playsound.playsound(filename)
    # Delete the file so the next call can create it again.
    os.remove(filename)
This works perfectly for me so far. It can take in as many inputs as needed, since the file is deleted and recreated every time the speak() function is used.
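One possible refinement of the delete-after-use idea is to wrap playback in try/finally, so the file is still removed if playback raises. The sketch below is only an illustration: speak_and_clean_up is a made-up name, and the synthesize and play parameters are stand-ins for the gTTS save step and playsound.playsound, passed in so the cleanup logic can be tried without a microphone or network access.

```python
import os


def speak_and_clean_up(text, synthesize, play, filename="voice.mp3"):
    """Save speech to `filename`, play it, and always delete it afterwards.

    `synthesize(text, filename)` and `play(filename)` are hypothetical
    callables supplied by the caller; they are not library functions.
    """
    synthesize(text, filename)
    try:
        play(filename)
    finally:
        # Runs whether or not play() raised, so no stale file is left behind.
        if os.path.exists(filename):
            os.remove(filename)
```

With the modules from the question, the call might look like `speak_and_clean_up("Hello", lambda t, f: gTTS(text=t, lang="en").save(f), playsound.playsound)`.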
Again, if a better and more efficient solution is suggested or found, I'll gladly take it.
CodePudding user response:
Your solution works fine.
I would just tweak it to leave the file behind (in case you want to listen to it for testing purposes) and instead remove it at the beginning if it exists.
Also, passing filename to your function ensures nothing is hard-coded:
def speak(text, filename):
    # Remove any leftover file from a previous call before saving a new one.
    if os.path.exists(filename):
        os.remove(filename)
    tts = gTTS(text=text, lang="en")
    tts.save(filename)
    playsound.playsound(filename)
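As a small variation on the exists-then-remove check, the two steps can be collapsed into a single call with pathlib, which does nothing when the file is absent. This is just a sketch: remove_stale_clip is a made-up helper name, and missing_ok requires Python 3.8 or newer.

```python
from pathlib import Path


def remove_stale_clip(filename):
    """Delete a leftover clip if it exists; do nothing if it does not.

    `remove_stale_clip` is a hypothetical helper name used for illustration.
    """
    # missing_ok=True suppresses FileNotFoundError when the file is absent.
    Path(filename).unlink(missing_ok=True)
```

Calling `remove_stale_clip(filename)` as the first line of speak() would stand in for the `if os.path.exists(...)` block. If another process still holds the file open on Windows, the call can still raise PermissionError, just like os.remove.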