I am trying to create a video on iOS with text-to-speech (like TikTok does). The only approach I could think of is to merge a video and an audio track with AVFoundation, but it seems impossible to write the audio of a text-to-speech utterance into a .caf file.
This is what I tried:
public async Task amethod(string[] _text_and_position)
{
    string[] text_and_position = (string[])_text_and_position;
    double tts_starting_position = Convert.ToDouble(text_and_position[0]);
    string text = text_and_position[1];
    var synthesizer = new AVSpeechSynthesizer();
    var su = new AVSpeechUtterance(text)
    {
        Rate = 0.5f,
        Volume = 1.6f,
        PitchMultiplier = 1.4f,
        Voice = AVSpeechSynthesisVoice.FromLanguage("en-us")
    };
    synthesizer.SpeakUtterance(su);
    Action<AVAudioBuffer> buffer = new Action<AVAudioBuffer>(asss);
    try
    {
        synthesizer.WriteUtterance(su, buffer);
    }
    catch (Exception error) { }
}
public async void asss(AVAudioBuffer _buffer)
{
    try
    {
        var pcmBuffer = (AVAudioPcmBuffer)_buffer;
        if (pcmBuffer.FrameLength == 0)
        {
            // done
        }
        else
        {
            AVAudioFile output = null;
            // append buffer to file
            NSError error;
            if (output == null)
            {
                string filePath = Path.Combine(Path.GetTempPath(), "TTS/" + 1 + ".caf");
                NSUrl fileUrl = NSUrl.FromFilename(filePath);
                output = new AVAudioFile(fileUrl, pcmBuffer.Format.Settings, AVAudioCommonFormat.PCMInt16, false, out error);
            }
            output.WriteFromBuffer(pcmBuffer, out error);
        }
    }
    catch (Exception error)
    {
        new UIAlertView("Error", error.ToString(), null, "OK", null).Show();
    }
}
This is the same code in Swift:
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "test 123")
utterance.voice = AVSpeechSynthesisVoice(language: "en")
var output: AVAudioFile?
synthesizer.write(utterance) { (buffer: AVAudioBuffer) in
    guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
        fatalError("unknown buffer type: \(buffer)")
    }
    if pcmBuffer.frameLength == 0 {
        // done
    } else {
        // append buffer to file
        if output == nil {
            output = try? AVAudioFile(
                forWriting: URL(fileURLWithPath: "test.caf"),
                settings: pcmBuffer.format.settings,
                commonFormat: .pcmFormatInt16,
                interleaved: false)
        }
        try? output?.write(from: pcmBuffer)
    }
}
The problem with this code is that synthesizer.WriteUtterance(su, buffer); always crashes. After reading other posts, I believe this is a bug that results in the callback method (buffer) never being called.
Do you know of any workaround to this bug or any other way to achieve what I am trying to do?
Thanks for your time, have a great day.
EDIT: I commented out synthesizer.SpeakUtterance(su); as ColeX pointed out, and now the callback method is executed. Unfortunately, I still can't store my audio in a file, since I get another error on
output = new AVAudioFile(fileUrl, pcmBuffer.Format.Settings, AVAudioCommonFormat.PCMInt16, false, out error);
ERROR:
Could not initialize an instance of the type 'AVFoundation.AVAudioFile': the native 'initForWriting:settings:commonFormat:interleaved:error:' method returned nil. It is possible to ignore this condition by setting ObjCRuntime.Class.ThrowOnInitFailure to false.
CodePudding user response:
The error message simply says: An AVSpeechUtterance shall not be enqueued twice.
So stop making it speak and write at the same time. I used your code and commented out synthesizer.SpeakUtterance(su);, and the error is gone.
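In other words, the utterance should only be passed to WriteUtterance, never to SpeakUtterance as well. A minimal sketch of the write-only path, using the same Xamarin.iOS binding names as the code above:

```csharp
var synthesizer = new AVSpeechSynthesizer();
var su = new AVSpeechUtterance("test 123")
{
    Rate = 0.5f,
    Voice = AVSpeechSynthesisVoice.FromLanguage("en-us")
};

// Do NOT also call synthesizer.SpeakUtterance(su) here:
// the same utterance must not be enqueued twice.
synthesizer.WriteUtterance(su, buffer =>
{
    // handle each AVAudioBuffer, e.g. append it to a file
    // as in the asss() callback above
});
```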
Update
Based on my test, it does not allow creating an extra subfolder, so remove the TTS/ part and leave just the file name:
string filePath = Path.Combine(Path.GetTempPath(), 1 + ".caf");
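Putting both fixes together, the callback could look like the sketch below. Note that output is kept as a field here (an assumption on my part, matching the Swift sample's closure capture): the original C# version re-declared it locally on every callback, so each buffer would have re-created the file instead of appending. The file name "1.caf" stands in for whatever naming scheme you use.

```csharp
AVAudioFile output;  // field: keeps the file open across callbacks

public void asss(AVAudioBuffer _buffer)
{
    var pcmBuffer = (AVAudioPcmBuffer)_buffer;
    if (pcmBuffer.FrameLength == 0)
    {
        output = null;  // done: drop the reference so the file is finalized
        return;
    }

    NSError error;
    if (output == null)
    {
        // No subfolder: AVAudioFile will not create "TTS/" for us.
        // (Alternatively, create it first with Directory.CreateDirectory.)
        string filePath = Path.Combine(Path.GetTempPath(), "1.caf");
        output = new AVAudioFile(NSUrl.FromFilename(filePath),
            pcmBuffer.Format.Settings, AVAudioCommonFormat.PCMInt16,
            false, out error);
    }
    output.WriteFromBuffer(pcmBuffer, out error);
}
```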