Speech to text modification

I’m working on a modified speech-to-text feature that should take in a user’s speech and convert it to text, but I want the output text to be exactly what the user is saying. This means I want to detect speech disfluencies such as stammers like “sstttop” and “pppplease”. I’ve already written a Java program that does the speech-to-text conversion, but I need to know if it’s possible to modify it to detect speech disfluency. Any input and help would be much appreciated.

CodePudding user response：

I think it would be better to improve the structure of the text produced from the stammered speech.

CodePudding user response：

My first guess is that you would have to analyze how long the user spends producing each specific sound. For example, holding the 's' sound for half a second could be written as one 's', whereas holding it for one second could be written as two. I understand that this is not completely accurate, but it is the best guess I can think of.
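
To make the duration idea concrete, here is a minimal Java sketch. It assumes you can already get each recognized sound together with how long it was held; whether your speech-to-text library exposes that timing is something you would have to check. The `TimedSound` record, the `DisfluencyTranscriber` class, and the 0.5 second baseline are all hypothetical names and values chosen for illustration, not part of any real API.

```java
import java.util.List;

public class DisfluencyTranscriber {

    /** Hypothetical type: a recognized sound and how long the speaker held it. */
    record TimedSound(String symbol, double seconds) {}

    // Assumed baseline taken from the example above: one normal sound lasts about 0.5 s.
    static final double BASELINE_SECONDS = 0.5;

    /** How many times to write a sound: duration divided by baseline, rounded, at least 1. */
    static int repeatCount(double seconds, double baselineSeconds) {
        return (int) Math.max(1, Math.round(seconds / baselineSeconds));
    }

    /** Builds the output text by repeating each sound according to its duration. */
    static String transcribe(List<TimedSound> sounds, double baselineSeconds) {
        StringBuilder out = new StringBuilder();
        for (TimedSound s : sounds) {
            out.append(s.symbol().repeat(repeatCount(s.seconds(), baselineSeconds)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up timings for the word "stop" with a stammer on 's' and 't'.
        List<TimedSound> stop = List.of(
            new TimedSound("s", 1.0),
            new TimedSound("t", 1.5),
            new TimedSound("o", 0.5),
            new TimedSound("p", 0.4));
        System.out.println(transcribe(stop, BASELINE_SECONDS)); // prints "sstttop"
    }
}
```

A single fixed baseline is a rough approximation, since vowels and consonants naturally differ in length and speakers differ in pace. A per-sound or per-speaker baseline would probably work better, but the structure of the code would stay the same.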