Azure Speech to Text word accuracy in % with custom model

I am building a speech-to-text app in C# Windows Forms with Microsoft Azure. It runs fine in Visual Studio, but I want to create a custom model: about 90% of words are recognized correctly, but some, such as "Pneumoultramicroscopicsilicovolcanoconiosis", are not. I can't find anything in the documentation about how this is processed, how to prepare the testing data, or how much data is needed to make this possible. How do you specify a word that should be recognized using Azure Cognitive Services Speech Studio?

Answer:

If you want to improve recognition, you can either train your own custom model or supply a phrase list.

If you need specific keywords for commands or a list of user/staff names then they are good candidates for a simple phrase list.

Medical diseases or diagnoses and some other highly technical terms often do not make good candidates for a phrase list. They often have Latin or other non-English origins, so the base English models are less likely to have been trained on that content at all.

Improve recognition accuracy with phrase list
A phrase list is a list of words or phrases provided ahead of time to help improve their recognition. Adding a phrase to a phrase list increases its importance, thus making it more likely to be recognized.

Implement phrase list
With the Speech SDK you can add phrases individually and then run speech recognition. Then you can optionally clear or update the phrase list to take effect before the next recognition.

  var phraseList = PhraseListGrammar.FromRecognizer(recognizer);
  phraseList.AddPhrase("Pneumoultramicroscopicsilicovolcanoconiosis");
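For context, here is a minimal sketch of how those two lines might sit inside a complete single-shot recognition. It assumes a console entry point, the default microphone as input, and the en-US language; "YourSpeechKey" and "YourSpeechRegion" are placeholders for your own Speech resource, not real values.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

public static class PhraseListExample
{
    public static async Task Main()
    {
        // Placeholder credentials (assumption): substitute your own key and region.
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
        speechConfig.SpeechRecognitionLanguage = "en-US";

        using (var audioConfig = AudioConfig.FromDefaultMicrophoneInput())
        using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            // Attach a phrase list to this recognizer and add the phrase to favor.
            var phraseList = PhraseListGrammar.FromRecognizer(recognizer);
            phraseList.AddPhrase("Pneumoultramicroscopicsilicovolcanoconiosis");

            Console.WriteLine("Speak into your microphone.");
            SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();

            if (result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"Recognized: {result.Text}");
            else
                Console.WriteLine($"No speech recognized (reason: {result.Reason}).");

            // Remove all phrases if they should not apply to the next recognition.
            phraseList.Clear();
        }
    }
}
```

In a Windows Forms app the same body could run inside an async button click handler instead of Main.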

NOTE: This specific word is still unlikely to be detected as a phrase. Part of the issue is that it is a compound word formed from other words that have very low representation in standard English speech models. You would have to speak it with a constant cadence for it to be recognised as a single word instead of its components, which is actually quite hard to master.

Custom Speech is useful if there is a specific area of interest or business domain vocabulary that you want to model. However, this requires you to upload your own data, then train and test the custom model.

What is Custom Speech?
With Custom Speech, you can evaluate and improve the Microsoft speech-to-text accuracy for your applications and products.
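Accuracy in this kind of evaluation is usually expressed as word error rate (WER): the number of inserted, deleted, and substituted words divided by the number of words in the reference transcript, times 100. The sketch below shows one way to compute that figure yourself from a reference transcript and a recognized transcript. The class and method names are hypothetical, splitting on spaces and ignoring case are simplifying assumptions, and "word accuracy" is taken here to mean 100 minus WER.

```csharp
using System;

public static class WordAccuracy
{
    // Word-level Levenshtein distance: the minimum number of insertions,
    // deletions, and substitutions that turn the reference into the hypothesis.
    public static int EditDistance(string[] reference, string[] hypothesis)
    {
        var d = new int[reference.Length + 1, hypothesis.Length + 1];
        for (int i = 0; i <= reference.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= hypothesis.Length; j++) d[0, j] = j;

        for (int i = 1; i <= reference.Length; i++)
        {
            for (int j = 1; j <= hypothesis.Length; j++)
            {
                bool same = string.Equals(reference[i - 1], hypothesis[j - 1],
                                          StringComparison.OrdinalIgnoreCase);
                int substitution = d[i - 1, j - 1] + (same ? 0 : 1);
                int deletion = d[i - 1, j] + 1;
                int insertion = d[i, j - 1] + 1;
                d[i, j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }
        }
        return d[reference.Length, hypothesis.Length];
    }

    // WER in percent, relative to the number of words in the reference.
    public static double WordErrorRatePercent(string reference, string hypothesis)
    {
        var separators = new[] { ' ' };
        string[] refWords = reference.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        string[] hypWords = hypothesis.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        if (refWords.Length == 0)
            throw new ArgumentException("Reference transcript must contain at least one word.", nameof(reference));

        return 100.0 * EditDistance(refWords, hypWords) / refWords.Length;
    }
}
```

For example, with the reference "the patient has pneumoconiosis" and the recognized text "the patient has new moconiosis", there is one substitution and one insertion against four reference words, so WordErrorRatePercent returns 50.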

Out of the box, speech to text utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing a variety of common domains. When you make a speech recognition request, the most recent base model for each supported language is used by default. The base model works very well in most speech recognition scenarios.

A custom model can be used to augment the base model to improve recognition of domain-specific vocabulary specific to the application by providing text data to train the model. It can also be used to improve recognition for the specific audio conditions of the application by providing audio data with reference transcriptions.
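Once a custom model has been trained and deployed, the Speech SDK can be pointed at it through the EndpointId property on SpeechConfig. A minimal sketch, assuming the model is already deployed to an endpoint; the key, region, and endpoint ID strings are placeholders rather than real values.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

public static class CustomModelExample
{
    public static async Task Main()
    {
        // Placeholder credentials (assumption): substitute your own key and region.
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
        speechConfig.SpeechRecognitionLanguage = "en-US";

        // Route requests to the deployed custom model instead of the base model.
        // "YourCustomEndpointId" is a placeholder for the ID of your deployment.
        speechConfig.EndpointId = "YourCustomEndpointId";

        using (var audioConfig = AudioConfig.FromDefaultMicrophoneInput())
        using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            Console.WriteLine("Speak into your microphone.");
            SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();

            if (result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"Recognized: {result.Text}");
            else
                Console.WriteLine($"No speech recognized (reason: {result.Reason}).");
        }
    }
}
```

Apart from the EndpointId line, the recognition code is the same as for the base model, so an existing app should need little change to switch over.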

There is too much to cover here for even a simple implementation of Custom Speech. However, the documentation on this topic is quite detailed, as is standard for Microsoft Azure services.