Azure Cognitive Speech-to-text DetailedSpeechRecognitionResult is not detecting explicit punctuation

Time:09-07

When I use the detailed result to read per-word confidence, explicit punctuation stops working. For example, when I say "question mark" it types the words "question mark" instead of "?", and when I say "period" it types "period" instead of ".". I have added a checkbox; when it is checked, punctuation should be turned on.

SpeechConfig config = SpeechConfig.FromSubscription("key", "region");
config.OutputFormat = OutputFormat.Detailed;
if (Properties.Settings.Default.Punctuation)
{
    config.SetServiceProperty("punctuation", "explicit", ServicePropertyChannel.UriQueryParameter);
}
recognizer = new SpeechRecognizer(config);
recognizer.Recognized += SpeechRecognizer_Recognized;

...

private void SpeechRecognizer_Recognized(object sender, SpeechRecognitionEventArgs e)
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        if (e.Result.Text.ToLower().Equals("new line") || e.Result.Text.ToLower().Equals("newline"))
        {
            SendKeys.SendWait(Environment.NewLine);
        }
        else
        {
            var detailedResults = e.Result.Best();
            if (detailedResults != null && detailedResults.Any())
            {
               
                var bestResults = detailedResults?.ToList()[0];
                foreach (var word in bestResults.Words)
                {
                    double per = word.Confidence * 100;
                    SendKeys.SendWait($"{word.Word} [{per:0.##}] ");
                }

            }
        }
    }
}

CodePudding user response:

What you are observing is by design. In most circumstances it is not necessary, or even helpful, to inspect the details of a recognized speech result. It looks like you have misinterpreted how to use the details.

You don't realise it but your example of detecting "new line" or "newline" as a key phrase and interpreting that as a request to inject a line feed into the output is the very same process at work.

For punctuation to be detected in the speech, the first thing that the classifier must do is resolve the words. It is only after a word has been resolved that the service can post-process the results to classify the word as a natural word or as punctuation.

The process is a bit like this:

  1. Detect the word "comma" with high confidence.
  2. If the punctuation setting is set to explicit, check whether the word is on its own or at the end of a recognized sequence that was followed by a pause.
  3. If so, interpret it as "," and not "comma".

For this reason it is important to understand that when the punctuation setting is set to explicit, the punctuation must be isolated out of the normal sentence cadence of the spoken text.
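To make the idea concrete, here is a toy sketch of that post-processing step. It is not the service's actual algorithm: the `ExplicitPunctuationSketch` class, the `Symbols` table and the `Render` method are all hypothetical, and the rule is simplified to "a phrase spoken on its own between two pauses is replaced if it names a punctuation mark".

```csharp
using System;
using System.Collections.Generic;

public static class ExplicitPunctuationSketch
{
    // Hypothetical table of spoken names; the real service's list is not known here.
    private static readonly Dictionary<string, string> Symbols =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "comma", "," },
            { "period", "." },
            { "question mark", "?" },
            { "exclamation mark", "!" },
        };

    // Each input string stands for one phrase spoken between two pauses.
    // A phrase becomes a symbol only when it is a punctuation name on its own;
    // the same word inside a longer phrase is left as a word.
    public static string Render(IEnumerable<string> phrases)
    {
        var parts = new List<string>();
        foreach (string phrase in phrases)
        {
            string trimmed = phrase.Trim();
            parts.Add(Symbols.TryGetValue(trimmed, out string symbol) ? symbol : trimmed);
        }
        return string.Join(" ", parts);
    }
}
```

With this rule, `Render(new[] { "it has a comma", "comma", "period" })` keeps the first "comma" as a word because it is part of a longer phrase, and turns only the isolated "comma" and "period" into symbols.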

Read this as a sentence with a constant pace without punctuation:

this is a sentence that doesn't have a comma or a full stop but an exclamation mark would look nice

If you read quickly and fluently enough, there should be no punctuation in the output, even if the words were recognized with high confidence. To get punctuation into the same text, you actually need to read this script:

This is a sentence that doesn't have a comma.
Comma.
Or a fullstop.
Comma.
But an exclamation mark would look nice.
exclamation mark.

The recognized output was:

This is a sentence that doesn't have a comma , or a full stop , but an exclamation mark would look nice !

The per-word analysis for my test looks like this:

word confidence
this 85.99%
is 95.93%
a 68.49%
sentence 96.99%
that 90.03%
doesn't 96.75%
have 94.57%
a 87.88%
comma 94.58%
comma 94.34%
or 67.14%
a 64.68%
fullstop 77.63%
comma 94.90%
but 91.17%
an 62.65%
exclamation 98.44%
mark 68.58%
would 86.15%
look 91.58%
nice 97.40%
exclamation 97.05%
mark 96.61%

Notice that the words representing the punctuation all have a high confidence rating, but in the output not all of the words were actually interpreted as punctuation. This might be clearer in this screenshot where I have highlighted two commas that are in the output, but are correctly identified as words:

[Screenshot: example output of text vs words]
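In practice this suggests typing the display text and treating the word list as diagnostic data only. The sketch below is one way to arrange the handler from the question under that assumption. The `MainForm` class name is hypothetical, and writing the confidence report to the debug output instead of sending it as keystrokes is my own choice, not something the question asked for.

```csharp
using System.Diagnostics;
using System.Linq;
using System.Windows.Forms;
using Microsoft.CognitiveServices.Speech;

public partial class MainForm : Form // hypothetical form name
{
    private void SpeechRecognizer_Recognized(object sender, SpeechRecognitionEventArgs e)
    {
        if (e.Result.Reason != ResultReason.RecognizedSpeech)
        {
            return;
        }

        // The display text is where explicit punctuation shows up as symbols.
        SendKeys.SendWait(e.Result.Text + " ");

        // The word list holds the recognized words themselves ("comma", "period"),
        // so it is used here only to report per-word confidence.
        var best = e.Result.Best()?.FirstOrDefault();
        if (best?.Words != null)
        {
            string report = string.Join(" ",
                best.Words.Select(w => $"{w.Word} [{w.Confidence * 100:0.##}]"));
            Debug.WriteLine(report);
        }
    }
}
```

The "new line" check from the question can stay in front of the `SendKeys` call; it is left out here to keep the sketch short.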

CodePudding user response:

Using Cognitive Services, I cannot reproduce your issue. Setting config.OutputFormat = OutputFormat.Detailed or calling config.RequestWordLevelTimestamps() does not affect explicit punctuation recognition.

What is not clear from your example is the current state of your setting. When logic is toggled by a setting and the observed behaviour stays the same after the setting is changed, the first thing to check is the setting value itself.

Please try to comment out your logic to toggle the punctuation like this:

//if (Properties.Settings.Default.Punctuation)
{
    config.SetServiceProperty("punctuation", "explicit", ServicePropertyChannel.UriQueryParameter);
}

If this solves it then there are two considerations:

  1. What is the initial state of the Properties.Settings.Default.Punctuation setting? Is your application logic not updating the value when you expect it to? Any logic that changes that setting may need to call Properties.Settings.Default.Save() to save the changes. Depending on where that logic executes, you might also need to call Properties.Settings.Default.Reload() to ensure that the current values are loaded from the store. This is not usually required if you are operating in the same thread space, which you most likely will be in WinForms.

  2. Is the config loaded once, and is that before the setting value has been toggled? That step in the workflow is unclear from your description and the code example. If you are using continuous recognition, or you are creating a single instance of SpeechRecognizer for the lifetime of your Form, then changes to your setting will not be applied to the Speech Configuration.

    You will need to re-initialize the SpeechRecognizer as part of your logic that is handling the setting changed event or have some other routine in the speech event handlers that detects a change in this setting and restarts the SpeechRecognizer connection and process.
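Both considerations can be handled together in the checkbox handler. The following is a minimal sketch, assuming continuous recognition and a WinForms form; the `MainForm` class, the `PunctuationCheckBox_CheckedChanged` handler and the `RestartRecognizerAsync` helper are hypothetical names, and `SpeechRecognizer_Recognized` is the handler from the question, defined elsewhere in the same form.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;
using Microsoft.CognitiveServices.Speech;

public partial class MainForm : Form // hypothetical form name
{
    private SpeechRecognizer recognizer;

    // Hypothetical CheckedChanged handler for the punctuation checkbox.
    private async void PunctuationCheckBox_CheckedChanged(object sender, EventArgs e)
    {
        // Persist the new value first so the rebuilt config reads the current state.
        Properties.Settings.Default.Punctuation = ((CheckBox)sender).Checked;
        Properties.Settings.Default.Save();

        await RestartRecognizerAsync();
    }

    // Tears down the existing recognizer and builds a new one from a fresh config.
    private async Task RestartRecognizerAsync()
    {
        if (recognizer != null)
        {
            await recognizer.StopContinuousRecognitionAsync();
            recognizer.Recognized -= SpeechRecognizer_Recognized;
            recognizer.Dispose();
        }

        SpeechConfig config = SpeechConfig.FromSubscription("key", "region");
        config.OutputFormat = OutputFormat.Detailed;
        if (Properties.Settings.Default.Punctuation)
        {
            config.SetServiceProperty("punctuation", "explicit", ServicePropertyChannel.UriQueryParameter);
        }

        recognizer = new SpeechRecognizer(config);
        recognizer.Recognized += SpeechRecognizer_Recognized;
        await recognizer.StartContinuousRecognitionAsync();
    }
}
```

Rebuilding the recognizer on every toggle is the simplest option; if toggles are frequent, you may prefer to restart only when the saved value actually differs from the one the current recognizer was created with.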
