Python: calculate the mean value of a specific key in a nested dictionary (IBM Watson Speech to Text API)

I am comparing baseline confidence for IBM Watson Speech to Text across several audio files. I can access the confidence level of a single result using pprint(data_response['results'][0]['alternatives'][0]['confidence']), but I can't work out how to return all of them. I need to calculate the mean confidence level of the entire transcript. I've looked into iterating over the nested dict, but everything I've read so far says that iterating only returns the keys and not the values.

What method should be used to get a mean of all confidence levels?

Here is what the nested dictionary looks like using pretty print:

{'result_index': 0,
 'results': [{'alternatives': [{'confidence': 0.99, 'transcript': 'hello '}],
              'final': True},
             {'alternatives': [{'confidence': 0.9,
                                'transcript': 'good morning any this is '}],
              'final': True},
             {'alternatives': [{'confidence': 0.59,
                                'transcript': "I'm on a recorded morning "
                                              '%HESITATION today start running '
                                              "yeah it's really good how are "
                                              "you %HESITATION it's one three "
                                              'six thank you so much for '
                                              'asking '}],
              'final': True},
             {'alternatives': [{'confidence': 0.87,
                                'transcript': 'I appreciate this opportunity '
                                              'to get together with you and '
                                              '%HESITATION you know learn more '
                                              'about you your interest in '}],
              'final': True},

Answer:

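Iterating over a dict yields its keys, but data_response['results'] and each 'alternatives' entry are lists, so iterating over them yields the nested dicts themselves. You can collect every 'confidence' value that way and pass them to statistics.mean: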

from statistics import mean

data_response = {
    "result_index": 0,
    "results": [
        {
            "alternatives": [{"confidence": 0.99, "transcript": "hello "}],
            "final": True,
        },
        {
            "alternatives": [
                {"confidence": 0.9, "transcript": "good morning any this is "}
            ],
            "final": True,
        },
        {
            "alternatives": [
                {
                    "confidence": 0.59,
                    "transcript": "I'm on a recorded morning "
                    "%HESITATION today start running "
                    "yeah it's really good how are "
                    "you %HESITATION it's one three "
                    "six thank you so much for "
                    "asking ",
                }
            ],
            "final": True,
        },
        {
            "alternatives": [
                {
                    "confidence": 0.87,
                    "transcript": "I appreciate this opportunity "
                    "to get together with you and "
                    "%HESITATION you know learn more "
                    "about you your interest in ",
                }
            ],
            "final": True,
        },
    ],
}

# Walk every result, then every alternative in it, and average the scores.
m = mean(
    a["confidence"] for r in data_response["results"] for a in r["alternatives"]
)
print(m)

Prints:

0.8375
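
Since the question compares several audio files, here is a minimal sketch of the same idea wrapped in a helper and applied per file. The file names, the trimmed response dicts, and the helper name confidences are made up for illustration. It also assumes that some alternatives may come back without a 'confidence' key (for example, if more than one alternative per result is requested), so those are skipped instead of raising KeyError.

```python
from statistics import mean


def confidences(response):
    """Collect every confidence score found in one response dict."""
    scores = []
    for result in response.get("results", []):
        for alternative in result.get("alternatives", []):
            # Assumption: an alternative may lack a score, so skip it.
            if "confidence" in alternative:
                scores.append(alternative["confidence"])
    return scores


# Hypothetical responses for two audio files, trimmed to the relevant keys.
responses = {
    "call_a.wav": {
        "results": [
            {"alternatives": [{"confidence": 0.99, "transcript": "hello "}]},
            {"alternatives": [{"confidence": 0.59, "transcript": "good morning "}]},
        ]
    },
    "call_b.wav": {
        "results": [
            {
                "alternatives": [
                    {"confidence": 0.87, "transcript": "thank you "},
                    {"transcript": "thank you so "},  # no score on this one
                ]
            },
        ]
    },
}

# One mean per file, keyed by file name.
per_file = {name: mean(confidences(resp)) for name, resp in responses.items()}
for name, value in per_file.items():
    print(f"{name}: {value:.3f}")
```

Note that statistics.mean raises StatisticsError on an empty sequence, so a response with no scored results needs its own handling before the call.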