Average of the values in a dictionary

I have a dictionary called model_scores_for_datasets that looks like this:

{'Unprocessed': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Standardisation': {'Logistic Regression': '0.933', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Normalisation': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Rescale': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}}

I want to get the average for each dictionary inside this dictionary of dictionaries. There are 4 in total (Unprocessed, Standardisation, Normalisation, Rescale), and each one holds 8 scores that look like this:

{'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}

So each of the 4 scalings has scores for 8 different ML algorithms, and I want an average for each one so that I can say, for example, that "Standardisation" scored highest on average and will therefore be used during the machine learning process.

This is my code, but it gives me an error: TypeError: can't convert type 'str' to numerator/denominator


avgDict = model_scores_for_datasets
for st,vals in avgDict.items():
    print(st,(vals))
    #print (st)
    for st,vals in avgDict.items():
        print("Average for {} is {}".format(st,mean(vals)))

CodePudding user response：

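import numpy as np

results = model_scores_for_datasets  # the dictionary from the question
for mode in results:
    # the scores are stored as strings, so convert them before averaging
    avg = np.mean([float(value) for value in results[mode].values()])
    print(f"{mode}: {avg}")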

Out:

Unprocessed: 0.9624999999999999
Standardisation: 0.9542499999999999
Normalisation: 0.9584999999999999
Rescale: 0.9542499999999999

For PythonCrazy, the same thing as a one-liner:

print({mode: np.mean([float(value) for value in results[mode].values()]) for mode in results.keys()})

CodePudding user response：

First you have to convert the values to the right type. For a single inner dictionary:

avgDict = model_scores_for_datasets['Unprocessed']
# conversion: the scores are strings, so turn them into floats
avgDict = dict(zip(avgDict.keys(), map(float, avgDict.values())))

# each value is now a float, so mean(avgDict.values()) no longer raises the TypeError
for st, vals in avgDict.items():
    print("Average for {} is {}".format(st, vals))

output:

Average for Logistic Regression is 0.967
Average for Support Vector Machine is 0.967
Average for Decision Tree is 0.933
Average for Random Forest is 0.933
Average for LinearDiscriminant is 1.0
Average for K-Nearest Neighbour is 1.0
Average for Naive Bayes is 0.967
Average for XGBoost is 0.933
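
The same conversion idea extends to the whole nested dictionary. Below is a minimal sketch, assuming the `mean` in the question is `statistics.mean` (the error message suggests so) and using a shortened, hypothetical sample called `scores` in place of the full `model_scores_for_datasets`. Note that iterating a dict yields its keys, so `mean(vals)` would see the model names; `.values()` is needed in addition to the `float()` conversion.

```python
from statistics import mean

# Shortened sample in the same shape as model_scores_for_datasets
# (only two models per dataset, to keep the example small).
scores = {
    'Unprocessed': {'Logistic Regression': '0.967', 'Decision Tree': '0.933'},
    'Standardisation': {'Logistic Regression': '0.933', 'Decision Tree': '0.933'},
}

averages = {}
for name, model_scores in scores.items():
    # mean() needs numbers, so convert each score string with float() first
    averages[name] = mean(float(s) for s in model_scores.values())

for name, avg in averages.items():
    print("Average for {} is {:.3f}".format(name, avg))
```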

CodePudding user response：

One easy-to-read solution would be:

data = {'Unprocessed': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Standardisation': {'Logistic Regression': '0.933', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Normalisation': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Rescale': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}}

dicts = list(data.keys())
keys = list(data['Unprocessed'].keys())

r = {}
for k in keys:
    r[k] = sum([float(data[d][k]) for d in dicts])/len(dicts)
    
print(r)
#{'Logistic Regression': 0.9585, 'Support Vector Machine': 0.967, 'Decision Tree': 0.933, 'Random Forest': 0.95, 'LinearDiscriminant': 0.9752500000000001, 'K-Nearest Neighbour': 0.9752500000000001, 'Naive Bayes': 0.967, 'XGBoost': 0.933}

Similarly, if you want to average by dictionary:

r2 = {}
for d in dicts:
    r2[d] = sum([float(data[d][k]) for k in keys])/len(keys)
    
print(r2)
#{'Unprocessed': 0.9624999999999999, 'Standardisation': 0.9542499999999999, 'Normalisation': 0.9584999999999999, 'Rescale': 0.9542499999999998}
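
Since the question is ultimately about choosing the dataset with the highest average, the per-dataset averages in `r2` can be fed to `max`. This is only a sketch: the values below are the `r2` results from above, rounded by hand so the snippet runs on its own, and `best` is a name introduced here for illustration.

```python
# Per-dataset averages, rounded from the r2 output above
r2 = {'Unprocessed': 0.9625, 'Standardisation': 0.95425,
      'Normalisation': 0.9585, 'Rescale': 0.95425}

# max() walks the keys and compares them by their average score
best = max(r2, key=r2.get)
print("Best on average: {} ({:.4f})".format(best, r2[best]))
```

If two datasets tie, `max` returns the one that appears first in the dictionary.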