Get most common match for each unique item in two arrays

I have data that is similar to these two arrays:

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']

I would like to find the fraction of classes that are correctly predicted once the majority consensus is taken. For example, my data shows that predictions for 'A' are 66% correct, for 'B' 66% correct and for 'C' 33% correct, so the overall accuracy would be 66%, given that the most common predictions for classes 'A' and 'B' are correct but the most common prediction for 'C' isn't.
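A minimal sketch of the metric as described, assuming "majority consensus" means the most frequent predicted label among the samples of each true class, and that ties are broken by `Counter.most_common`, which keeps first-encountered order. The name `majority_consensus_accuracy` is hypothetical. Note that it counts classes, not samples: each class contributes one "correct" or "incorrect" vote.

```python
from collections import Counter, defaultdict


def majority_consensus_accuracy(true, predicted):
    # Group the predicted labels by the true class they belong to.
    preds_by_class = defaultdict(list)
    for t, p in zip(true, predicted):
        preds_by_class[t].append(p)
    # A class counts as correct if its most common prediction is the class itself.
    # Ties are resolved by Counter.most_common (first-encountered order).
    correct = sum(
        1
        for cls, preds in preds_by_class.items()
        if Counter(preds).most_common(1)[0][0] == cls
    )
    return correct / len(preds_by_class)


predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']
print(majority_consensus_accuracy(true_class, predicted_class))  # 0.6666666666666666
```

Here the most common predictions are 'A' for class 'A', 'B' for class 'B' and 'A' for class 'C', so two of three classes match.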

Answer:

From what you write in the example and comments, it looks like you are looking for the maximum, across all classes, of each class's ratio of correct predictions to total occurrences.

Here is one way of doing so using collections.Counter:

import collections


def max_model_match(true, predicted):
    # count all occurrences of each true class
    counter_all = collections.Counter(true)
    # start an empty counter for the correct ("good") predictions
    counter_good = collections.Counter()
    # loop through all (true, predicted) pairs
    for x, y in zip(true, predicted):
        # if the prediction is correct, increment that class's counter
        if x == y:
            counter_good[x] += 1
    # find the maximum correct-to-all ratio over all classes
    max_good_ratio = 0.0
    for key in counter_all:
        good_ratio = counter_good[key] / counter_all[key]
        if good_ratio > max_good_ratio:
            max_good_ratio = good_ratio
    return max_good_ratio


predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']
max_model_match(true_class, predicted_class)
# 0.6666666666666666
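To see the per-class figures quoted in the question (66%, 66%, 33%), the same counts can be turned into a dictionary of ratios. This is a minimal sketch; `per_class_accuracy` is a hypothetical helper name. On this sample the maximum ratio happens to equal the majority-consensus accuracy of 2/3, but in general the two measure different things.

```python
from collections import Counter


def per_class_accuracy(true, predicted):
    # Number of samples per true class.
    totals = Counter(true)
    # Number of correctly predicted samples per true class.
    hits = Counter(t for t, p in zip(true, predicted) if t == p)
    # Counter returns 0 for missing keys, so a class with no hits gets 0.0.
    return {cls: hits[cls] / totals[cls] for cls in totals}


predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']
ratios = per_class_accuracy(true_class, predicted_class)
print(ratios)                # {'A': 0.6666666666666666, 'B': 0.6666666666666666, 'C': 0.3333333333333333}
print(max(ratios.values()))  # 0.6666666666666666
```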

Answer:

A simple approach using a defaultdict and max:

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']

from collections import defaultdict

d = defaultdict(lambda: [0, 0])  # [total, correct] per true class
for p, t in zip(predicted_class, true_class):
    d[t][0] += 1
    if p == t:
        d[t][1] += 1

# maximum correct-to-total ratio over all classes
max(correct / total for total, correct in d.values())
# 0.6666666666666666
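If you also want to know which class reaches that maximum, a sketch of one option is to pass a `key` to `max()` over the dictionary keys instead of taking the maximum of the ratios directly. `best_class` is a hypothetical variable name; when several classes tie, `max()` returns the first one encountered.

```python
from collections import defaultdict

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']

# Same [total, correct] tally per true class.
d = defaultdict(lambda: [0, 0])
for p, t in zip(predicted_class, true_class):
    d[t][0] += 1
    if p == t:
        d[t][1] += 1

# max() with a key returns the class label rather than the ratio.
best_class = max(d, key=lambda c: d[c][1] / d[c][0])
print(best_class, d[best_class][1] / d[best_class][0])  # A 0.6666666666666666
```

Here 'A' and 'B' tie at 2/3, and 'A' is returned because it is seen first.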