Get most common match for each unique item in two arrays

Time:04-12

I have data that is similar to these two arrays:

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']

I would like to find the fraction of classes that are predicted correctly once a majority consensus is taken for each class. For example, in my data the predictions for 'A' are 66% correct, for 'B' 66% correct, and for 'C' 33% correct. The overall accuracy would therefore be 66%, because the most common prediction for classes 'A' and 'B' is correct, but the most common prediction for 'C' is not.
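The per-class figures quoted above can be checked with a short sketch that tabulates, for each true class, which labels were predicted for it. The helper name `per_class_predictions` is made up for illustration and is not part of the original question.

```python
from collections import Counter, defaultdict

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']


def per_class_predictions(true, predicted):
    """Map each true class to a Counter of the labels predicted for it.

    Hypothetical helper, used here only to verify the numbers by hand.
    """
    table = defaultdict(Counter)
    for t, p in zip(true, predicted):
        table[t][p] += 1
    return dict(table)


table = per_class_predictions(true_class, predicted_class)
for cls, counts in sorted(table.items()):
    total = sum(counts.values())
    # counts[cls] is the number of times the class was predicted correctly
    print(cls, dict(counts), f"{counts[cls]}/{total} correct")
```

With this data, 'A' and 'B' are each predicted correctly 2 times out of 3, while 'C' is predicted correctly only 1 time out of 3 and is most often predicted as 'A'.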

CodePudding user response:

From what you write in the example and comments, it looks like you are looking for the maximum of the correct-to-all prediction ratio for each class.

Here is one way of doing so using collections.Counter:

import collections


def max_model_match(true, predicted):
    # count all occurrences of the classes
    counter_all = collections.Counter(true)
    # initialize the "correct" or "good" counter
    counter_good = collections.Counter()
    # loop through all outcomes
    for (x, y) in zip(true, predicted):
        # if the prediction is correct increment the counter
        if x == y:
            counter_good[x] += 1
    # find the maximum correct-to-all ratio
    max_good_ratio = 0.0
    for key in counter_all.keys():
        good_ratio = counter_good[key] / counter_all[key]
        if good_ratio > max_good_ratio:
            max_good_ratio = good_ratio
    return max_good_ratio


predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']
max_model_match(true_class, predicted_class)
# 0.6666666666666666

CodePudding user response:

A simple approach using a defaultdict and max:

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']

from collections import defaultdict
d = defaultdict(lambda : [0, 0]) # [total, correct]
for p,t in zip(predicted_class, true_class):
    d[t][0] += 1
    if p == t:
        d[t][1] += 1

# maximum correct-to-total ratio over all classes
max(correct / total for total, correct in d.values())

output: 0.6666666666666666
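Note that both snippets above return the largest per-class ratio, which happens to be 0.666... for this data. If the goal is instead what the question describes, i.e. the share of classes whose most common prediction matches the true class, a minimal sketch could look like the following. The function name `majority_consensus_accuracy` is made up here, and ties for the most common prediction are resolved by whatever `Counter.most_common` returns first, which may or may not be what you want.

```python
from collections import Counter, defaultdict

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']


def majority_consensus_accuracy(true, predicted):
    """Fraction of true classes whose most common prediction is correct.

    Hypothetical helper: ties are broken by Counter.most_common ordering.
    """
    # for each true class, count how often each label was predicted
    votes = defaultdict(Counter)
    for t, p in zip(true, predicted):
        votes[t][p] += 1
    if not votes:
        raise ValueError("no data to evaluate")
    # a class counts as correct if its most frequent prediction equals it
    correct = sum(
        1 for cls, counts in votes.items()
        if counts.most_common(1)[0][0] == cls
    )
    return correct / len(votes)


result = majority_consensus_accuracy(true_class, predicted_class)
print(result)
```

For the example data, 'A' and 'B' are each most often predicted as themselves while 'C' is most often predicted as 'A', so 2 of the 3 classes count as correct.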
