I would like to create a confusion matrix without depending on any packages. I have two lists (predicted values and actual values), and I want to pass these to a function along with an indicator of the positive class.
For example, when 1 is the positive class:
predicted_lst = [1, 0, 1, 0, 0]
actual_lst = [1, 0, 0, 1, 1]
My function currently looks like this, but it is inefficient:
def confusion_matrix(predicted, actual, pos_class):
    TP = 0
    TN = 0
    FP = 0
    FN = 0
    for i in range(len(actual)):
        if actual[i] == pos_class and predicted[i] == pos_class:
            TP += 1
        elif actual[i] == pos_class and predicted[i] != pos_class:
            FN += 1
        elif actual[i] != pos_class and predicted[i] == pos_class:
            FP += 1
        else:
            TN += 1
    return TP, FP, TN, FN
My question is: is there a more efficient way to write this code? I saw these posts, but they do not take a positive class as a function input, as I hope to do. I also do not want to use any packages at all (including numpy).
CodePudding user response:
You can write the function assuming that 1 is the positive class and accommodate the pos_class parameter by simply changing the order of the return values accordingly if 0 is the positive class.
Inside the loop, you can drop the incremental calculation of FP and FN, because they can be derived outside the loop from the total number of positive and negative predictions:
def confusion_matrix(predicted, actual, pos_class):
    TP = 0
    TN = 0
    for pred, act in zip(predicted, actual):
        if pred == act:
            if act == 0:
                TN += 1
            else:
                TP += 1
    # Every positive prediction that is not a TP is a FP,
    # and every negative prediction that is not a TN is a FN.
    positive = sum(predicted)
    negative = len(predicted) - positive
    FP = positive - TP
    FN = negative - TN
    if pos_class == 1:
        return TP, FP, TN, FN
    else:
        return TN, FN, TP, FP
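As a quick check with the lists from the question (a sketch assuming 0/1 labels, since sum(predicted) is used to count the positive predictions):
predicted_lst = [1, 0, 1, 0, 0]
actual_lst = [1, 0, 0, 1, 1]

print(confusion_matrix(predicted_lst, actual_lst, pos_class=1))
# (1, 1, 1, 2)  ->  TP=1, FP=1, TN=1, FN=2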
CodePudding user response:
See below for a couple of alternative solutions.
Option 1:
def confmat_1(actual, predicted, positive, negative):
    tn = len([x for x in zip(predicted, actual) if x[0] == negative and x[1] == negative])
    fp = len([x for x in zip(predicted, actual) if x[0] == positive and x[1] == negative])
    fn = len([x for x in zip(predicted, actual) if x[0] == negative and x[1] == positive])
    tp = len([x for x in zip(predicted, actual) if x[0] == positive and x[1] == positive])
    return tn, fp, fn, tp
Option 2:
def confmat_2(actual, predicted, positive, negative):
    tn = 0
    fp = 0
    fn = 0
    tp = 0
    for x in zip(predicted, actual):
        if x[0] == negative and x[1] == negative:
            tn += 1
        elif x[0] == positive and x[1] == negative:
            fp += 1
        elif x[0] == negative and x[1] == positive:
            fn += 1
        else:
            tp += 1
    return tn, fp, fn, tp
Example:
from sklearn.metrics import confusion_matrix
actual = [1, 0, 0, 1, 1]
predicted = [1, 0, 1, 0, 0]
# Option 1
tn, fp, fn, tp = confmat_1(actual, predicted, positive=1, negative=0)
print(tn, fp, fn, tp)
# 1 1 2 1
# Option 2
tn, fp, fn, tp = confmat_2(actual, predicted, positive=1, negative=0)
print(tn, fp, fn, tp)
# 1 1 2 1
# Scikit-learn
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(tn, fp, fn, tp)
# 1 1 2 1