I would like to create a confusion matrix without depending on any packages. I have two lists (predicted values and actual values), and I want to pass these to a function along with an indicator of the positive class.
For example, when 1 is the positive class:
predicted_lst = [1, 0, 1, 0, 0]
actual_lst = [1, 0, 0, 1, 1]
My function currently looks like this, but it is inefficient:
def confusion_matrix(predicted, actual, pos_class):
    TP = 0
    TN = 0
    FP = 0
    FN = 0
    for i in range(len(actual)):
        if actual[i] == pos_class and predicted[i] == pos_class:
            TP += 1
        elif actual[i] == pos_class and predicted[i] != pos_class:
            FN += 1
        elif actual[i] != pos_class and predicted[i] == pos_class:
            FP += 1
        else:
            TN += 1
    return TP, FP, TN, FN
My question is: is there a more efficient way to write this code? I saw these posts, but they do not take a positive class as a function input, as I hope to do. I also do not want to use any packages at all (including numpy).
CodePudding user response:
You can write the function assuming that 1 is the positive class and accommodate the pos_class parameter by simply changing the order of the return values accordingly if 0 is the positive class.
Inside the loop, you can drop the incremental calculation of FP and FN, because they can be derived outside the loop from the total number of positive and negative predictions:
def confusion_matrix(predicted, actual, pos_class):
    TP = 0
    TN = 0
    for pred, act in zip(predicted, actual):
        if pred == act:
            if act == 0:
                TN += 1
            else:
                TP += 1
    # Every positive prediction that is not a TP is a FP,
    # and every negative prediction that is not a TN is a FN.
    positive = sum(predicted)
    negative = len(predicted) - positive
    FP = positive - TP
    FN = negative - TN
    if pos_class == 1:
        return TP, FP, TN, FN
    else:
        return TN, FN, TP, FP
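As a quick check with the lists from the question (a sketch assuming 0/1 labels, since sum(predicted) is used to count the positive predictions):
predicted_lst = [1, 0, 1, 0, 0]
actual_lst = [1, 0, 0, 1, 1]

print(confusion_matrix(predicted_lst, actual_lst, pos_class=1))
# (1, 1, 1, 2)  ->  TP=1, FP=1, TN=1, FN=2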
CodePudding user response:
See below for a couple of alternative solutions.
Option 1:
def confmat_1(actual, predicted, positive, negative):
    tn = len([x for x in zip(predicted, actual) if x[0] == negative and x[1] == negative])
    fp = len([x for x in zip(predicted, actual) if x[0] == positive and x[1] == negative])
    fn = len([x for x in zip(predicted, actual) if x[0] == negative and x[1] == positive])
    tp = len([x for x in zip(predicted, actual) if x[0] == positive and x[1] == positive])
    return tn, fp, fn, tp
Option 2:
def confmat_2(actual, predicted, positive, negative):
    tn = 0
    fp = 0
    fn = 0
    tp = 0
    for x in zip(predicted, actual):
        if x[0] == negative and x[1] == negative:
            tn += 1
        elif x[0] == positive and x[1] == negative:
            fp += 1
        elif x[0] == negative and x[1] == positive:
            fn += 1
        else:
            tp += 1
    return tn, fp, fn, tp
Example:
from sklearn.metrics import confusion_matrix
actual = [1, 0, 0, 1, 1]
predicted = [1, 0, 1, 0, 0]
# Option 1
tn, fp, fn, tp = confmat_1(actual, predicted, positive=1, negative=0)
print(tn, fp, fn, tp)
# 1 1 2 1
# Option 2
tn, fp, fn, tp = confmat_2(actual, predicted, positive=1, negative=0)
print(tn, fp, fn, tp)
# 1 1 2 1
# Scikit-learn
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(tn, fp, fn, tp)
# 1 1 2 1