I'm looping over my datasets and computing tn, fp, fn, tp for each one. For some datasets the labels are all 0's and I predict only 0's, so I only get a 1x1 array back. I still want a 2x2 matrix returned so that I don't get
ValueError: not enough values to unpack (expected 4, got 1)
during the following bit of Python:
tn, fp, fn, tp = confusion_matrix(metrics_data[label_column],metrics_data[scored_column]).ravel()
What's the best way to fix this?
CodePudding user response:
Add the labels parameter to your confusion_matrix call, e.g.:
tn, fp, fn, tp = confusion_matrix(
    metrics_data[label_column],
    metrics_data[scored_column],
    labels=[0, 1],
).ravel()
From the documentation for sklearn.metrics.confusion_matrix, labels is an array-like of shape (n_classes) and is defined as:
List of labels to index the matrix. This may be used to reorder or select a subset of labels. If None is given, those that appear at least once in y_true or y_pred are used in sorted order.
Since you did not pass labels, it defaults to None, so confusion_matrix only uses the values it has actually seen in your data.
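As a minimal sketch of the difference, assuming made-up data in which every true label and every prediction is 0 (y_true and y_pred here are hypothetical stand-ins for the columns in the question):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical data: only class 0 appears in both the labels and the predictions.
y_true = [0, 0, 0, 0]
y_pred = [0, 0, 0, 0]

# Without labels, only the class seen in the data is used, so the result is 1x1.
# (Recent scikit-learn versions may also emit a warning for this case.)
shape_default = confusion_matrix(y_true, y_pred).shape

# With labels=[0, 1], both classes are always indexed, so the result is 2x2
# and unpacking into four values works.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print(shape_default)   # (1, 1)
print(tn, fp, fn, tp)  # 4 0 0 0
```

All four samples end up in tn, and the other three counts are 0 rather than missing.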
CodePudding user response:
From the documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
Compute confusion matrix to evaluate the accuracy of a classification.
By definition a confusion matrix $C$ is such that $C_{i,j}$ is equal to the number of observations known to be in group $i$ and predicted to be in group $j$. Thus in binary classification, the count of true negatives is $C_{0,0}$, false negatives is $C_{1,0}$, true positives is $C_{1,1}$ and false positives is $C_{0,1}$.
Read more in the User Guide.
Parameters:
y_true: array-like of shape (n_samples,). Ground truth (correct) target values.
y_pred: array-like of shape (n_samples,). Estimated targets as returned by a classifier.
labels: array-like of shape (n_classes), default=None. List of labels to index the matrix. This may be used to reorder or select a subset of labels. If None is given, those that appear at least once in y_true or y_pred are used in sorted order.
You can use the labels argument to enforce a fixed matrix size, regardless of which classes appear in the inputs:
confusion_matrix(true, pred, labels=[1,2,3])
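For illustration, here is a small sketch with made-up values for true and pred in which class 3 never occurs; the matrix still comes back 3x3 because labels lists all three classes:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical inputs: class 3 appears in neither the labels nor the predictions.
true = [1, 1, 2, 2]
pred = [1, 2, 2, 2]

# Rows are true classes and columns are predicted classes, in the order given by labels.
cm = confusion_matrix(true, pred, labels=[1, 2, 3])

print(cm.shape)  # (3, 3)
print(cm)
```

The row and column for class 3 are simply filled with zeros.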