How do I get a confusion matrix to output a consistently shaped array (2x2) for a binary classification?

I'm looping over my datasets and computing tn, fp, fn, tp for each one. Some datasets contain only 0's and I predict only 0's, so confusion_matrix returns a 1x1 array. I still want a 2x2 matrix returned so that I don't get ValueError: not enough values to unpack (expected 4, got 1) from the following bit of Python:

tn, fp, fn, tp = confusion_matrix(metrics_data[label_column],metrics_data[scored_column]).ravel()

What's the best way to fix this?

CodePudding user response：

Add the labels parameter to your confusion_matrix call, e.g.

tn, fp, fn, tp = confusion_matrix(
    metrics_data[label_column],
    metrics_data[scored_column], 
    labels=[0, 1]).ravel()

From the documentation for sklearn.metrics.confusion_matrix, labels is an array-like of shape (n_classes) and is defined as:

List of labels to index the matrix. This may be used to reorder or select a subset of labels. If None is given, those that appear at least once in y_true or y_pred are used in sorted order.

Since you did not pass labels, it defaults to None, and confusion_matrix uses only the values it has actually seen in your data. With all-0 data that means a single class and therefore a 1x1 matrix.
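
A quick way to see the difference is to run both calls on data that contains only 0's. This is a minimal sketch; y_true and y_pred are made-up stand-ins for the metrics_data columns in the question.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical data: every true label and every prediction is 0.
y_true = [0, 0, 0, 0]
y_pred = [0, 0, 0, 0]

# Without labels, only the classes seen in the data are used -> 1x1 matrix.
cm_default = confusion_matrix(y_true, y_pred)

# With labels=[0, 1], the matrix is 2x2 even though class 1 never occurs,
# so unpacking into four values is safe.
cm_fixed = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm_fixed.ravel()

print(cm_default.shape)  # (1, 1)
print(cm_fixed.shape)    # (2, 2)
print(tn, fp, fn, tp)    # 4 0 0 0
```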

CodePudding user response：

From the documentation:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html

sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
Compute confusion matrix to evaluate the accuracy of a classification.

By definition a confusion matrix C is such that C_{i,j} is equal to the number of observations known to be in group i and predicted to be in group j.

Thus in binary classification, the count of true negatives is C_{0,0}, false negatives is C_{1,0}, true positives is C_{1,1} and false positives is C_{0,1}.
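
To make that index convention concrete, here is a small sketch with made-up labels showing how the cells of the 2x2 matrix map onto tn, fp, fn, tp, and why ravel() yields them in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels, chosen so that each cell has a different count.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Rows are true labels, columns are predicted labels.
tn = cm[0, 0]  # true 0, predicted 0
fp = cm[0, 1]  # true 0, predicted 1
fn = cm[1, 0]  # true 1, predicted 0
tp = cm[1, 1]  # true 1, predicted 1

# ravel() flattens the matrix row by row, which gives the same order.
flat = tuple(int(v) for v in cm.ravel())
print(flat)  # (3, 1, 2, 4)
```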


Parameters
y_true : array-like of shape (n_samples,)
    Ground truth (correct) target values.

y_pred : array-like of shape (n_samples,)
    Estimated targets as returned by a classifier.

labels : array-like of shape (n_classes), default=None
    List of labels to index the matrix. This may be used to reorder or select a subset of labels. If None is given, those that appear at least once in y_true or y_pred are used in sorted order.

You can use the labels argument to enforce a fixed shape, regardless of which values actually appear in the inputs. For example, this always returns a 3x3 matrix:

confusion_matrix(true, pred, labels=[1,2,3])
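
Applied to the loop from the question, this could look like the sketch below. The dataset names and values are invented for illustration, and plain lists stand in for the metrics_data columns.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical datasets as (y_true, y_pred) pairs; "only_zeros" has no 1's.
datasets = {
    "mixed": ([0, 1, 1, 0], [0, 1, 0, 1]),
    "only_zeros": ([0, 0, 0], [0, 0, 0]),
}

results = {}
for name, (y_true, y_pred) in datasets.items():
    # labels=[0, 1] keeps the matrix 2x2 for every dataset.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    results[name] = {"tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp)}

print(results["mixed"])       # {'tn': 1, 'fp': 1, 'fn': 1, 'tp': 1}
print(results["only_zeros"])  # {'tn': 3, 'fp': 0, 'fn': 0, 'tp': 0}
```

For binary 0/1 data the relevant call is labels=[0, 1]; the order of the list determines the order of the rows and columns.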