Calculating Confusion Matrix by Using the Array of Arrays

Time:07-28

I am using the transformers and datasets libraries to train a multi-class NLP model on a real, specific dataset, and I need an idea of how my model performs for each label. So I'd like to calculate the confusion matrix. I have 4 labels. My result.prediction looks like this:

array([[ -6.906 ,  -8.11  , -10.29  ,   6.242 ],
       [ -4.51  ,   3.705 ,  -9.76  ,  -7.49  ],
       [ -6.734 ,   3.36  , -10.27  ,  -6.883 ],
       ...,
       [  8.41  ,  -9.43  ,  -9.45  ,  -8.6   ],
       [  1.3125,  -3.094 , -11.016 ,  -9.31  ],
       [ -7.152 ,  -8.5   ,  -9.13  ,   6.766 ]], dtype=float16)

Here, when a predicted value is positive the model predicts 1; otherwise it predicts 0. Next, my result.label_ids looks like this:

array([[0., 0., 0., 1.],
       [1., 0., 0., 0.],
       [0., 0., 0., 1.],
       ...,
       [1., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 0., 0., 1.]], dtype=float32)

As you can see, the model returns an array of 4 values per sample, assigning 0 to the false labels and 1 to the true label.

In general, I've been using the following function to calculate a confusion matrix, but in this case it doesn't work, since this function expects 1-dimensional arrays:

import numpy as np

def compute_confusion_matrix(labels, true, pred):

  K = len(labels)  # Number of classes
  result = np.zeros((K, K))

  for i in range(len(true)):
    result[true[i]][pred[i]] += 1  # count this (true, predicted) pair

  return result
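For reference, on 1-D inputs the function tallies one count per sample, with rows indexed by the true class and columns by the predicted class. A minimal self-contained example (the `labels`, `true`, and `pred` values here are made up for illustration):

```python
import numpy as np

def compute_confusion_matrix(labels, true, pred):
    K = len(labels)  # number of classes
    result = np.zeros((K, K))
    for i in range(len(true)):
        result[true[i]][pred[i]] += 1  # row = true class, column = predicted class
    return result

labels = [0, 1, 2, 3]
true = [3, 0, 3, 1]   # hypothetical true class indices
pred = [3, 0, 1, 1]   # hypothetical predicted class indices
cm = compute_confusion_matrix(labels, true, pred)
# cm[i][j] counts samples whose true class is i and predicted class is j
```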

If possible, I'd like to modify this function to suit the case above. At the very least, I would like to understand how to implement a confusion matrix for results in the form of multi-dimensional arrays.

CodePudding user response:

One possibility is to reverse the encoding back to the format required by compute_confusion_matrix; that way, you can still use your function!

To convert the predictions, you can do:

pred = list(np.where(result.label_ids == 1.)[1])

where np.where(result.label_ids == 1.)[1] is a 1-dimensional NumPy array containing the index of the 1. in each row of result.label_ids.

So, based on your result.label_ids, pred will look like this:

[3, 0, 3, ..., 0, 0, 3]

so it has the same format as the original true (if true is also one-hot encoded, the same strategy can be used to convert it) and can be passed to your function to compute the confusion matrix.
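As a self-contained sketch of this conversion (using a small stand-in array mimicking the one-hot rows of result.label_ids shown in the question):

```python
import numpy as np

# stand-in for result.label_ids: one-hot rows, one 1. per row
label_ids = np.array([[0., 0., 0., 1.],
                      [1., 0., 0., 0.],
                      [0., 0., 0., 1.]], dtype=np.float32)

# np.where returns (row_indices, column_indices); [1] keeps the
# column index of the 1. in each row, i.e. the class index
pred = list(np.where(label_ids == 1.)[1])
# pred is now a flat list of class indices, one per sample
```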

CodePudding user response:

First of all I would like to thank Nicola Fanelli for the idea.

Both the function I gave above and sklearn.metrics.confusion_matrix() need to be given a list of predicted and true values. After my prediction step, I tried to retrieve my true and predicted values in order to calculate a confusion matrix. The results I was getting were in the following form:

array([[0., 0., 0., 1.],
       [1., 0., 0., 0.],
       [0., 0., 0., 1.],
       ...,
       [1., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 0., 0., 1.]], dtype=float32)

The idea here is to retrieve the positional index of the value 1. When I tried the approach suggested by Nicola Fanelli, the resulting arrays were shorter than the initial ones and their sizes didn't match, so the confusion matrix could not be calculated. To be honest, I couldn't find the reason behind it, but I'll investigate more later.

So I used a different technique to implement the same idea: I used np.argmax() and appended the resulting positions to a new list. Here is the code sample for the true values:

true = []
for i in range(len(result.label_ids)):
    n = np.array(result.label_ids[i])
    true.append(np.argmax(n))  # position of the 1. in the one-hot row

This way, I got the results in the desired format without the sizes changing.

Even though this is a working solution for my problem, I am still open to more elegant ways to approach this problem.
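For instance, the same idea can be written without the explicit loop by calling np.argmax with axis=1 (a sketch; the small arrays below stand in for the result.predictions logits and result.label_ids shown above):

```python
import numpy as np

# stand-ins for result.predictions (logits) and result.label_ids (one-hot)
predictions = np.array([[-6.9, -8.1, -10.3,  6.2],
                        [-4.5,  3.7,  -9.8, -7.5],
                        [-7.2, -8.5,  -9.1,  6.8]], dtype=np.float16)
label_ids = np.array([[0., 0., 0., 1.],
                      [1., 0., 0., 0.],
                      [0., 0., 0., 1.]], dtype=np.float32)

# argmax over each row yields one class index per sample,
# so the output length always matches the number of samples
pred = np.argmax(predictions, axis=1)  # [3, 1, 3]
true = np.argmax(label_ids, axis=1)    # [3, 0, 3]
# these 1-D arrays can go straight into compute_confusion_matrix
# or sklearn.metrics.confusion_matrix(true, pred)
```

Unlike matching on `== 1.`, argmax always produces exactly one index per row, even if a row of logits has no positive entry, which avoids the size-mismatch problem described above.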
