I need to do the following tasks.
I have 9 columns along with the original label. Each of those 9 columns consists of a probability value. Each 3 value is a prediction by a particular model. I have a total of 3 classifier models and there are 3 classes.
Now I have to apply the max rule.
For each class I have to pick the max probability this will give me three max values. Now I will finally return to the class which is maxed among those 3.
My code and sample
import numpy as np
df['Covid_max'] = np.where(df.columns == 'Covid',df.values,0).max(axis=1)
df['Normal_max'] = np.where(df.columns == 'Normal',df.values,0).max(axis=1)
df['Pneumonia_max'] = np.where(df.columns == 'Pneumonia',df.values,0).max(axis=1)
df['pred'] = df[['Covid_max','Normal_max','Pneumonia_max']].idxmax(axis=1)
new_label = {"pred": {"Covid_max": 0, "Normal_max": 1,"Pneumonia_max": 2,}}
df.replace(new_label , inplace = True)
Upto I have done it already. Now I got stuck. I only require the records where there is a mismatch between class and pred columns.(Here it should only print the 2nd row) How to do that?
Also, if anybody gives another solution, I would be happy to grasp that.
TIA
CodePudding user response:
Try this.
df_mismatch = df.loc[~(df['Class'] == df['pred'])]