How to count the differing values between two DataFrame columns for a confusion matrix

Here is a sample of my data:

import pandas as pd
data = {'tweet': ['saya suka makanan ini sangat enak', 'rasa kuahnya kurang enak, terlalu asin', 'favorit saya nih, ayam gorengnya enak banget', 'nasi bakar di toko ini enak banget!'],
        'actual_class': ["Positive", "Negative", "Positive", "Positive"], 'predicted_class': ["Positive", "Positive", "Negative", "Positive"]} 
df = pd.DataFrame(data)

I want to count the True Positives, False Positives, True Negatives, and False Negatives between the actual_class and predicted_class columns of my dataframe without using scikit-learn. I tried to code it myself, but I could not find an efficient way.

CodePudding user response:

You can use the value_counts function from pandas, which counts how often each value occurs in a column:

df['required column'].value_counts()
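
On a single column this only tells you how many rows carry each label. For a confusion matrix you need counts of the (actual, predicted) pairs. A minimal sketch of that, assuming pandas 1.1 or newer (where DataFrame.value_counts is available) and reusing the sample data from the question; the names pair_counts and counts are just illustrative:

```python
import pandas as pd

# Sample data from the question (tweet text omitted, it is not needed for counting)
df = pd.DataFrame({
    "actual_class": ["Positive", "Negative", "Positive", "Positive"],
    "predicted_class": ["Positive", "Positive", "Negative", "Positive"],
})

# Count each (actual, predicted) combination that occurs in the data.
# Assumption: pandas >= 1.1, which added DataFrame.value_counts.
pair_counts = df[["actual_class", "predicted_class"]].value_counts()
print(pair_counts)

# Convert to a plain dict keyed by (actual, predicted) tuples so that
# combinations that never occur can be looked up with a default of 0.
counts = pair_counts.to_dict()
print(counts.get(("Negative", "Negative"), 0))
```

Combinations that do not appear in the data are simply missing from the result, which is why the lookup above falls back to 0.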

CodePudding user response:

If you cannot use scikit-learn, but can use pandas, you might like pandas.crosstab:

import pandas as pd
data = {'actual_class': ["Positive", "Negative", "Positive", "Positive"], 'predicted_class': ["Positive", "Positive", "Negative", "Positive"]}
df = pd.DataFrame(data)

print(pd.crosstab(df.actual_class, df.predicted_class))

That is, you get the same counts as you would with from sklearn.metrics import confusion_matrix; print(confusion_matrix(df.actual_class, df.predicted_class)), except that the rows and columns are labeled:

predicted_class  Negative  Positive
actual_class                       
Negative                0         1
Positive                1         2
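
To get the four named counts out of that table, you can index it by label. This is only a sketch: it assumes "Positive" is the positive class, and the reindex step is my own addition so that a class missing from either column still shows up as a row or column of zeros.

```python
import pandas as pd

df = pd.DataFrame({
    "actual_class": ["Positive", "Negative", "Positive", "Positive"],
    "predicted_class": ["Positive", "Positive", "Negative", "Positive"],
})

# Rows are the actual class, columns are the predicted class.
labels = ["Negative", "Positive"]
table = pd.crosstab(df["actual_class"], df["predicted_class"])

# Guard against a label that never occurs in one of the columns
# (otherwise the .loc lookups below would raise a KeyError).
table = table.reindex(index=labels, columns=labels, fill_value=0)

# Assumption: "Positive" is treated as the positive class.
tp = int(table.loc["Positive", "Positive"])  # actual Positive, predicted Positive
fn = int(table.loc["Positive", "Negative"])  # actual Positive, predicted Negative
fp = int(table.loc["Negative", "Positive"])  # actual Negative, predicted Positive
tn = int(table.loc["Negative", "Negative"])  # actual Negative, predicted Negative

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```

For the sample data this gives TP=2, FP=1, TN=0, FN=1, matching the table above.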