I have a dataframe like this, where one column is the label and the other columns are predictions:
    label   pred1   pred2   pred3
0   Apple   Apple  Orange   Apple
1  Orange  Orange  Orange  Orange
I would like to extend this dataframe with the true positive rate, TP / (TP + FN), for each row. The new column should look like this:
Score
0 0.66
1 1.00
I am unsure how to go about this. Are there pandas functions that would help with this task?
Executable code: https://www.online-python.com/WP7wbgcqMS
CodePudding user response:
Here is one approach where we convert the data to long format and check if the label equals the prediction. The average of the True/False values will be your Score.
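import pandas as pd

d = {'Label': ['Apple','Orange'], 'pred1': ['Apple','Orange'], 'pred2': ['Orange','Orange'], 'pred3': ['Apple','Orange']}
df = pd.DataFrame(data=d)
# Long format: one row per (Label, prediction) pair
long_df = df.melt(id_vars='Label', value_name='pred')
# True where the prediction equals the label
long_df['match'] = long_df['Label'].eq(long_df['pred'])
# The mean of the booleans is the fraction of correct predictions per label
long_df.groupby('Label')['match'].mean().reset_index(name='Score')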
Output
Label Score
0 Apple 0.666667
1 Orange 1.000000
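Note that grouping by the label merges any rows that share the same label into a single score. If you need one score per row, a minimal sketch is to compare the prediction columns against the label row by row. This assumes the column names from the question (label, pred1, pred2, pred3); the names pred_cols and matches are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'label': ['Apple', 'Orange'],
    'pred1': ['Apple', 'Orange'],
    'pred2': ['Orange', 'Orange'],
    'pred3': ['Apple', 'Orange'],
})

# Assumed prediction columns, taken from the example in the question
pred_cols = ['pred1', 'pred2', 'pred3']

# axis=0 aligns the label Series with the row index, so each
# prediction is compared with the label of its own row
matches = df[pred_cols].eq(df['label'], axis=0)

# The row-wise mean of the booleans is the fraction of matching predictions
df['Score'] = matches.mean(axis=1)
print(df)
```

This keeps the original wide layout and row index, so the Score column lines up with the existing rows even when labels repeat.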
CodePudding user response:
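Maybe like this. It assumes df is the original wide dataframe, with the label in the first column:
# Transpose so each original row becomes a column, then compare every entry with the label (the first entry)
temp = df.T.apply(lambda x: x.iloc[0] == x).astype(int)
# The label always matches itself, so subtract 1 from both the match count and the total
(temp.sum() - 1) / (temp.count() - 1)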
Out:
0 0.666667
1 1.000000