I want to get the fraction of rows where two columns are equal, where I group by another column.
In the example below, I want to group by col1, and compare col2 == col3.
Input:
col1 | col2 | col3
A | c | c
A | d | g
B | c | c
B | d | d
Desired output:
A | 0.5 #its 50 % because c==c, d != g
B | 1.0 #its 100% because c==c, d==d
Is there a way to do this in pandas in one command?
CodePudding user response:
You can try group by col1
then sum the rows that col2
is equal with col3
and divide by the group length.
out = df.groupby('col1').apply(lambda g: g['col2'].eq(g['col3']).sum()/len(g))
print(out)
col1
A 0.5
B 1.0
dtype: float64
CodePudding user response:
groupby
mean
:
df['col2'].eq(df['col3']).groupby(df['col1']).mean()
col1
A 0.5
B 1.0
dtype: float64