Pandas group by and fraction where two columns equal-CodePudding

I want to get the fraction of rows where two columns are equal, where I group by another column.

In the example below, I want to group by col1, and compare col2 == col3.

Input:

col1 | col2 | col3
A    |  c   |  c
A    |  d   |  g
B    |  c   |  c
B    |  d   |  d

Desired output:

A | 0.5 #its 50 % because c==c, d != g
B | 1.0 #its 100% because c==c, d==d

Is there a way to do this in pandas in one command?

CodePudding user response：

You can try group by col1 then sum the rows that col2 is equal with col3 and divide by the group length.

out = df.groupby('col1').apply(lambda g: g['col2'].eq(g['col3']).sum()/len(g))

print(out)

col1
A    0.5
B    1.0
dtype: float64

CodePudding user response：

`groupby` `mean`:

df['col2'].eq(df['col3']).groupby(df['col1']).mean()

col1
A    0.5
B    1.0
dtype: float64