pandas: calculate probability group by

Time:09-16

I can't understand the output I get when calculating a probability in a group-by use case. For example, in the data frame below, I want the probability of a2 grouped by a1:

import pandas as pd 
df = pd.DataFrame([[1,1,0],[0,1,1],[0,1,1],[1,1,0],[1,1,0],[1,0,0]],
                  columns=['a1','a2','a3'])

df[["a1","a2"]].groupby('a1').apply(lambda x: x[x>0].count()/len(x)) 

I get output as:

     a1    a2
a1           
0   0.0  1.00
1   1.0  0.75

The probability column should add up to 1. I cannot understand why, for column a2, the probabilities total 1.75. Second, how do I format the Python output in the tabular format that Stack Overflow expects?

The following link computes the mean: https://stackoverflow.com/a/43015011/2740831 However, if I understand correctly, a probability should be based on the count of event occurrences.

CodePudding user response:

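Your output shows 0.75, not 1.75. Each value is the probability of a2 > 0 within a single a1 group, so the values for different groups are not expected to sum to 1. The solution can be simplified by taking the mean of a boolean Series or DataFrame: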

df1 = df["a2"].gt(0).groupby(df['a1']).mean().reset_index(name='prob')
print (df1)
   a1  prob
0   0  1.00
1   1  0.75
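
As a cross-check, the mean of a boolean mask is the same as counting the True values and dividing by the group size, so this is still a count-based probability. A minimal sketch, where the names is_pos, by_mean and by_count are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([[1,1,0],[0,1,1],[0,1,1],[1,1,0],[1,1,0],[1,0,0]],
                  columns=['a1','a2','a3'])

# Boolean mask: True where a2 is positive
is_pos = df['a2'].gt(0)

# Mean of the mask per a1 group
by_mean = is_pos.groupby(df['a1']).mean()

# Same quantity written as (number of True values) / (group size)
grouped = is_pos.groupby(df['a1'])
by_count = grouped.sum() / grouped.size()

print(by_mean.tolist())
print(by_count.tolist())
```

Both lists should contain the same two values, one per a1 group.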


df2 = df[["a1","a2"]].gt(0).groupby(df['a1']).mean()
print (df2)
     a1    a2
a1           
0   0.0  1.00
1   1.0  0.75
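
To see where the probabilities do sum to 1, it may help to look at the full conditional distribution of a2 within each a1 group. The sketch below uses pd.crosstab with normalize='index', which is an alternative to the groupby approach above rather than part of it; the names cond and row_sums are illustrative:

```python
import pandas as pd

df = pd.DataFrame([[1,1,0],[0,1,1],[0,1,1],[1,1,0],[1,1,0],[1,0,0]],
                  columns=['a1','a2','a3'])

# Rows are a1 values, columns are a2 values; normalize='index'
# divides each row by its total, giving P(a2 | a1)
cond = pd.crosstab(df['a1'], df['a2'], normalize='index')
print(cond)

# Each row (one a1 group) sums to 1; the columns need not
row_sums = cond.sum(axis=1)
print(row_sums)
```

Here the a2 = 1 column holds the same values as the prob column above, and the normalization applies across each row, not down a column.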