I've calculated the conditional probability using the code below. Now I would like to add the result of this calculation as a new column to my dataframe. Would that be possible with this code?
df.groupby(['mode','income_level'])['service'].value_counts() / df.groupby(['mode','income_level'])['service'].count()
CodePudding user response:
Use DataFrame.join if you need a new column built from your existing solution:
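import pandas as pd

df = pd.DataFrame({'mode': list('aaaabbbb'),
                   'income_level': [5, 5, 5, 0, 5, 0, 0, 0],
                   'service': [1, 0] * 4})

# P(service | mode, income_level): counts per value divided by group size
a = (df.groupby(['mode', 'income_level'])['service'].value_counts() /
     df.groupby(['mode', 'income_level'])['service'].count())

# a is indexed by (mode, income_level, service), so join on those three columns
df = df.join(a.rename('new1'), on=['mode', 'income_level', 'service'])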
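As a possible shortcut, value_counts accepts normalize=True, which returns the relative frequency within each group directly, so the explicit division can be skipped. A minimal sketch on the same sample data; the names probs, out and new_norm are only illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'mode': list('aaaabbbb'),
                   'income_level': [5, 5, 5, 0, 5, 0, 0, 0],
                   'service': [1, 0] * 4})

# normalize=True gives relative frequencies within each (mode, income_level) group
probs = df.groupby(['mode', 'income_level'])['service'].value_counts(normalize=True)

# Rename explicitly so the joined column name does not depend on the pandas version
# ('new_norm' is a hypothetical column name)
out = df.join(probs.rename('new_norm'), on=['mode', 'income_level', 'service'])
print(out)
```

This should give the same values as the division above, since normalizing the counts within a group is the same as dividing them by the group size.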
Or use GroupBy.transform: instead of value_counts, add the service column to the groupby keys and use GroupBy.size:
s1 = df.groupby(['mode','income_level', 'service'])['service'].transform('size')
s2 = df.groupby(['mode','income_level'])['service'].transform('count')
df['new'] = s1 / s2
print(df)
  mode  income_level  service      new1       new
0    a             5        1  0.666667  0.666667
1    a             5        0  0.333333  0.333333
2    a             5        1  0.666667  0.666667
3    a             0        0  1.000000  1.000000
4    b             5        1  1.000000  1.000000
5    b             0        0  0.666667  0.666667
6    b             0        1  0.333333  0.333333
7    b             0        0  0.666667  0.666667
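One way to sanity-check the result is to confirm that, within each (mode, income_level) group, the probabilities of the distinct service values sum to 1. A minimal sketch, assuming the same sample data; keys and totals are illustrative names only:

```python
import pandas as pd

df = pd.DataFrame({'mode': list('aaaabbbb'),
                   'income_level': [5, 5, 5, 0, 5, 0, 0, 0],
                   'service': [1, 0] * 4})

keys = ['mode', 'income_level']

# Rows per (mode, income_level, service) divided by rows per (mode, income_level)
s1 = df.groupby(keys + ['service'])['service'].transform('size')
s2 = df.groupby(keys)['service'].transform('count')
df['new'] = s1 / s2

# Keep one row per (mode, income_level, service) combination,
# then sum the conditional probabilities inside each group
totals = (df.drop_duplicates(keys + ['service'])
            .groupby(keys)['new']
            .sum())
print(totals)
```

Every entry of totals should be 1 (up to floating-point rounding); anything else would suggest the grouping keys and the denominator do not match.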