I have a code that is roughly like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Group':['a','a','b','b','b','c','c'], 'Label':[0,1,0,1,1,0,1], 'Num':[1,2,3,4,5,6,7]})
I would like to have a a new column that is the mean of Num, but only those with class label 1. However, this mean should be applied to all rows with label 1 only, with the rest being 0/NaN. The output should be like this: mean = [0,2,0,4.5,4.5,0,7]
Also how would it be if instead of 0/Nan, you just apply that mean to all values of that group? ie mean = [2,2,4.5,4.5,4.5,7,7]
Thanks a lot
CodePudding user response:
NaNs/0
If NaNs are OK, just slice before applying a groupby
mean
:
df['mean'] = df[df['Label'].eq(1)].groupby('Group')['Num'].transform('mean')
output:
Group Label Num mean
0 a 0 1 NaN
1 a 1 2 2.0
2 b 0 3 NaN
3 b 1 4 4.5
4 b 1 5 4.5
5 c 0 6 NaN
6 c 1 7 7.0
If you prefer 0, you can fillna(0)
output on all rows
To get the output on all rows.
mask
the values in Num when Label is 1 to change them into NaNs, groupby
the Group values and transform
all rows with the mean
of the group.
df['mean'] = (df['Num'].mask(df['Label'].ne(1))
.groupby(df['Group']).transform('mean'))
output:
Group Label Num mean
0 a 0 1 2.0
1 a 1 2 2.0
2 b 0 3 4.5
3 b 1 4 4.5
4 b 1 5 4.5
5 c 0 6 7.0
6 c 1 7 7.0