Suppose the following dataframe:
import pandas as pd

df = pd.DataFrame(
    {'X': ['a', 'a', 'b', 'a', 'b'],
     'Y': [2, 4, 8, 10, 5]})
which looks like this:
   X   Y
0  a   2
1  a   4
2  b   8
3  a  10
4  b   5
How can I replace the first element of each group (grouped by X) with that group's mean of Y?
The expected output:
   X      Y
0  a   5.33
1  a   4.00
2  b   6.50
3  a  10.00
4  b   5.00
Sorry if this question is too basic; I am new to Python and just starting to learn it.
CodePudding user response:
Use GroupBy.transform to compute the per-group means, then use numpy.where with a mask from Series.duplicated so that only the first value of each group is replaced:
import numpy as np

df['Y'] = np.where(df.X.duplicated(), df.Y, df.groupby("X")['Y'].transform('mean'))
print(df)
   X          Y
0  a   5.333333
1  a   4.000000
2  b   6.500000
3  a  10.000000
4  b   5.000000
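If the one-liner is hard to read, here is a rough breakdown of the same idea into named steps. The variable names (is_repeat, group_mean, result) are only for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['a', 'a', 'b', 'a', 'b'],
                   'Y': [2, 4, 8, 10, 5]})

# False for the first occurrence of each X value, True for every repeat
is_repeat = df['X'].duplicated()

# Group mean of Y, broadcast back to one value per original row
group_mean = df.groupby('X')['Y'].transform('mean')

# Keep Y on repeated rows; use the group mean on each group's first row
result = np.where(is_repeat, df['Y'], group_mean)
```

Assigning result back with df['Y'] = result replaces the whole column, so the integer column simply becomes a float column.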
Another solution with DataFrame.loc:
df.loc[~df.X.duplicated(), 'Y'] = df.groupby("X")['Y'].transform('mean')
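One caveat with the .loc version: Y starts out as an integer column and the means are floats. Depending on your pandas version, writing floats into an integer column through .loc may emit a dtype warning or be rejected, so a cautious sketch casts the column first. The explicit astype(float) step and the name first_in_group are my additions, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'X': ['a', 'a', 'b', 'a', 'b'],
                   'Y': [2, 4, 8, 10, 5]})

# Assumption: cast up front so the float means fit the column's dtype
df['Y'] = df['Y'].astype(float)

# True only on the first row of each X group
first_in_group = ~df['X'].duplicated()

# The right-hand side is aligned on the index, so only masked rows change
df.loc[first_in_group, 'Y'] = df.groupby('X')['Y'].transform('mean')
```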
CodePudding user response:
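You can do:
df['Y'] = df['Y'].astype(float)  # group means are floats
g = df.groupby('X')
first_rows = g.head(1).index  # index labels of each group's first row
df.loc[first_rows, 'Y'] = g['Y'].transform('mean')
Basically, this gets the index labels of the first row of each group and overwrites Y in those rows with the group mean.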
print(df)
   X          Y
0  a   5.333333
1  a   4.000000
2  b   6.500000
3  a  10.000000
4  b   5.000000
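If your real data does not have the default 0..n-1 index, or the groups do not first appear in sorted order, a variant based on GroupBy.cumcount may be safer, because it marks first rows by position within each group instead of relying on index values. This is a sketch of an alternative technique, not the method from the answer above; df2 and its index labels are made up for the example.

```python
import pandas as pd

# Hypothetical frame: string index labels, and group 'b' appears before 'a'
df2 = pd.DataFrame({'X': ['b', 'a', 'b', 'a', 'a'],
                    'Y': [8.0, 2.0, 5.0, 4.0, 10.0]},
                   index=['r1', 'r2', 'r3', 'r4', 'r5'])

g = df2.groupby('X')

# cumcount numbers the rows inside each group from 0, so 0 marks the first row
is_first = g.cumcount() == 0

# Replace Y with the group mean only where is_first is True
df2['Y'] = df2['Y'].mask(is_first, g['Y'].transform('mean'))
```

Here rows r1 and r2 (the first 'b' and the first 'a') receive their group means, and the other rows keep their original values.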