Task is to generate a column avg
where avg
= 1st value of 'pts' group by 'id'
/ max value of 'x' group by 'id'
. NB: If max of x=0, then division operation will be ignored and avg
remains NaN
.
Sample df:
id pts x
0 1 75 0
1 1 75 1
2 1 75 2
3 2 48 0
4 2 48 0
5 3 53 0
6 3 53 2
7 3 53 3
8 3 53 3
9 4 6 1
Expected df:
id pts x avg
0 1 75 0 37.50
1 1 75 1 37.50
2 1 75 2 37.50
3 2 48 0 NaN
4 2 48 0 NaN
5 3 53 0 17.66
6 3 53 2 17.66
7 3 53 3 17.66
8 3 53 3 17.66
9 4 6 1 6.00
I tried:
df['avg'] = df['id'].map(df.groupby('id').apply(lambda x: x['pts'].iloc[0] / x['x'].max()))
CodePudding user response:
Just replace 0
s with NaN
before you divide.
Change
x['x'].max()
to
x['x'].replace(0, np.nan).max()
Like this:
df['avg'] = df['id'].map(df.groupby('id').apply(lambda x: x['pts'].iloc[0] / x['x'].replace(0, np.nan).max()))
Output:
>>> df
id pts x avg
0 1 75 0 37.500000
1 1 75 1 37.500000
2 1 75 2 37.500000
3 2 48 0 NaN
4 2 48 0 NaN
5 3 53 0 17.666667
6 3 53 2 17.666667
7 3 53 3 17.666667
8 3 53 3 17.666667
9 4 6 1 6.000000
CodePudding user response:
We can use a lazy groupby with groupby transform
'first'
for pts
and 'max'
for x
then divide and replace
inf
with nan
:
# Save to reuse grouper.
g = df.groupby('id')
# first 'pts' value divided by max 'x' value
df['avg'] = (
g['pts'].transform('first') / g['x'].transform('max')
).replace(np.inf, np.nan) # replace inf with nan
df
:
id pts x avg
0 1 75 0 37.500000
1 1 75 1 37.500000
2 1 75 2 37.500000
3 2 48 0 NaN
4 2 48 0 NaN
5 3 53 0 17.666667
6 3 53 2 17.666667
7 3 53 3 17.666667
8 3 53 3 17.666667
9 4 6 1 6.000000
Setup:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
'pts': [75, 75, 75, 48, 48, 53, 53, 53, 53, 6],
'x': [0, 1, 2, 0, 0, 0, 2, 3, 3, 1]
})
CodePudding user response:
To handle the issue explicitly:
import numpy as np
def get_avg(x):
avg = x['pts'].iloc[0] / x['x'].max()
if np.isinf(avg):
return np.nan
else:
return avg
df['id'].map(df.groupby("id").apply(get_avg))
CodePudding user response:
You can update your previous solution to below.
df['avg'] = df['id'].map(df.groupby('id').apply(lambda x: x['pts'].iloc[0] / x['x'].max() if x['x'].max()!=0 else None))