Pandas: replace certain values within groups using group maximum

Time:02-01

Here's my table:

category number probability
1102 24 0.3
1102 18 0.6
1102 16 0.1
2884 24 0.16
2884 15 0.8
2884 10 0.04

I want to replace every value in the number column whose probability is lower than 15% with the number that has the highest probability within the same category:

category number probability
1102 24 0.3
1102 18 0.6
1102 18 0.1
2884 24 0.16
2884 15 0.8
2884 15 0.04

CodePudding user response:

Find the number corresponding to the maximum probability in each group, then use loc to update the low-probability rows:

n = df.sort_values('probability').groupby('category')['number'].transform('last')
df.loc[df['probability'] <= 0.15, 'number'] = n

   category  number  probability
0      1102      24         0.30
1      1102      18         0.60
2      1102      18         0.10
3      2884      24         0.16
4      2884      15         0.80
5      2884      15         0.04
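
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the table in the question, and the name df is an assumption carried over from the answer. The point worth noticing is that sort_values reorders the rows but keeps their labels, so the transformed Series still lines up with df when it is assigned through loc. Note that <= also replaces a row sitting at exactly 0.15, while the question asks for strictly lower.

```python
import pandas as pd

# Sample data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "category": [1102, 1102, 1102, 2884, 2884, 2884],
    "number": [24, 18, 16, 24, 15, 10],
    "probability": [0.3, 0.6, 0.1, 0.16, 0.8, 0.04],
})

# After sorting by probability, the last row of each group is the most
# probable one; transform("last") broadcasts its number to the whole group.
n = df.sort_values("probability").groupby("category")["number"].transform("last")

# n keeps the original row labels, so .loc aligns it with df by index
# even though n is in sorted order.
df.loc[df["probability"] <= 0.15, "number"] = n

print(df["number"].tolist())  # [24, 18, 18, 24, 15, 15]
```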

CodePudding user response:

Use drop_duplicates to get the number with the highest probability in each category, then replace with np.where:

import numpy as np

highest_prob = df.sort_values('probability').drop_duplicates('category', keep='last').set_index('category')['number']

df['number'] = np.where(df['probability'] < 0.15, df['category'].map(highest_prob), df['number'])
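
To make the intermediate step visible, here is a small self-contained sketch of this approach, again assuming the DataFrame is called df and holds the data from the question. It prints the category-to-number lookup before applying it, which is a handy sanity check.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "category": [1102, 1102, 1102, 2884, 2884, 2884],
    "number": [24, 18, 16, 24, 15, 10],
    "probability": [0.3, 0.6, 0.1, 0.16, 0.8, 0.04],
})

# One row per category: the row with the highest probability.
highest_prob = (
    df.sort_values("probability")
      .drop_duplicates("category", keep="last")
      .set_index("category")["number"]
)
print(highest_prob.to_dict())  # {1102: 18, 2884: 15}

# Map each row's category to its group's best number, and use it only
# where the row's own probability is below the threshold.
df["number"] = np.where(
    df["probability"] < 0.15,
    df["category"].map(highest_prob),
    df["number"],
)
print(df["number"].tolist())  # [24, 18, 18, 24, 15, 15]
```

Compared with the transform-based answer, this builds a lookup with one entry per category, which can be reused elsewhere.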

CodePudding user response:

A possible solution using idxmax and numpy.where:

# Row label of the highest-probability row in each category
idx = df.groupby("category")["probability"].transform("idxmax")

# Where probability is below 0.15, take the number found at that row
df["number"] = np.where(df["probability"].lt(0.15), df.loc[idx, "number"].to_numpy(), df["number"])

Output:

print(df)
   category  number  probability
0      1102      24         0.30
1      1102      18         0.60
2      1102      18         0.10
3      2884      24         0.16
4      2884      15         0.80
5      2884      15         0.04