I have the following df, which is adapted from here
import pandas as pd
df = pd.DataFrame({'group':[1,1,1,2,2,2,3,3,3],
'value':[1,2,3,1,2,3,4,3,2]})
I would like to have the following result:
df1 = pd.DataFrame({'group':[1,1,1,2,2,2,3,3,3],
'value':[0,2,3,0,2,3,4,3,0]})
The logic is (1) to select top 2 numbers for each group and (2) to set other values to 0.
Any suggestions? Thanks.
CodePudding user response:
Using nlargest
per group for boolean indexing:
m = df.groupby('group')['value'].apply(lambda g: g.isin(g.nlargest(2)))
df['value'] = df['value'].where(m, 0)
Output (as new column "value2" for clarity)
group value value2
0 1 1 0
1 1 2 2
2 1 3 3
3 2 1 0
4 2 2 2
5 2 3 3
6 3 4 4
7 3 3 3
8 3 2 0
CodePudding user response:
Use nlargest
df.loc[~df.index.isin(df.groupby('group')['value']\
.apply(lambda s: s.nlargest(2))\
.index\
.get_level_values(1)), 'value'] = 0
group value
0 1 0
1 1 2
2 1 3
3 2 0
4 2 2
5 2 3
6 3 4
7 3 3
8 3 0
CodePudding user response:
You can use a combination of where()
and cumcount()
to get the results you are expecting
df = pd.DataFrame({'group':[1,1,1,2,2,2,3,3,3],
'value':[1,2,3,1,2,3,4,3,2]})
df.where(df.sort_values('value', ascending = False).groupby('group')['value'].cumcount() 1 <= 2, 0)