Home > Software design >  Pandas Keep top N values each group and set others to 0
Pandas Keep top N values each group and set others to 0

Time:07-29

I have the following df, which is adapted from here

import pandas as pd    
df = pd.DataFrame({'group':[1,1,1,2,2,2,3,3,3],
               'value':[1,2,3,1,2,3,4,3,2]})

I would like to have the following result:

df1 = pd.DataFrame({'group':[1,1,1,2,2,2,3,3,3],
               'value':[0,2,3,0,2,3,4,3,0]})

The logic is (1) to select top 2 numbers for each group and (2) to set other values to 0.

Any suggestions? Thanks.

CodePudding user response:

Using nlargest per group for boolean indexing:

m = df.groupby('group')['value'].apply(lambda g: g.isin(g.nlargest(2)))

df['value'] = df['value'].where(m, 0)

Output (as new column "value2" for clarity)

   group  value  value2
0      1      1       0
1      1      2       2
2      1      3       3
3      2      1       0
4      2      2       2
5      2      3       3
6      3      4       4
7      3      3       3
8      3      2       0

CodePudding user response:

Use nlargest

df.loc[~df.index.isin(df.groupby('group')['value']\
                        .apply(lambda s: s.nlargest(2))\
                        .index\
                        .get_level_values(1)), 'value'] = 0

   group  value
0      1      0
1      1      2
2      1      3
3      2      0
4      2      2
5      2      3
6      3      4
7      3      3
8      3      0

CodePudding user response:

You can use a combination of where() and cumcount() to get the results you are expecting

df = pd.DataFrame({'group':[1,1,1,2,2,2,3,3,3],
               'value':[1,2,3,1,2,3,4,3,2]})
df.where(df.sort_values('value', ascending = False).groupby('group')['value'].cumcount()   1 <= 2, 0)
  • Related