I have a large dataframe where I want to use groupby and nlargest to look for the second, third, fourth and fifth largest value of each group. I have over 500 groups, and each group has over 1000 values. I also have other columns in the dataframe that I want to keep after applying groupby and nlargest. My dataframe looks like this:
df = pd.DataFrame({
'group': [1,2,3,3,4, 5,6,7,7,8],
'a': [4, 5, 3, 1, 2, 20, 10, 40, 50, 30],
'b': [20, 10, 40, 50, 30, 4, 5, 3, 1, 2],
'c': [25, 20, 5, 15, 10, 25, 20, 5, 15, 10]
})
To find the second, third, fourth largest (and so on) of each group for column a, I use:
secondlargest = df.groupby(['group'], as_index=False)['a'].apply(lambda grp: grp.nlargest(2).min())
which returns
group a
0 1 4
1 2 5
2 3 1
3 4 2
4 5 20
5 6 10
6 7 40
7 8 30
I need columns b and c present in the resulting dataframe. I use the following to subset the original dataframe, but it returns an empty dataframe. How should I modify the code?
secondsubset = df[df.groupby(['group'])['a'].apply(lambda grp: grp.nlargest(2).min())]
CodePudding user response:
If I understand your goal correctly, you should be able to just drop as_index=False, use idxmin instead of min, and pass the result to df.loc:
df.loc[df.groupby('group')['a'].apply(lambda grp: grp.nlargest(2).idxmin())]
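As a quick check (a sketch assuming the sample frame from the question), the idxmin variant returns index labels rather than values, so df.loc pulls back the full original rows and columns b and c come along for free:

```python
import pandas as pd

df = pd.DataFrame({
    'group': [1, 2, 3, 3, 4, 5, 6, 7, 7, 8],
    'a': [4, 5, 3, 1, 2, 20, 10, 40, 50, 30],
    'b': [20, 10, 40, 50, 30, 4, 5, 3, 1, 2],
    'c': [25, 20, 5, 15, 10, 25, 20, 5, 15, 10]
})

# nlargest(2).idxmin() yields the index label of the second-largest value
# in each group (or of the only value, for single-row groups), so df.loc
# selects the original rows, keeping b and c intact.
idx = df.groupby('group')['a'].apply(lambda grp: grp.nlargest(2).idxmin())
secondsubset = df.loc[idx]
```

For group 3, for example, this selects the row with a=1, b=50, c=15, i.e. the actual row holding a's second-largest value.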
CodePudding user response:
You can use agg with a lambda. It is neater:
df.groupby('group').agg(lambda grp: grp.nlargest(2).min())
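One thing worth noting (shown below, assuming the sample frame from the question): agg applies the lambda to each column independently, so the b and c values in the result are each column's own second-largest, not the b and c from the row holding a's second-largest value:

```python
import pandas as pd

df = pd.DataFrame({
    'group': [1, 2, 3, 3, 4, 5, 6, 7, 7, 8],
    'a': [4, 5, 3, 1, 2, 20, 10, 40, 50, 30],
    'b': [20, 10, 40, 50, 30, 4, 5, 3, 1, 2],
    'c': [25, 20, 5, 15, 10, 25, 20, 5, 15, 10]
})

# agg runs the lambda once per column, so a, b and c are each reduced
# independently within every group.
res = df.groupby('group').agg(lambda grp: grp.nlargest(2).min())
```

For group 3, this gives b=40 (the second-largest of b's own values in that group), whereas the row containing a's second-largest value has b=50. Whether that is the desired semantics depends on the use case.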