How do you Group, Sort, and Extract 2nd Highest Quantity from Pandas DataFrame?-CodePudding

I'm looking to group a pandas dataframe according to customer, sort it according to quantity, and return the entire row containing the 2nd highest value for quantity.

I have something like this:

  customer  item  quantity 
0        A     p         5
1        A     p         8
2        A     q         9
3        A     q         2
4        B     p         3
5        B     p         6

I'm able to return a dataframe containing the rows with the maximum quantity. I used:

idx=df.groupby(by='customer')['quantity'].idxmax()
df_max=df.loc[idx,]

This returned the following Dataframe:

  customer  item  quantity
2        A     q         9
5        B     p         6

However, I also need a Dataframe with the second-highest quantity, and I'm stumped.

I'd like a Dataframe returned that looks like this:

  customer  item  quantity
1        A     p         8
4        B     p         3

Thanks!

CodePudding user response：

You can try pandas.core.groupby.GroupBy.nth to take the nth row from a sorted dataframe, note the index starts from 0.

out = (df.sort_values('quantity', ascending=False)
       .groupby('customer', as_index=False).nth(1))

print(out)

  customer item  quantity
1        A    p         8
4        B    p         3

CodePudding user response：

df.groupby("customer").agg({'quantity': list}).apply(lambda x: sorted(x.quantity)[-2], axis=1)