I'm looking to group a pandas dataframe according to customer, sort it according to quantity, and return the entire row containing the 2nd highest value for quantity.
I have something like this:
customer item quantity
0 A p 5
1 A p 8
2 A q 9
3 A q 2
4 B p 3
5 B p 6
I'm able to return a dataframe containing the rows with the maximum quantity. I used:
idx=df.groupby(by='customer')['quantity'].idxmax()
df_max=df.loc[idx,]
This returned the following Dataframe:
customer item quantity
2 A q 9
5 B p 6
However, I also need a Dataframe with the second-highest quantity, and I'm stumped.
I'd like a Dataframe returned that looks like this:
customer item quantity
1 A p 8
4 B p 3
Thanks!
CodePudding user response:
You can try pandas.core.groupby.GroupBy.nth
to take the nth row from a sorted dataframe, note the index starts from 0.
out = (df.sort_values('quantity', ascending=False)
.groupby('customer', as_index=False).nth(1))
print(out)
customer item quantity
1 A p 8
4 B p 3
CodePudding user response:
df.groupby("customer").agg({'quantity': list}).apply(lambda x: sorted(x.quantity)[-2], axis=1)