Home > Blockchain >  Pandas sort and limit groups after group by
Pandas sort and limit groups after group by

Time:04-03

I want to create dataframe showing sales for a given year. The end goal is to show three products with the highest sales in the entire year, but with figures broken down by quarter.

I created dataframe result based on another dataframe df containing relevant data

result = df.groupby(['Product', 'Order Date']).sum()

                       Quantity Ordered
Product     Order Date  
 Product 1      2019Q1              787
                2019Q2             1067
                2019Q3              865
                2019Q4             1418
       ...         ...              ...
Product 90      2019Q1             1356
                2019Q2             1884
                2019Q3             1338
                2019Q4             2287

Now I want to sort it so that products with the highest total 'Quantity Ordered' are on top, and then take only 3 top products.

If it's possible to sort it before groupby that would also work, as long as the end goal is reached and the resulting dataframe looks the same as the one shown above but sorted and limited to the top 3 products.

I'm asking because I couldn't find any solution to this specific case or anything similar enough to be applicable.

CodePudding user response:

IIUC, you need a second groupby after sorting the values:

(df
# group (as columns) and sum
.groupby(['Product', 'Order Date'], as_index=False).sum()
# now sort the values
.sort_values(by='Quantity Ordered')
# keep the top 3 per date
.groupby('Order Date').head(3)
)

NB. untested as no reproducible input was provided

CodePudding user response:

I ended up determining top three products before groupby and simply filtered out irrelevant products before grouping.

totalByProduct = (frame).groupby(['Product']).sum()
totalByProduct.sort_values(by=['Quantity Ordered'], ascending=False, inplace=True)
topProductsList = list(totalByProduct.head(3).index)

result = df[df['Product'].isin(topProductsList)].groupby(['Product', 'Order Date']).sum()
  • Related