Home > database >  Sort Pandas Groupby by a particular column
Sort Pandas Groupby by a particular column

Time:11-05

Data (I've incorporated some extra steps because I receive the data in a particular form):

    import numpy as np
    import pandas as pd

    d1 = pd.DataFrame({"Date" : ['1/1/2022', '12/15/2010', '6/1/2015', '1/31/2022', '12/31/2010', '3/10/2009', '1/7/2022', '12/9/2010','12/20/2010','1/13/2022'],
               "Expense": ['Food', 'Food', 'Gasoline', 'Coffee', 'Coffee', 'PayPal', 'Gasoline', 'Gasoline','Gasoline','Coffee'],
               "Total": [3.89, 7.00, 11, 0.99, 8.01, 99, 76, 50,48,9]})

    #Change Date column to datetime
    d1['Date'] = pd.to_datetime(d1['Date'])

    #Create MMM-YY columm from Date column
    d1['MMM-YY'] = d1['Date'].dt.strftime('%b')   '-'   d1['Date'].dt.strftime('%y')

    #Sort DataFrame by Date
    d1.sort_values('Date', inplace=True)

    d1

        Date        Expense   Total  MMM-YY
    5   2009-03-10  PayPal    99.00  Mar-09
    7   2010-12-09  Gasoline  50.00  Dec-10
    1   2010-12-15  Food      7.00   Dec-10
    8   2010-12-20  Gasoline  48.00  Dec-10
    4   2010-12-31  Coffee    8.01   Dec-10
    2   2015-06-01  Gasoline  11.00  Jun-15
    0   2022-01-01  Food      3.89   Jan-22
    6   2022-01-07  Gasoline  76.00  Jan-22
    9   2022-01-13  Coffee    9.00   Jan-22
    3   2022-01-31  Coffee    0.99   Jan-22

I want to sum the Total column for every expense type within every month (entry in MMM-YY). Here's the important part: I want to keep the MMM-YY column in increasing order (just like d1 DataFrame), but I want the Expense column to be sorted alphabetically. Here is the desired output after applying groupby:

    MMM-YY  Expense 
    Mar-09  PayPal      99.00
    Dec-10  Coffee       8.01
            Food         7.00
            Gasoline    98.00
    Jun-15  Gasoline    11.00
    Jan-22  Coffee       9.99
            Food         3.89
            Gasoline    76.00

Notice how the MMM-YY column remains in ascending order, but the expense column is organized alphabetically within each group with multiple rows.

Thank you!

CodePudding user response:

Couldn't manage to do it in only one groupby. But this is how I get to your desired result:

out = d1.groupby(['MMM-YY', 'Expense'], sort=False)['Total'].sum()
out.groupby(level=0, sort=False, group_keys=False).apply(lambda x: x.sort_index(level='Expense'))
MMM-YY  Expense 
Mar-09  PayPal      99.00
Dec-10  Coffee       8.01
        Food         7.00
        Gasoline    98.00
Jun-15  Gasoline    11.00
Jan-22  Coffee       9.99
        Food         3.89
        Gasoline    76.00
Name: Total, dtype: float64

CodePudding user response:

I would sort before aggregation, simplest IMO is to use a monthly period:

(d1.assign(m=d1['Date'].dt.to_period('M'))
   .sort_values(by=['m', 'Expense'])
   .groupby(['MMM-YY', 'Expense'], sort=False)['Total'].sum()
 )

Output:

MMM-YY  Expense 
Mar-09  PayPal      99.00
Dec-10  Coffee       8.01
        Food         7.00
        Gasoline    98.00
Jun-15  Gasoline    11.00
Jan-22  Coffee       9.99
        Food         3.89
        Gasoline    76.00
Name: Total, dtype: float64
  • Related