Sort Pandas Groupby by a particular column-CodePudding

Data (I've incorporated some extra steps because I receive the data in a particular form):

    import numpy as np
    import pandas as pd

    d1 = pd.DataFrame({"Date" : ['1/1/2022', '12/15/2010', '6/1/2015', '1/31/2022', '12/31/2010', '3/10/2009', '1/7/2022', '12/9/2010','12/20/2010','1/13/2022'],
               "Expense": ['Food', 'Food', 'Gasoline', 'Coffee', 'Coffee', 'PayPal', 'Gasoline', 'Gasoline','Gasoline','Coffee'],
               "Total": [3.89, 7.00, 11, 0.99, 8.01, 99, 76, 50,48,9]})

    #Change Date column to datetime
    d1['Date'] = pd.to_datetime(d1['Date'])

    #Create MMM-YY columm from Date column
    d1['MMM-YY'] = d1['Date'].dt.strftime('%b')   '-'   d1['Date'].dt.strftime('%y')

    #Sort DataFrame by Date
    d1.sort_values('Date', inplace=True)

    d1

        Date        Expense   Total  MMM-YY
    5   2009-03-10  PayPal    99.00  Mar-09
    7   2010-12-09  Gasoline  50.00  Dec-10
    1   2010-12-15  Food      7.00   Dec-10
    8   2010-12-20  Gasoline  48.00  Dec-10
    4   2010-12-31  Coffee    8.01   Dec-10
    2   2015-06-01  Gasoline  11.00  Jun-15
    0   2022-01-01  Food      3.89   Jan-22
    6   2022-01-07  Gasoline  76.00  Jan-22
    9   2022-01-13  Coffee    9.00   Jan-22
    3   2022-01-31  Coffee    0.99   Jan-22

I want to sum the Total column for every expense type within every month (entry in MMM-YY). Here's the important part: I want to keep the MMM-YY column in increasing order (just like d1 DataFrame), but I want the Expense column to be sorted alphabetically. Here is the desired output after applying groupby:

    MMM-YY  Expense 
    Mar-09  PayPal      99.00
    Dec-10  Coffee       8.01
            Food         7.00
            Gasoline    98.00
    Jun-15  Gasoline    11.00
    Jan-22  Coffee       9.99
            Food         3.89
            Gasoline    76.00

Notice how the MMM-YY column remains in ascending order, but the expense column is organized alphabetically within each group with multiple rows.

Thank you!

CodePudding user response：

Couldn't manage to do it in only one groupby. But this is how I get to your desired result:

out = d1.groupby(['MMM-YY', 'Expense'], sort=False)['Total'].sum()
out.groupby(level=0, sort=False, group_keys=False).apply(lambda x: x.sort_index(level='Expense'))

MMM-YY  Expense 
Mar-09  PayPal      99.00
Dec-10  Coffee       8.01
        Food         7.00
        Gasoline    98.00
Jun-15  Gasoline    11.00
Jan-22  Coffee       9.99
        Food         3.89
        Gasoline    76.00
Name: Total, dtype: float64

CodePudding user response：

I would sort before aggregation, simplest IMO is to use a monthly period:

(d1.assign(m=d1['Date'].dt.to_period('M'))
   .sort_values(by=['m', 'Expense'])
   .groupby(['MMM-YY', 'Expense'], sort=False)['Total'].sum()
 )

Output:

MMM-YY  Expense 
Mar-09  PayPal      99.00
Dec-10  Coffee       8.01
        Food         7.00
        Gasoline    98.00
Jun-15  Gasoline    11.00
Jan-22  Coffee       9.99
        Food         3.89
        Gasoline    76.00
Name: Total, dtype: float64