Data (I've incorporated some extra steps because I receive the data in a particular form):
import numpy as np
import pandas as pd
d1 = pd.DataFrame({"Date" : ['1/1/2022', '12/15/2010', '6/1/2015', '1/31/2022', '12/31/2010', '3/10/2009', '1/7/2022', '12/9/2010','12/20/2010','1/13/2022'],
"Expense": ['Food', 'Food', 'Gasoline', 'Coffee', 'Coffee', 'PayPal', 'Gasoline', 'Gasoline','Gasoline','Coffee'],
"Total": [3.89, 7.00, 11, 0.99, 8.01, 99, 76, 50,48,9]})
#Change Date column to datetime
d1['Date'] = pd.to_datetime(d1['Date'])
#Create MMM-YY columm from Date column
d1['MMM-YY'] = d1['Date'].dt.strftime('%b') '-' d1['Date'].dt.strftime('%y')
#Sort DataFrame by Date
d1.sort_values('Date', inplace=True)
d1
Date Expense Total MMM-YY
5 2009-03-10 PayPal 99.00 Mar-09
7 2010-12-09 Gasoline 50.00 Dec-10
1 2010-12-15 Food 7.00 Dec-10
8 2010-12-20 Gasoline 48.00 Dec-10
4 2010-12-31 Coffee 8.01 Dec-10
2 2015-06-01 Gasoline 11.00 Jun-15
0 2022-01-01 Food 3.89 Jan-22
6 2022-01-07 Gasoline 76.00 Jan-22
9 2022-01-13 Coffee 9.00 Jan-22
3 2022-01-31 Coffee 0.99 Jan-22
I want to sum the Total column for every expense type within every month (entry in MMM-YY). Here's the important part: I want to keep the MMM-YY column in increasing order (just like d1 DataFrame), but I want the Expense column to be sorted alphabetically. Here is the desired output after applying groupby:
MMM-YY Expense
Mar-09 PayPal 99.00
Dec-10 Coffee 8.01
Food 7.00
Gasoline 98.00
Jun-15 Gasoline 11.00
Jan-22 Coffee 9.99
Food 3.89
Gasoline 76.00
Notice how the MMM-YY column remains in ascending order, but the expense column is organized alphabetically within each group with multiple rows.
Thank you!
CodePudding user response:
Couldn't manage to do it in only one groupby. But this is how I get to your desired result:
out = d1.groupby(['MMM-YY', 'Expense'], sort=False)['Total'].sum()
out.groupby(level=0, sort=False, group_keys=False).apply(lambda x: x.sort_index(level='Expense'))
MMM-YY Expense
Mar-09 PayPal 99.00
Dec-10 Coffee 8.01
Food 7.00
Gasoline 98.00
Jun-15 Gasoline 11.00
Jan-22 Coffee 9.99
Food 3.89
Gasoline 76.00
Name: Total, dtype: float64
CodePudding user response:
I would sort before aggregation, simplest IMO is to use a monthly period:
(d1.assign(m=d1['Date'].dt.to_period('M'))
.sort_values(by=['m', 'Expense'])
.groupby(['MMM-YY', 'Expense'], sort=False)['Total'].sum()
)
Output:
MMM-YY Expense
Mar-09 PayPal 99.00
Dec-10 Coffee 8.01
Food 7.00
Gasoline 98.00
Jun-15 Gasoline 11.00
Jan-22 Coffee 9.99
Food 3.89
Gasoline 76.00
Name: Total, dtype: float64