I am able to get the results I need, but I'm curious whether this can be done more efficiently (as in fewer steps) or with a built-in function.
As mockup data, I have 3 years of daily widget sales. The goal is to compare totals for same-quarter sales in the previous year: Q1 2019 compared to Q1 2020, Q1 2020 to Q1 2021, Q2 2019 to Q2 2020, etc.
My strategy was:
- Resample/downsample the daily sales data into quarters.
- Use a set of conditions to create a new column that labels each row with its quarter (Q1, Q2, Q3, Q4).
- Create a column that holds only the year.
- Create a new df with a groupby that produces a MultiIndex df grouping each quarter by year, so all Q1s are together and listed by year.
- Do another groupby that computes the difference between like quarters year by year.
Here is the code, starting from the already resampled/downsampled data:
sample_data = [{'date': Timestamp('2019-03-31 00:00:00'), '#_widgets': 4502},
{'date': Timestamp('2019-06-30 00:00:00'), '#_widgets': 4388},
{'date': Timestamp('2019-09-30 00:00:00'), '#_widgets': 4703},
{'date': Timestamp('2019-12-31 00:00:00'), '#_widgets': 4344},
{'date': Timestamp('2020-03-31 00:00:00'), '#_widgets': 4206},
{'date': Timestamp('2020-06-30 00:00:00'), '#_widgets': 4281},
{'date': Timestamp('2020-09-30 00:00:00'), '#_widgets': 4879},
{'date': Timestamp('2020-12-31 00:00:00'), '#_widgets': 4115},
{'date': Timestamp('2021-03-31 00:00:00'), '#_widgets': 4313},
{'date': Timestamp('2021-06-30 00:00:00'), '#_widgets': 4578},
{'date': Timestamp('2021-09-30 00:00:00'), '#_widgets': 4467},
{'date': Timestamp('2021-12-31 00:00:00'), '#_widgets': 4871}]
Import libraries:
import pandas as pd
from datetime import datetime
import numpy as np
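One step that was implicit above: dfq is just the sample records loaded into a DataFrame (Timestamp in the records is pandas.Timestamp, so the date column already comes out as datetime):
## build the quarterly frame from the sample records above
dfq = pd.DataFrame(sample_data)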
Create conditions to add Q1, Q2, etc., using the fact that Q1 rows have month 3, Q2 rows month 6, and so on:
abbrv = dfq["date"].dt.month
conditions = {
"Q1": abbrv == 3,
"Q2": abbrv == 6,
"Q3": abbrv == 9,
"Q4": abbrv == 12
}
Then I passed the conditions:
## use numpy to run conditions
dfq['quarter'] = np.select(conditions.values(), conditions.keys(), default="Not Available")
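As a quick sanity check (using the sample data above), the first few rows of dfq should now look something like this:
## peek at the new quarter column
print(dfq[["date", "quarter"]].head(4))
##         date quarter
## 0 2019-03-31      Q1
## 1 2019-06-30      Q2
## 2 2019-09-30      Q3
## 3 2019-12-31      Q4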
Add a year column so I can groupby:
## add years to it
dfq["year"] = dfq["date"].dt.year
Create a new df using groupby:
## group by to get sort by quarters in each year
dff = dfq.groupby(["quarter", "year"])["#_widgets"].sum().to_frame()
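dff is now a MultiIndex frame keyed by (quarter, year) with one summed #_widgets value per pair; for example, the Q1 slice of the sample data looks roughly like:
## inspect one quarter across the years
print(dff.loc["Q1"])
##       #_widgets
## year
## 2019       4502
## 2020       4206
## 2021       4313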
In my final step, I groupby again to get the difference for same-period quarters:
## find the difference for previous year quarter comparisons
dff['diff'] = dff.groupby(['quarter']).diff()
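For the sample data, the Q1 rows then come out roughly as below (the NaN is the first year, which has no previous quarter to compare against):
## quick look at the first quarter's year-over-year comparison
print(dff.head(3))
##               #_widgets   diff
## quarter year
## Q1      2019       4502    NaN
##         2020       4206 -296.0
##         2021       4313  107.0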
This works fine, but it seems like there has to be a way to do it in fewer steps without adding too much complexity.
CodePudding user response:
Your approach looks fine to me. Having said that, you can certainly make it more concise by using the built-in pandas method to_period to convert the datetime directly to a quarterly period.
d = df['date'].dt.to_period('Q').dt
df['year'], df['quarter'] = d.year, d.quarter
dff = df.groupby(['quarter', 'year'], as_index=False)['#_widgets'].sum()
dff['diff'] = dff.groupby('quarter')['#_widgets'].diff()
Result
quarter year #_widgets diff
0 1 2019 4502 NaN
1 1 2020 4206 -296.0
2 1 2021 4313 107.0
3 2 2019 4388 NaN
4 2 2020 4281 -107.0
5 2 2021 4578 297.0
6 3 2019 4703 NaN
7 3 2020 4879 176.0
8 3 2021 4467 -412.0
9 4 2019 4344 NaN
10 4 2020 4115 -229.0
11 4 2021 4871 756.0
Here is an alternative approach with a different output representation, in case you are interested in trying it :-)
d = df['date'].dt.to_period('Q').dt
df['year'], df['quarter'] = d.year, d.quarter
s = df.pivot_table('#_widgets', 'quarter', 'year', aggfunc='sum')
pd.concat([s, s.diff(axis=1)], axis=1, keys=['#_widgets', 'diff'])
Result
#_widgets diff
year 2019 2020 2021 2019 2020 2021
quarter
1 4502 4206 4313 NaN -296 107
2 4388 4281 4578 NaN -107 297
3 4703 4879 4467 NaN 176 -412
4 4344 4115 4871 NaN -229 756
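If you keep that wide frame around (say as out, which is just my name for the pd.concat result above), single comparisons are easy to pull out of the column MultiIndex, e.g.:
## store the wide result and pull out one year-over-year difference
out = pd.concat([s, s.diff(axis=1)], axis=1, keys=['#_widgets', 'diff'])
print(out.loc[1, ('diff', 2020)])   # -296.0, i.e. Q1 2020 vs Q1 2019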