I need a function that groups date column into n-day periods with respect to some start and end dates (1 year interval). To assign a quarter (~90 day period) to every date in the data frame I used the code below, which is not very neat (and I want to reuse it for 30-day period as well)
def get_quarter(row, start_date, col_name):
# date = row['TRN_DT']
date = row[col_name]
if date >= start_date and date <= start_date timedelta(days = 90):
return 0
if date > start_date timedelta(days = 90) and date <= start_date timedelta(180):
return 1
if date > start_date timedelta(180) and date <= start_date timedelta(270):
return 2
return 3
It basically checks row by row which interval current date belongs to. I was wondering whether there is a better way to do this. pandas.Series.dt.to_period() will not do since it uses a calendar year as a reference --start 01.Jan, end 31.Dec; that is, 16.Jan.XXXX will always be in Q1; what I want is for 16.Jan to be in Q3 if the start date is 16-Jun. Thanks
CodePudding user response:
FTR, a possible solution is to shift every date in the series according to the start_date
, to simulate that start_date
is the beginning of the year:
>>> start_date = pd.to_datetime("2021-06-16")
>>> dates_series = pd.Series([pd.to_datetime("2020-01-16"), pd.to_datetime("2020-04-16")], name="dates")
0 1
1 2
Name: dates, dtype: int64
We calculate the difference between the current date and the beginning of the same year.
>>> offset = start_date - start_date.replace(month=1, day=1)
>>> offset
166 days 00:00:00
We move all of our dates by the same offser to calculate the "new quarter"
>>> (dates - offset).dt.quarter
0 3
1 4
Name: dates, dtype: int64