I have a dataframe like as shown below
app_date
20/3/2017
28/8/2017
18/10/2017
15/2/2017
2/5/2017
11/9/2016
df = pd.read_clipboard()
Our company fiscal year is from October
of current year to September
of next year
Q1 - Oct to Dec
Q2 - Jan to Mar
Q3 - Apr to Jun
Q4 - July - Sep
I was trying something like below
tf['app_date'] = pd.to_datetime(tf['app_date'])
tf['act_month'] = pd.DatetimeIndex(tf['app_date']).month
tf['act_year'] = pd.DatetimeIndex(tf['app_date']).year
tf['act_qtr'] = tf['app_date'].dt.to_period('Q').dt.strftime('Q%q')
tf['comp_fis_year'] = np.where(tf['act_month'] >= 9,tf['act_year'] 1,tf['act_year'])
tf['comp_fis_qtr'] = tf['app_date'].dt.to_period('Q').add(1).dt.strftime('Q%q') #thanks to jezrael for this trick to get quarter
Is there any elegant and efficient way to do the above? Mainly for calculating the fiscal year based on our financial year (Oct to Sep)
?
I expect my output to be like as shown below
CodePudding user response:
IIUC, use to_period
with a custom frequency:
pd.to_datetime(df['app_date'], dayfirst=True).dt.to_period('Q-SEP')
output:
0 2017Q2
1 2017Q4
2 2018Q1
3 2017Q2
4 2017Q3
5 2016Q4
Name: app_date, dtype: period[Q-SEP]
for separate columns:
s = pd.to_datetime(df['app_date'], dayfirst=True).dt.to_period('Q-SEP')
s.astype(str).str.extract('(?P<year>\d )(?P<quarter>Q\d )')
output:
year quarter
0 2017 Q2
1 2017 Q4
2 2018 Q1
3 2017 Q2
4 2017 Q3
5 2016 Q4
Q start/end:
df['Q'] = pd.to_datetime(df['app_date'], dayfirst=True).dt.to_period('Q-SEP')
df['Qstart'] = df['Q'].dt.asfreq('D', 's')
df['Qend'] = df['Q'].dt.asfreq('D', 'e')
output:
app_date Q Qstart Qend
0 20/3/2017 2017Q2 2017-01-01 2017-03-31
1 28/8/2017 2017Q4 2017-07-01 2017-09-30
2 18/10/2017 2018Q1 2017-10-01 2017-12-31
3 15/2/2017 2017Q2 2017-01-01 2017-03-31
4 2/5/2017 2017Q3 2017-04-01 2017-06-30
5 11/9/2016 2016Q4 2016-07-01 2016-09-30
CodePudding user response:
You can use:
# Create 2 DatetimeIndex instead of a Series (avoid using .dt accessor)
start = pd.to_datetime(df['app_date'].values, dayfirst=False)
end = start pd.DateOffset(months=3)
cols = ['act_month', 'act_year', 'act_qtr', 'comp_fis_year', 'comp_fis_qtr']
df = df.join(pd.DataFrame([start.month, start.year, 'Q' start.quarter.astype(str),
end.year, 'Q' end.quarter.astype(str)],
index=cols).T)
Output:
>>> df
app_date act_month act_year act_qtr comp_fis_year comp_fis_qtr
0 3/20/2017 3 2017 Q1 2017 Q2
1 8/28/2017 8 2017 Q3 2017 Q4
2 10/18/2017 10 2017 Q4 2018 Q1
3 2/15/2017 2 2017 Q1 2017 Q2
4 2/5/2017 2 2017 Q1 2017 Q2
5 11/9/2016 11 2016 Q4 2017 Q1
Setup:
app_date
3/20/2017
8/28/2017
10/18/2017
2/15/2017
2/5/2017
11/9/2016
df = pd.read_clipboard()