I wish to group and to convert datetimes to quarters within my dataset, df.
Data
location start end
aa/bb 1/1/2022 2/1/2022
aa/bb 2/1/2022 3/1/2022
aa/bb 3/1/2022 4/1/2022
cc/dd 4/1/2022 5/1/2022
cc/dd 5/1/2022 6/1/2022
cc/dd 6/1/2022 7/1/2022
Desired
location start end
aa/bb Q1 22 Q1 22
cc/dd Q2 22 Q2 22
Doing
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
df['start'] = pd.PeriodIndex(df['start'], freq='Q').strftime('Q%q.%y')
df['end'] = pd.PeriodIndex(df['end'], freq='Q').strftime('Q%q.%y')
CodePudding user response:
Assuming all the dates of the same location belong to the same quarter
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
res = df.groupby('location').agg(lambda g: g.iloc[0].to_period('Q').strftime('Q%q.%y'))
A better way probably is
df['start'] = pd.PeriodIndex(df['start'], freq='Q').strftime('Q%q.%y')
df['end'] = pd.PeriodIndex(df['end'], freq='Q').strftime('Q%q.%y')
res = df.groupby('location').first()
Output
Input and output for both cases
>>> df
location start end
0 aa/bb 2022-01-01 2022-02-01
1 aa/bb 2022-02-01 2022-03-01
2 aa/bb 2022-03-01 2022-04-01
3 cc/dd 2022-04-01 2022-05-01
4 cc/dd 2022-05-01 2022-06-01
5 cc/dd 2022-06-01 2022-07-01
>>> res
start end
location
aa/bb Q1.22 Q1.22
cc/dd Q2.22 Q2.22
CodePudding user response:
Try, coerce to datetime, extract period and format using dt.strftime. Then drop duplicates
df =df.assign(start=pd.to_datetime(df['start']).dt.to_period('Q').dt.strftime('Q%q %y'), end= pd.to_datetime(df['end']).dt.to_period('Q').dt.strftime('Q%q %y')).drop_duplicates()
location start end
0 aa/bb Q1 22 Q1 22
2 aa/bb Q1 22 Q2 22
3 cc/dd Q2 22 Q2 22
5 cc/dd Q2 22 Q3 22