I have a DataFrame, df, in which I would like to combine the four quarters of each year and sum the counts in Python.
Data
id  type  date     count
aa  hi    Q1 2022  4
aa  hi    Q2 2022  6
aa  hi    Q3 2022  7
aa  hi    Q4 2022  5
aa  ok    Q1 2022  1
aa  ok    Q2 2022  1
aa  ok    Q3 2022  1
aa  ok    Q4 2022  1
bb  hey   Q1 2023  5
bb  hey   Q2 2023  7
bb  hey   Q3 2023  9
bb  hey   Q4 2023  6
Desired
id  type  date  count
aa  hi    2022  22
aa  ok    2022  4
bb  hey   2023  27
Doing
df.groupby(['id','date', 'type']).sum()
However, this keeps each quarter as a separate group. I want the date column to contain just the year, with the quarters removed. Any suggestions are appreciated.
Answer:
There are a few ways to do this. Note that using the .str methods means the resulting date column will contain strings; cast it to int if you need numeric years.
Using str.split:
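A minimal sketch of that cast, assuming the sample data from the question is rebuilt by hand: astype(int) turns the year strings into integers before grouping, so the date level of the result is numeric.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "id": ["aa"] * 8 + ["bb"] * 4,
    "type": ["hi"] * 4 + ["ok"] * 4 + ["hey"] * 4,
    "date": [f"Q{q} 2022" for q in range(1, 5)] * 2
            + [f"Q{q} 2023" for q in range(1, 5)],
    "count": [4, 6, 7, 5, 1, 1, 1, 1, 5, 7, 9, 6],
})

# Take the part after the space ("2022") and cast it from str to int
years = df["date"].str.split(" ").str[1].astype(int)

out = df.assign(date=years).groupby(["id", "type", "date"]).sum()
print(out)
```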
df.assign(
date=df['date'].str.split(' ',expand=True)[1]
).groupby(['id','type','date']).sum()
              count
id type date
aa hi   2022     22
   ok   2022      4
bb hey  2023     27
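The result above keeps id, type and date as a MultiIndex. If you want the flat layout shown under Desired, one option (a sketch, not part of the original answer) is to pass as_index=False and sum only the count column:

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "id": ["aa"] * 8 + ["bb"] * 4,
    "type": ["hi"] * 4 + ["ok"] * 4 + ["hey"] * 4,
    "date": [f"Q{q} 2022" for q in range(1, 5)] * 2
            + [f"Q{q} 2023" for q in range(1, 5)],
    "count": [4, 6, 7, 5, 1, 1, 1, 1, 5, 7, 9, 6],
})

# as_index=False keeps the group keys as ordinary columns
out = (
    df.assign(date=df["date"].str.split(" ", expand=True)[1])
    .groupby(["id", "type", "date"], as_index=False)["count"]
    .sum()
)
print(out)
```

Calling .reset_index() on the grouped result would give the same columns.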
Using str.extract, if your years always appear as four digits (YYYY):
df.assign(
date=df['date'].str.extract(r'(\d{4})', expand=False)
).groupby(['id','type','date']).sum()
              count
id type date
aa hi   2022     22
   ok   2022      4
bb hey  2023     27
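A third option, not from the original answer: if every value ends in the four-digit year (as in "Q3 2022"), plain string slicing with .str[-4:] avoids both splitting and regex. This sketch assumes that fixed layout holds for all rows.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "id": ["aa"] * 8 + ["bb"] * 4,
    "type": ["hi"] * 4 + ["ok"] * 4 + ["hey"] * 4,
    "date": [f"Q{q} 2022" for q in range(1, 5)] * 2
            + [f"Q{q} 2023" for q in range(1, 5)],
    "count": [4, 6, 7, 5, 1, 1, 1, 1, 5, 7, 9, 6],
})

# Assumes the last four characters are the year, e.g. "Q3 2022" -> "2022"
out = df.assign(date=df["date"].str[-4:]).groupby(["id", "type", "date"]).sum()
print(out)
```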