Group by 4 quarters and aggregate in Python

I have a dataset, df, where I would like to group the 4 quarters of each year together and aggregate them in Python.

Data

id  type    date    count
aa  hi     Q1 2022  4
aa  hi     Q2 2022  6
aa  hi     Q3 2022  7
aa  hi     Q4 2022  5
aa  ok     Q1 2022  1
aa  ok     Q2 2022  1
aa  ok     Q3 2022  1
aa  ok     Q4 2022  1
bb  hey    Q1 2023  5
bb  hey    Q2 2023  7
bb  hey    Q3 2023  9
bb  hey    Q4 2023  6

Desired

id  type    date    count
aa  hi     2022     22
aa  ok     2022     4
bb  hey    2023     27

Doing

df.groupby(['id','date', 'type']).sum()

However, I am trying to reduce the date to just the year and drop the quarters. Any suggestion is appreciated.

CodePudding user response:

There are a few ways. Note that the str methods return strings, so the resulting date column will hold strings like '2022'; cast it to int if you need numeric years.

Using str.split:

df.assign(
    date=df['date'].str.split(' ', expand=True)[1]
).groupby(['id', 'type', 'date']).sum()

              count
id type date
aa hi   2022     22
   ok   2022      4
bb hey  2023     27

Using str.extract, if your years always appear as four digits (YYYY):

df.assign(
    date=df['date'].str.extract(r'(\d{4})', expand=False)
).groupby(['id', 'type', 'date']).sum()

              count
id type date
aa hi   2022     22
   ok   2022      4
bb hey  2023     27
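
Both outputs above keep id, type and date in the index, and the year is still a string. If you want the flat table from the question with integer years, something like the following sketch should work. It rebuilds the sample data from the question so it can run on its own; the names quarters, year and result are only illustrative, and it assumes every date value contains exactly one four-digit year.

```python
import pandas as pd

# Sample data copied from the question.
quarters = ["Q1", "Q2", "Q3", "Q4"]
df = pd.DataFrame({
    "id": ["aa"] * 8 + ["bb"] * 4,
    "type": ["hi"] * 4 + ["ok"] * 4 + ["hey"] * 4,
    "date": [f"{q} 2022" for q in quarters] * 2 + [f"{q} 2023" for q in quarters],
    "count": [4, 6, 7, 5, 1, 1, 1, 1, 5, 7, 9, 6],
})

# Pull the four-digit year out of strings like "Q1 2022" and cast it to int.
# expand=False makes str.extract return a Series rather than a DataFrame.
year = df["date"].str.extract(r"(\d{4})", expand=False).astype(int)

# as_index=False keeps id/type/date as ordinary columns instead of a MultiIndex.
result = (
    df.assign(date=year)
      .groupby(["id", "type", "date"], as_index=False)["count"]
      .sum()
)
print(result)
```

Selecting ["count"] before sum() is a precaution so that only that column is aggregated if the real data has other columns; drop it if you want every numeric column summed.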