I have a dataframe as such
Date Search Volume
Jan 2004 80,000
Feb 2004 90,000
Mar 2004 100,000
Apr 2004 40,000
May 2004 60,000
Jun 2004 50,000
I wish to have an output like this:
Date Search Volume Total Quarter
Jan 2004 80,000 270,000 2004Q1
Feb 2004 90,000 270,000 2004Q1
Mar 2004 100,000 270,000 2004Q1
Apr 2004 40,000 150,000 2004Q2
May 2004 60,000 150,000 2004Q2
Jun 2004 50,000 150,000 2004Q2
...
...
Aug 2022 50,000 100,000 2022Q3
Sep 2022 10,000 100,000 2022Q3
Oct 2022 40,000 100,000 2022Q3
So what I'm trying to do is sum every 3 rows (quarter) and create a new column called total, and apply the sum to every row that belongs to that quarter. The other column should be Quarter, which represents the quarter that the month belongs to.
I have tried this:
N = 3
keyvolume=keyvol.groupby(keyvol.index // 3).sum()
but this just results in a sum, not sure how to apply the values every 3 rows that quarter, and I don't know how to generate the quarter column.
Appreciate your help.
CodePudding user response:
First convert column Search Volume
to numeric by Series.str.replace
and casting to integers or floats, then convert dates to quarters by to_datetime
and Series.dt.to_period
and for new column use GroupBy.transform
with sum
per quarters:
def func(df):
df['Search Volume'] = df['Search Volume'].str.replace(',','', regex=True).astype(int)
q = pd.to_datetime(df['Date']).dt.to_period('q')
df['Total'] = df['Search Volume'].groupby(q).transform('sum')
df['Quarter'] = q
return df
out = func(df)
print (out)
Date Search Volume Total Quarter
0 Jan 2004 80000 270000 2004Q1
1 Feb 2004 90000 270000 2004Q1
2 Mar 2004 100000 270000 2004Q1
3 Apr 2004 40000 150000 2004Q2
4 May 2004 60000 150000 2004Q2
5 Jun 2004 50000 150000 2004Q2