Home > Enterprise >  sum every nth row create new column and apply the same value every 3 rows on that column
sum every nth row create new column and apply the same value every 3 rows on that column

Time:11-04

I have a dataframe as such

Date              Search Volume   
Jan 2004            80,000
Feb 2004            90,000
Mar 2004           100,000
Apr 2004            40,000
May 2004            60,000
Jun 2004            50,000

I wish to have an output like this:

Date              Search Volume   Total          Quarter

Jan 2004            80,000        270,000        2004Q1
Feb 2004            90,000        270,000        2004Q1
Mar 2004           100,000        270,000        2004Q1
Apr 2004            40,000        150,000        2004Q2
May 2004            60,000        150,000        2004Q2
Jun 2004            50,000        150,000        2004Q2
...
...
Aug 2022            50,000       100,000         2022Q3
Sep 2022            10,000       100,000         2022Q3
Oct 2022            40,000       100,000         2022Q3

So what I'm trying to do is sum every 3 rows (quarter) and create a new column called total, and apply the sum to every row that belongs to that quarter. The other column should be Quarter, which represents the quarter that the month belongs to.

I have tried this:

N = 3
keyvolume=keyvol.groupby(keyvol.index // 3).sum()

but this just results in a sum, not sure how to apply the values every 3 rows that quarter, and I don't know how to generate the quarter column.

Appreciate your help.

CodePudding user response:

First convert column Search Volume to numeric by Series.str.replace and casting to integers or floats, then convert dates to quarters by to_datetime and Series.dt.to_period and for new column use GroupBy.transform with sum per quarters:

def func(df):
    df['Search Volume'] = df['Search Volume'].str.replace(',','', regex=True).astype(int)
    q = pd.to_datetime(df['Date']).dt.to_period('q')

    df['Total'] = df['Search Volume'].groupby(q).transform('sum')
    df['Quarter'] = q

    return df

out = func(df)
print (out)

       Date  Search Volume   Total Quarter
0  Jan 2004          80000  270000  2004Q1
1  Feb 2004          90000  270000  2004Q1
2  Mar 2004         100000  270000  2004Q1
3  Apr 2004          40000  150000  2004Q2
4  May 2004          60000  150000  2004Q2
5  Jun 2004          50000  150000  2004Q2
  • Related