Home > Software engineering >  How to calculate cumulative sum based on months in a pandas dataframe?
How to calculate cumulative sum based on months in a pandas dataframe?

Time:09-28

I want to calculate cumulative sum of values in a pandas dataframe column based on months.

code:

import pandas as pd
import numpy as np
data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
    'kpi': ['sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales'],
    're_o' : [1, 1, 1, 11, 11, 11, 12, 12, 12, 13, 13, 13]
    }

# Create DataFrame
df = pd.DataFrame(data)
df['Q-Total'] = 0

df['Q-Total'] = np.where((df['month'] == 'April') | (df['month'] == 'May') | (df['month'] == 'June'),
                                df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
df['Q-Total'] = np.where((df['month'] == 'July') | (df['month'] == 'August') | (df['month'] == 'September'),
                                df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
df['Q-Total'] = np.where((df['month'] == 'October') | (df['month'] == 'November') | (df['month'] == 'December'),
                                df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
df['Q-Total'] = np.where((df['month'] == 'January') | (df['month'] == 'February') | (df['month'] == 'March'),
                                df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
print(df)

My required output is given below:

        month             kpi    re_o  Q-Total
0       April           sales     1        1
1         May  sales quantity     1        1
2        June           sales     1        2
3        July           sales    11       11
4      August           sales    11       22
5   September           sales    11       33
6     October           sales    12       12
7    November  sales quantity    12       12
8    December           sales    12       24
9     January           sales    13       13
10   February           sales    13       26
11      March           sales    13       39

But When I run this code,I got an output like below:

        month             kpi   re_o   Q-Total
0       April           sales     1        1
1         May  sales quantity     1        1
2        June           sales     1        2
3        July           sales    11       13
4      August           sales    11       24
5   September           sales    11       35
6     October           sales    12       47
7    November  sales quantity    12       13
8    December           sales    12       59
9     January           sales    13       72
10   February           sales    13       85
11      March           sales    13       98

I want to calculate cumulative sum in the below manner:

  1. If the months are April,May and June ,take the cumulative sum only from the April,May and June
  2. If the months are July,August and September ,take the cumulative sum only from the July,August and September
  3. If the months are October,November and December ,take the cumulative sum only from the October,November and December
  4. If the months are January,February and March ,take the cumulative sum only from the January,February and March

Can anyone suggest a solution?

CodePudding user response:

You can create quarters periods for groups and then use GroupBy.cumsum:

g = pd.to_datetime(df['month'], format='%B').dt.to_period('Q')

df['Q-Total'] = df.groupby([g,'kpi'])['re_o'].cumsum()
print (df)
        month             kpi  re_o  Q-Total
0       April           sales     1        1
1         May  sales quantity     1        1
2        June           sales     1        2
3        July           sales    11       11
4      August           sales    11       22
5   September           sales    11       33
6     October           sales    12       12
7    November  sales quantity    12       12
8    December           sales    12       24
9     January           sales    13       13
10   February           sales    13       26
11      March           sales    13       39

Details:

print (df.assign(q = g))
        month             kpi  re_o  Q-Total       q
0       April           sales     1        1  1900Q2
1         May  sales quantity     1        1  1900Q2
2        June           sales     1        2  1900Q2
3        July           sales    11       11  1900Q3
4      August           sales    11       22  1900Q3
5   September           sales    11       33  1900Q3
6     October           sales    12       12  1900Q4
7    November  sales quantity    12       12  1900Q4
8    December           sales    12       24  1900Q4
9     January           sales    13       13  1900Q1
10   February           sales    13       26  1900Q1
11      March           sales    13       39  1900Q1

CodePudding user response:

You can define custom groups from a list of lists:

groups = [['January', 'February', 'March'],
          ['April', 'May', 'June'],
          ['July', 'August', 'September'],
          ['October', 'November', 'December'],
         ]

# make mapper
d = {k:v for v,l in enumerate(groups) for k in l}

df['Q-Total'] = df.groupby([df['month'].map(d), 'kpi'])['re_o'].cumsum()

output:

        month             kpi  re_o  Q-Total
0       April           sales     1        1
1         May  sales quantity     1        1
2        June           sales     1        2
3        July           sales    11       11
4      August           sales    11       22
5   September           sales    11       33
6     October           sales    12       12
7    November  sales quantity    12       12
8    December           sales    12       24
9     January           sales    13       13
10   February           sales    13       26
11      March           sales    13       39
  • Related