I want to calculate cumulative sum of values in a pandas dataframe column based on months.
code:
import pandas as pd
import numpy as np
data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
'kpi': ['sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales'],
're_o' : [1, 1, 1, 11, 11, 11, 12, 12, 12, 13, 13, 13]
}
# Create DataFrame
df = pd.DataFrame(data)
df['Q-Total'] = 0
df['Q-Total'] = np.where((df['month'] == 'April') | (df['month'] == 'May') | (df['month'] == 'June'),
df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
df['Q-Total'] = np.where((df['month'] == 'July') | (df['month'] == 'August') | (df['month'] == 'September'),
df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
df['Q-Total'] = np.where((df['month'] == 'October') | (df['month'] == 'November') | (df['month'] == 'December'),
df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
df['Q-Total'] = np.where((df['month'] == 'January') | (df['month'] == 'February') | (df['month'] == 'March'),
df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
print(df)
My required output is given below:
month kpi re_o Q-Total
0 April sales 1 1
1 May sales quantity 1 1
2 June sales 1 2
3 July sales 11 11
4 August sales 11 22
5 September sales 11 33
6 October sales 12 12
7 November sales quantity 12 12
8 December sales 12 24
9 January sales 13 13
10 February sales 13 26
11 March sales 13 39
But When I run this code,I got an output like below:
month kpi re_o Q-Total
0 April sales 1 1
1 May sales quantity 1 1
2 June sales 1 2
3 July sales 11 13
4 August sales 11 24
5 September sales 11 35
6 October sales 12 47
7 November sales quantity 12 13
8 December sales 12 59
9 January sales 13 72
10 February sales 13 85
11 March sales 13 98
I want to calculate cumulative sum in the below manner:
- If the months are April,May and June ,take the cumulative sum only from the April,May and June
- If the months are July,August and September ,take the cumulative sum only from the July,August and September
- If the months are October,November and December ,take the cumulative sum only from the October,November and December
- If the months are January,February and March ,take the cumulative sum only from the January,February and March
Can anyone suggest a solution?
CodePudding user response:
You can create quarters periods for groups and then use GroupBy.cumsum
:
g = pd.to_datetime(df['month'], format='%B').dt.to_period('Q')
df['Q-Total'] = df.groupby([g,'kpi'])['re_o'].cumsum()
print (df)
month kpi re_o Q-Total
0 April sales 1 1
1 May sales quantity 1 1
2 June sales 1 2
3 July sales 11 11
4 August sales 11 22
5 September sales 11 33
6 October sales 12 12
7 November sales quantity 12 12
8 December sales 12 24
9 January sales 13 13
10 February sales 13 26
11 March sales 13 39
Details:
print (df.assign(q = g))
month kpi re_o Q-Total q
0 April sales 1 1 1900Q2
1 May sales quantity 1 1 1900Q2
2 June sales 1 2 1900Q2
3 July sales 11 11 1900Q3
4 August sales 11 22 1900Q3
5 September sales 11 33 1900Q3
6 October sales 12 12 1900Q4
7 November sales quantity 12 12 1900Q4
8 December sales 12 24 1900Q4
9 January sales 13 13 1900Q1
10 February sales 13 26 1900Q1
11 March sales 13 39 1900Q1
CodePudding user response:
You can define custom groups from a list of lists:
groups = [['January', 'February', 'March'],
['April', 'May', 'June'],
['July', 'August', 'September'],
['October', 'November', 'December'],
]
# make mapper
d = {k:v for v,l in enumerate(groups) for k in l}
df['Q-Total'] = df.groupby([df['month'].map(d), 'kpi'])['re_o'].cumsum()
output:
month kpi re_o Q-Total
0 April sales 1 1
1 May sales quantity 1 1
2 June sales 1 2
3 July sales 11 11
4 August sales 11 22
5 September sales 11 33
6 October sales 12 12
7 November sales quantity 12 12
8 December sales 12 24
9 January sales 13 13
10 February sales 13 26
11 March sales 13 39