I want to calculate QTD values from different columns based on months in a pandas dataframe.
Code:
data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
'kpi': ['sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales'],
're': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
're3 9': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
're6 6': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
're9 3': [2, 4, 6, 8, 10, 12, 14, 16, 20, 10, 10, 20],
're_o' : [1, 1, 1, 11, 11, 11, 12, 12, 12, 13, 13, 13]
}
# Create DataFrame
df = pd.DataFrame(data)
g = pd.to_datetime(df['month'], format='%B').dt.to_period('Q')
if (df['month'].isin(['April', 'May', 'June'])):
df['Q-Total'] = df.groupby([g,'kpi'])['re'].cumsum()
elif (df['month'].isin(['July', 'August', 'September'])):
df['Q-Total'] = df.groupby([g, 'kpi'])['re3 9'].cumsum()
elif (df['month'].isin(['October', 'November', 'December'])):
df['Q-Total'] = df.groupby([g, 'kpi'])['re6 6'].cumsum()
elif (df['month'].isin(['January', 'February', 'March'])):
df['Q-Total'] = df.groupby([g, 'kpi'])['re9 3'].cumsum()
else:
print("zero")
My required output is given below:
month kpi re re3 9 re6 6 re9 3 re_o Q-Total
0 April sales 1 10 5 2 1 1
1 May sales quantity 2 20 10 4 1 2
2 June sales 3 30 15 6 1 4
3 July sales 4 40 20 8 11 40
4 August sales 5 50 25 10 11 90
5 September sales 6 60 30 12 11 150
6 October sales 7 70 35 14 12 35
7 November sales quantity 8 80 40 16 12 40
8 December sales 9 90 45 20 12 80
9 January sales 10 10 50 10 13 10
10 February sales 11 10 55 10 13 20
11 March sales 12 20 60 20 13 40
Here there is four columns named re,re3 9,re6 6,re9 3 for taking the cumulative sum values.I want to calculate cumulative sum based on the below conditions:
- If the months are April,May and June, cumulative sum will be taken only from column re
- If the months are July,August and September , cumulative sum will be taken only from re3 9
- If the months are October,November and December , cumulative sum will be taken only from re6 6
- If the months are January,February and March,Cumulative sum will be taken only from re9 3
But I got an error like below , when I run the code:
Traceback (most recent call last):
File "/home/a/p/s.py", line 54, in <module>
if (df['month'].isin(['April', 'May', 'June'])):
File "/home/a/anaconda3/envs/p/lib/python3.9/site-packages/pandas/core/generic.py", line 1527, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anyone suggest a solution to solve this issue?
CodePudding user response:
Use numpy.select
for new column and then use GroupBy.cumsum
:
g = pd.to_datetime(df['month'], format='%B').dt.to_period('Q')
m1 = df['month'].isin(['April', 'May', 'June'])
m2 = df['month'].isin(['July', 'August', 'September'])
m3 = df['month'].isin(['October', 'November', 'December'])
m4 = df['month'].isin(['January', 'February', 'March'])
df['Q-Total'] = np.select([m1, m2, m3, m4],
[df['re'], df['re3 9'], df['re6 6'], df['re9 3']], default=0)
df['Q-Total'] = df.groupby([g,'kpi'])['Q-Total'].cumsum()
print (df)
month kpi re re3 9 re6 6 re9 3 re_o Q-Total
0 April sales 1 10 5 2 1 1
1 May sales quantity 2 20 10 4 1 2
2 June sales 3 30 15 6 1 4
3 July sales 4 40 20 8 11 40
4 August sales 5 50 25 10 11 90
5 September sales 6 60 30 12 11 150
6 October sales 7 70 35 14 12 35
7 November sales quantity 8 80 40 16 12 40
8 December sales 9 90 45 20 12 80
9 January sales 10 10 50 10 13 10
10 February sales 11 10 55 10 13 20
11 March sales 12 20 60 20 13 40
CodePudding user response:
You can use a dictionary to map the quarter to your columns, then indexing lookup and groupby.cumsum
:
quarters = {1: 're9 3', 2: 're', 3: 're3 9', 4: 're6 6'}
col = pd.to_datetime(df['month'], format='%B').dt.quarter.map(quarters)
idx, cols = pd.factorize(col)
df['Q-total'] = (
pd.Series(df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx],
index=df.index)
.groupby([col, df['kpi']]).cumsum()
)
output:
month kpi re re3 9 re6 6 re9 3 re_o Q-total
0 April sales 1 10 5 2 1 1
1 May sales quantity 2 20 10 4 1 2
2 June sales 3 30 15 6 1 4
3 July sales 4 40 20 8 11 40
4 August sales 5 50 25 10 11 90
5 September sales 6 60 30 12 11 150
6 October sales 7 70 35 14 12 35
7 November sales quantity 8 80 40 16 12 40
8 December sales 9 90 45 20 12 80
9 January sales 10 10 50 10 13 10
10 February sales 11 10 55 10 13 20
11 March sales 12 20 60 20 13 40