Home > Back-end >  How to calculate QTD values from different columns based on months in a pandas dataframe?
How to calculate QTD values from different columns based on months in a pandas dataframe?

Time:10-13

I want to calculate QTD values from different columns based on months in a pandas dataframe.

Code:

data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
    'kpi': ['sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales'],
    're': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    're3 9': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
    're6 6': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
    're9 3': [2, 4, 6, 8, 10, 12, 14, 16, 20, 10, 10, 20],
    're_o' : [1, 1, 1, 11, 11, 11, 12, 12, 12, 13, 13, 13]
    }

# Create DataFrame
df = pd.DataFrame(data)
g = pd.to_datetime(df['month'], format='%B').dt.to_period('Q')

if (df['month'].isin(['April', 'May', 'June'])):
        df['Q-Total'] = df.groupby([g,'kpi'])['re'].cumsum()
elif (df['month'].isin(['July', 'August', 'September'])):
        df['Q-Total'] = df.groupby([g, 'kpi'])['re3 9'].cumsum()
elif (df['month'].isin(['October', 'November', 'December'])):
        df['Q-Total'] = df.groupby([g, 'kpi'])['re6 6'].cumsum()
elif (df['month'].isin(['January', 'February', 'March'])):
        df['Q-Total'] = df.groupby([g, 'kpi'])['re9 3'].cumsum()
else:
        print("zero")

My required output is given below:

       month             kpi   re re3 9 re6 6 re9 3  re_o  Q-Total
0       April           sales   1   10    5     2      1        1
1         May  sales quantity   2   20   10     4      1        2
2        June           sales   3   30   15     6      1        4
3        July           sales   4   40   20     8     11       40
4      August           sales   5   50   25    10     11       90
5   September           sales   6   60   30    12     11      150
6     October           sales   7   70   35    14     12       35
7    November  sales quantity   8   80   40    16     12       40
8    December           sales   9   90   45    20     12       80
9     January           sales  10   10   50    10     13       10
10   February           sales  11   10   55    10     13       20
11      March           sales  12   20   60    20     13       40

Here there is four columns named re,re3 9,re6 6,re9 3 for taking the cumulative sum values.I want to calculate cumulative sum based on the below conditions:

  1. If the months are April,May and June, cumulative sum will be taken only from column re
  2. If the months are July,August and September , cumulative sum will be taken only from re3 9
  3. If the months are October,November and December , cumulative sum will be taken only from re6 6
  4. If the months are January,February and March,Cumulative sum will be taken only from re9 3

But I got an error like below , when I run the code:

Traceback (most recent call last):
File "/home/a/p/s.py", line 54, in <module>
if (df['month'].isin(['April', 'May', 'June'])):
File "/home/a/anaconda3/envs/p/lib/python3.9/site-packages/pandas/core/generic.py", line 1527, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can anyone suggest a solution to solve this issue?

CodePudding user response:

Use numpy.select for new column and then use GroupBy.cumsum:

g = pd.to_datetime(df['month'], format='%B').dt.to_period('Q')

m1 = df['month'].isin(['April', 'May', 'June'])
m2 = df['month'].isin(['July', 'August', 'September'])
m3 = df['month'].isin(['October', 'November', 'December'])
m4 = df['month'].isin(['January', 'February', 'March'])


df['Q-Total'] = np.select([m1, m2, m3, m4], 
                          [df['re'], df['re3 9'], df['re6 6'], df['re9 3']], default=0)

df['Q-Total'] = df.groupby([g,'kpi'])['Q-Total'].cumsum()
print (df)
        month             kpi  re  re3 9  re6 6  re9 3  re_o  Q-Total
0       April           sales   1     10      5      2     1        1
1         May  sales quantity   2     20     10      4     1        2
2        June           sales   3     30     15      6     1        4
3        July           sales   4     40     20      8    11       40
4      August           sales   5     50     25     10    11       90
5   September           sales   6     60     30     12    11      150
6     October           sales   7     70     35     14    12       35
7    November  sales quantity   8     80     40     16    12       40
8    December           sales   9     90     45     20    12       80
9     January           sales  10     10     50     10    13       10
10   February           sales  11     10     55     10    13       20
11      March           sales  12     20     60     20    13       40

CodePudding user response:

You can use a dictionary to map the quarter to your columns, then indexing lookup and groupby.cumsum:

quarters = {1: 're9 3', 2: 're', 3: 're3 9', 4: 're6 6'}

col = pd.to_datetime(df['month'], format='%B').dt.quarter.map(quarters)

idx, cols = pd.factorize(col)

df['Q-total'] = (
     pd.Series(df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx],
               index=df.index)
       .groupby([col, df['kpi']]).cumsum()
    )

output:

        month             kpi  re  re3 9  re6 6  re9 3  re_o  Q-total
0       April           sales   1     10      5      2     1        1
1         May  sales quantity   2     20     10      4     1        2
2        June           sales   3     30     15      6     1        4
3        July           sales   4     40     20      8    11       40
4      August           sales   5     50     25     10    11       90
5   September           sales   6     60     30     12    11      150
6     October           sales   7     70     35     14    12       35
7    November  sales quantity   8     80     40     16    12       40
8    December           sales   9     90     45     20    12       80
9     January           sales  10     10     50     10    13       10
10   February           sales  11     10     55     10    13       20
11      March           sales  12     20     60     20    13       40
  • Related