How can I add a stacked growth column to my time series data in pandas?

Time:08-04

I have a table like the one below, built from this dummy data:

import pandas as pd

data = [['Jane', 10,10.5,11,13,45,41,66,21,88,99,77,84,66,8,77,22,11,44,69,85,36,4,
         87,74,56,88,23,6,9,8,55,12,4,58,36,44,89,81,7,98,52,11,45,87,96,32,58,76],
        ['John',11,22,55,23,6,9,8,41,12,4,58,66,99,36,44,89,81,7,98,52,33,11,45,87,
         96,32,58,55,12,4,58,13,36,52,89,81,7,98,34,44,11,25,22,44,55,88,56,47],
        ['Tom',23,32,43,12,10,10.5,11,46,45,45,46,66,21,88,99,77,8,77,22,11,44,69,8511,44,77,85,99,45,63,
        89,81,7,98,3,52,89,81,7,98,13,58,36,44,89,81,7,98,52]]

df = pd.DataFrame(data, columns = ['Name','January-19','February-19','March-19','April-19','May-19','June-19',
                                   'July-19','August-19', 'September-19', 'October-19','November-19', 'December-19',
                                   'January-20','February-20','March-20','April-20','May-20','June-20',
                                   'July-20','August-20', 'September-20', 'October-20','November-20','December-20',
                                   'January-21','February-21','March-21','April-21','May-21','June-21',
                                   'July-21','August-21', 'September-21', 'October-21','November-21',
                                   'December-21','Q1 2019',
                                   'Q2 2019', 'Q3 2019', 'Q4 2019','Q1 2020',
                                   'Q2 2020', 'Q3 2020', 'Q4 2020','Q1 2021',
                                   'Q2 2021', 'Q3 2021', 'Q4 2021'])

This results in a dataframe with one row per name and one column per month or quarter.

Each entry indicates a growth rate.

I need to calculate 2Y and 3Y stacked growth rates using a specific formula for each month and quarter in 2021.

Formula example:

3Y Jan 21 = ((1 + Jan 2021 rate) * (1 + Jan 2020 rate) * (1 + Jan 2019 rate)) - 1

2Y Jan 21 = ((1 + Jan 2021 rate) * (1 + Jan 2020 rate)) - 1

I had initially done the following, which was to transpose the data and try to use the percentage change method in Pandas, indicating a time period of 24 months for the 2 year stacked column:

test1 = df.T
new_header = test1.iloc[0] #grab the first row for the header
test1 = test1[1:] #take the data less the header row
test1.columns = new_header #set the header row as the column names
test1 = test1.reset_index()
test1.rename(columns={'index': 'Date'}, inplace = True)
test1['Jane_2Y_Growth'] = test1['Jane'].pct_change(24)
test1.tail(15)


However, I don't think it's correct, because I get a different answer when I apply the formula manually.

Does anyone know how to apply this formula to all of my periods in 2021 (12 months and 4 quarters)?

So, the 2Y stacked growth column for Q4 2021 for example, would use the Q4 figures from 2021 and 2020.

Thanks

CodePudding user response:

Based on the formula example you have given:

3Y Jan 21 = ((1 + Jan 2021 rate) * (1 + Jan 2020 rate) * (1 + Jan 2019 rate)) - 1
2Y Jan 21 = ((1 + Jan 2021 rate) * (1 + Jan 2020 rate)) - 1
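As a quick sanity check, the formula can be sketched as a small helper and applied to Jane's January figures from the dummy data (56, 66 and 10). The name `stacked_growth` is made up for this illustration. The last lines show what `pct_change` computes instead: it divides a value by the one a fixed number of rows earlier, so it treats the entries as levels rather than compounding them as rates, which would explain the mismatch with the manual calculation.

```python
import math
import pandas as pd

def stacked_growth(rates):
    """Hypothetical helper: compound growth rates as prod(1 + r) - 1."""
    return math.prod(1 + r for r in rates) - 1

# Jane's January entries in the dummy data: 2021 -> 56, 2020 -> 66, 2019 -> 10
two_year = stacked_growth([56, 66])        # (1 + 56) * (1 + 66) - 1
three_year = stacked_growth([56, 66, 10])  # (1 + 56) * (1 + 66) * (1 + 10) - 1

# pct_change(n) computes value / value_n_rows_earlier - 1, a different quantity.
# Three rows stand in for the 24-row gap in the transposed frame.
jan = pd.Series([10, 66, 56],
                index=['January-19', 'January-20', 'January-21'],
                dtype=float)
ratio_change = jan.pct_change(2).iloc[-1]  # 56 / 10 - 1

print(two_year, three_year, ratio_change)
```

The 2Y and 3Y values printed here should agree with the January-21 row for Jane in the output further down.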

Transposing the dataframe and keeping only the 2021 rows:

NDF = df.T
NDF, NDF.columns = NDF[1:], NDF.iloc[0]  # drop the 'Name' row and use it as the header
result_df = NDF.loc[[x for x in NDF.index if '21' in x]].copy()  # copy so columns can be added safely

Creating placeholder columns for the resultant dataframe

for col in NDF.columns:
    result_df[col + '_2Y Growth'] = 0
    result_df[col + '_3Y Growth'] = 0

The above gives us

result_df
Name         Jane John Tom  ...  John_3Y Growth  Tom_2Y Growth  Tom_3Y Growth
January-21     56   96  77  ...               0              0              0
February-21    88   32  85  ...               0              0              0
March-21       23   58  99  ...               0              0              0
April-21        6   55  45  ...               0              0              0
May-21          9   12  63  ...               0              0              0
June-21         8    4  89  ...               0              0              0
July-21        55   58  81  ...               0              0              0
August-21      12   13   7  ...               0              0              0
September-21    4   36  98  ...               0              0              0
October-21     58   52   3  ...               0              0              0
November-21    36   89  52  ...               0              0              0
December-21    44   81  89  ...               0              0              0
Q1 2021        96   55  81  ...               0              0              0
Q2 2021        32   88   7  ...               0              0              0
Q3 2021        58   56  98  ...               0              0              0
Q4 2021        76   47  52  ...               0              0              0

Applying the formula:

for idx in NDF.index:
    if '21' in idx:
        idx20 = idx[:-2] + '20'  # same period in 2020
        idx19 = idx[:-2] + '19'  # same period in 2019
        for col in NDF.columns:
            result_df.loc[idx, col + '_3Y Growth'] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) * (1 + NDF.loc[idx19, col]) - 1
            result_df.loc[idx, col + '_2Y Growth'] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) - 1

Output:

This gives us the expected output based on the formula:

Name          Jane  John  Tom  Jane_2Y Growth  Jane_3Y Growth  John_2Y Growth  John_3Y Growth  Tom_2Y Growth  Tom_3Y Growth
January-21      56    96   77            3818           42008            9699          116399           1715          41183
February-21     88    32   85             800          9210.5            1220           28082           7653         252581
March-21        23    58   99            1871           22463            2654          148679           9999         439999
April-21         6    55   45             160            2253            5039          120959           3587          46643
May-21           9    12   63             119            5519            1065            7461            575           6335
June-21          8     4   89             404           17009              39             399           7019          80729
July-21         55    58   81            3919          262639            5840           52568           1885          22631
August-21       12    13    7            1117           24595             741           31163             95           4511
September-21     4    36   98             184           16464            1257           16353           4454         204929
October-21      58    52    3             294           29499             635            3179            279          12879
November-21     36    89   52            3255          253967            4139          244259         451135       21203391
December-21     44    81   89            3374          286874            7215          483471           4049         271349
Q1 2021         96    55   81            5140          462689             671            5375           4837         396715
Q2 2021         32    88    7             395           32471            2313          229085            295           2367
Q3 2021         58    56   98            2713           21711            1310           45884           4454         441044
Q4 2021         76    47   52            6775          670823            2159           97199           4769          66779
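The nested loop can also be written without touching individual cells. The following is only a sketch under a few assumptions: `stacked_growth_columns` is a made-up helper name, the frame is a six-row subset of the transposed data cast to float, and 2021 rows are selected with `endswith('21')` rather than `'21' in idx`. The 2020 and 2019 rows are relabelled with their 2021 counterparts so that the multiplication aligns row by row.

```python
import pandas as pd

# Subset of the transposed dummy data: periods as the index, names as columns
ndf = pd.DataFrame(
    {'Jane': [10, 66, 56, 89, 52, 96],
     'John': [11, 99, 96, 7, 11, 55],
     'Tom': [23, 21, 77, 81, 58, 81]},
    index=['January-19', 'January-20', 'January-21',
           'Q1 2019', 'Q1 2020', 'Q1 2021'],
    dtype=float,
)

def stacked_growth_columns(frame):
    """Hypothetical helper: 2Y/3Y stacked growth for each period ending in '21'."""
    idx21 = [i for i in frame.index if i.endswith('21')]
    idx20 = [i[:-2] + '20' for i in idx21]
    idx19 = [i[:-2] + '19' for i in idx21]
    # Growth factors (1 + rate), with earlier years relabelled to the 2021 index
    f21 = 1 + frame.loc[idx21]
    f20 = 1 + frame.loc[idx20].set_axis(idx21, axis=0)
    f19 = 1 + frame.loc[idx19].set_axis(idx21, axis=0)
    two_year = (f21 * f20 - 1).add_suffix('_2Y Growth')
    three_year = (f21 * f20 * f19 - 1).add_suffix('_3Y Growth')
    return pd.concat([frame.loc[idx21], two_year, three_year], axis=1)

result = stacked_growth_columns(ndf)
print(result)
```

The values should match the corresponding rows of the table above, but the column order differs: all 2Y columns come first, followed by all 3Y columns.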

EDIT: As OP asked,

Is there a way to add the new columns to the bottom instead, and instead of labelling them 'Jane_2Y Growth', for example, label them 'January-21 2Y growth'? This way they are added to the index and do not have any names in them, just the date and the growth period.

Instead of adding the results to the result dataframe, we can add them to the NDF dataframe as new rows. The result is stored in NDF.

for idx in NDF.index:
    if '21' in idx:
        idx20 = idx[:-2] + '20'  # same period in 2020
        idx19 = idx[:-2] + '19'  # same period in 2019
        for col in NDF.columns:
            NDF.loc[idx + ' 3Y Growth', col] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) * (1 + NDF.loc[idx19, col]) - 1
            NDF.loc[idx + ' 2Y Growth', col] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) - 1

This gives us the expected output:

Name                      Jane    John          Tom
January-19                  10      11           23
February-19               10.5      22           32
March-19                    11      55           43
April-19                    13      23           12
May-19                      45       6           10
June-19                     41       9         10.5
July-19                     66       8           11
August-19                   21      41           46
September-19                88      12           45
October-19                  99       4           45
November-19                 77      58           46
December-19                 84      66           66
January-20                  66      99           21
February-20                  8      36           88
March-20                    77      44           99
April-20                    22      89           77
May-20                      11      81            8
June-20                     44       7           77
July-20                     69      98           22
August-20                   85      52           11
September-20                36      33           44
October-20                   4      11           69
November-20                 87      45         8511
December-20                 74      87           44
January-21                  56      96           77
February-21                 88      32           85
March-21                    23      58           99
April-21                     6      55           45
May-21                       9      12           63
June-21                      8       4           89
July-21                     55      58           81
August-21                   12      13            7
September-21                 4      36           98
October-21                  58      52            3
November-21                 36      89           52
December-21                 44      81           89
Q1 2019                     89       7           81
Q2 2019                     81      98            7
Q3 2019                      7      34           98
Q4 2019                     98      44           13
Q1 2020                     52      11           58
Q2 2020                     11      25           36
Q3 2020                     45      22           44
Q4 2020                     87      44           89
Q1 2021                     96      55           81
Q2 2021                     32      88            7
Q3 2021                     58      56           98
Q4 2021                     76      47           52
January-21 3Y Growth     42008  116399        41183
January-21 2Y Growth      3818    9699         1715
February-21 3Y Growth   9210.5   28082       252581
February-21 2Y Growth      800    1220         7653
March-21 3Y Growth       22463  148679       439999
March-21 2Y Growth        1871    2654         9999
April-21 3Y Growth        2253  120959        46643
April-21 2Y Growth         160    5039         3587
May-21 3Y Growth          5519    7461         6335
May-21 2Y Growth           119    1065          575
June-21 3Y Growth        17009     399        80729
June-21 2Y Growth          404      39         7019
July-21 3Y Growth       262639   52568        22631
July-21 2Y Growth         3919    5840         1885
August-21 3Y Growth      24595   31163         4511
August-21 2Y Growth       1117     741           95
September-21 3Y Growth   16464   16353       204929
September-21 2Y Growth     184    1257         4454
October-21 3Y Growth     29499    3179        12879
October-21 2Y Growth       294     635          279
November-21 3Y Growth   253967  244259  2.12034e+07
November-21 2Y Growth     3255    4139       451135
December-21 3Y Growth   286874  483471       271349
December-21 2Y Growth     3374    7215         4049
Q1 2021 3Y Growth       462689    5375       396715
Q1 2021 2Y Growth         5140     671         4837
Q2 2021 3Y Growth        32471  229085         2367
Q2 2021 2Y Growth          395    2313          295
Q3 2021 3Y Growth        21711   45884       441044
Q3 2021 2Y Growth         2713    1310         4454
Q4 2021 3Y Growth       670823   97199        66779
Q4 2021 2Y Growth         6775    2159         4769
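One thing to watch with the loop in this edit: the new labels such as 'January-21 3Y Growth' also contain '21', so running the loop a second time on the enlarged NDF would try to look up periods that do not exist. A possible alternative, sketched here under assumptions, is to collect the growth rows separately and append them once with `pd.concat`. The helper name `stacked_growth_rows` is made up, the frame is a six-row subset of the data cast to float, and `endswith('21')` is used so that only the original period labels are matched.

```python
import pandas as pd

# Subset of the transposed dummy data: periods as the index, names as columns
ndf = pd.DataFrame(
    {'Jane': [10, 66, 56, 89, 52, 96],
     'John': [11, 99, 96, 7, 11, 55],
     'Tom': [23, 21, 77, 81, 58, 81]},
    index=['January-19', 'January-20', 'January-21',
           'Q1 2019', 'Q1 2020', 'Q1 2021'],
    dtype=float,
)

def stacked_growth_rows(frame):
    """Hypothetical helper: append '<period> 3Y/2Y Growth' rows below the data."""
    new_rows = {}
    for idx in frame.index:
        if not idx.endswith('21'):
            continue  # only original 2021 period labels qualify
        r21 = 1 + frame.loc[idx]
        r20 = 1 + frame.loc[idx[:-2] + '20']
        r19 = 1 + frame.loc[idx[:-2] + '19']
        new_rows[idx + ' 3Y Growth'] = r21 * r20 * r19 - 1
        new_rows[idx + ' 2Y Growth'] = r21 * r20 - 1
    # Each dict entry is a Series over the name columns; transpose to get rows
    growth = pd.DataFrame(new_rows).T
    return pd.concat([frame, growth])

out = stacked_growth_rows(ndf)
print(out)
```

The input frame is left unchanged, and the appended rows follow the same order as in the output above (3Y first, then 2Y, for each period).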