How to apply a simple mathematical formula based on the previous row in Pandas?-CodePudding

I have the following data set:

import pandas as pd
data = [['2020-01-01', 'A', 0.05], ['2020-01-02', 'A', 0.06], ['2020-01-03', 'A', 0.12], ['2020-01-04', 'A', 0.09], ['2020-01-05', 'A', 0.07],   ['2020-01-01', 'B', 0.10], ['2020-01-02', 'B', 0.20], ['2020-01-03', 'B', 0.15], ['2020-01-04', 'B', 0.12], ['2020-01-05', 'B', 0.18],    ['2020-01-01', 'C', 0.05], ['2020-01-02', 'C', 0.11], ['2020-01-03', 'C', 0.18], ['2020-01-04', 'C', 0.09], ['2020-01-05', 'C', 0.22]]
df = pd.DataFrame(data, columns = ['DATE', 'Stock', 'Return'])
df

Out[1]:
          DATE Stock  Return
0   2020-01-01     A    0.05
1   2020-01-02     A    0.06
2   2020-01-03     A    0.12
3   2020-01-04     A    0.09
4   2020-01-05     A    0.07
5   2020-01-01     B    0.10
6   2020-01-02     B    0.20
7   2020-01-03     B    0.15
8   2020-01-04     B    0.12
9   2020-01-05     B    0.18
10  2020-01-01     C    0.05
11  2020-01-02     C    0.11
12  2020-01-03     C    0.18
13  2020-01-04     C    0.09
14  2020-01-05     C    0.22

For each stock, I want to normalize the stock price of the time-series to 100 at t=-1 and apply the following formula for t=0, 1, 2, ..., n:

Pt = Pt-1 * (1 rt), where Pt = Price in period t and rt = Return in period t, respectively.

Eventually, I would like to receive the following:

Out[3]:
          DATE Stock  Return   Price
0   2020-01-01     A    0.05  105.00
1   2020-01-02     A    0.06  111.30
2   2020-01-03     A    0.12  124.66
3   2020-01-04     A    0.09  135.88
4   2020-01-05     A    0.07  145.39
5   2020-01-01     B    0.10  110.00
6   2020-01-02     B    0.20  132.00
7   2020-01-03     B    0.15  151.80
8   2020-01-04     B    0.12  170.02
9   2020-01-05     B    0.18  200.62
10  2020-01-01     C    0.05  105.00
11  2020-01-02     C    0.11  116.55
12  2020-01-03     C    0.18  137.53
13  2020-01-04     C    0.09  149.91
14  2020-01-05     C    0.22  182.89

For instance, at t=0 for stock A, the price would be: 100*(1 0.05) = 105. Similarly, for t=1, the price would be: 105*(1 0.06) = 111.30 etc. Seems quite straightforward, I know, but somehow I cannot figure how to properly set it with pandas. Is there a for loop required? Thanks for any suggestions!

CodePudding user response：

Seems like you'll need something iterative. Let's keep it simple with a for loop:

pt = [100]

for rt in df['Return'].tolist(): 
    pt.append(pt[-1] * (1   rt))

df['Price'] = pt[1:]

df
          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  159.924922
6   2020-01-02     B    0.20  191.909906
7   2020-01-03     B    0.15  220.696392
8   2020-01-04     B    0.12  247.179960
9   2020-01-05     B    0.18  291.672352

This is quite fast, but if you need something faster there is always the option of numba or cython.

To do this per group, we can wrap the loop into a function and use groupby.apply:

def calculate_price(group):
    pt = [100]
    
    for rt in group['Return'].tolist(): 
        pt.append(pt[-1] * (1   rt))

    return pd.Series(pt[1:], index=group.index)

df['Price'] = df.groupby('Stock', group_keys=False).apply(calculate_price)

df
          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  110.000000
6   2020-01-02     B    0.20  132.000000
7   2020-01-03     B    0.15  151.800000
8   2020-01-04     B    0.12  170.016000
9   2020-01-05     B    0.18  200.618880
10  2020-01-01     C    0.05  105.000000
11  2020-01-02     C    0.11  116.550000
12  2020-01-03     C    0.18  137.529000
13  2020-01-04     C    0.09  149.906610
14  2020-01-05     C    0.22  182.886064

CodePudding user response：

Try using pandas cumprod and groupby methods:

df['Price'] = (df.assign(Return = df.Return 1)
               .groupby('Stock')['Return']
               .cumprod()
               .mul(100)
               )

result:

          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  110.000000
6   2020-01-02     B    0.20  132.000000
7   2020-01-03     B    0.15  151.800000
8   2020-01-04     B    0.12  170.016000
9   2020-01-05     B    0.18  200.618880
10  2020-01-01     C    0.05  105.000000
11  2020-01-02     C    0.11  116.550000
12  2020-01-03     C    0.18  137.529000
13  2020-01-04     C    0.09  149.906610
14  2020-01-05     C    0.22  182.886064