I have the following data set:
import pandas as pd
data = [['2020-01-01', 'A', 0.05], ['2020-01-02', 'A', 0.06], ['2020-01-03', 'A', 0.12], ['2020-01-04', 'A', 0.09], ['2020-01-05', 'A', 0.07], ['2020-01-01', 'B', 0.10], ['2020-01-02', 'B', 0.20], ['2020-01-03', 'B', 0.15], ['2020-01-04', 'B', 0.12], ['2020-01-05', 'B', 0.18], ['2020-01-01', 'C', 0.05], ['2020-01-02', 'C', 0.11], ['2020-01-03', 'C', 0.18], ['2020-01-04', 'C', 0.09], ['2020-01-05', 'C', 0.22]]
df = pd.DataFrame(data, columns = ['DATE', 'Stock', 'Return'])
df
Out[1]:
DATE Stock Return
0 2020-01-01 A 0.05
1 2020-01-02 A 0.06
2 2020-01-03 A 0.12
3 2020-01-04 A 0.09
4 2020-01-05 A 0.07
5 2020-01-01 B 0.10
6 2020-01-02 B 0.20
7 2020-01-03 B 0.15
8 2020-01-04 B 0.12
9 2020-01-05 B 0.18
10 2020-01-01 C 0.05
11 2020-01-02 C 0.11
12 2020-01-03 C 0.18
13 2020-01-04 C 0.09
14 2020-01-05 C 0.22
For each stock, I want to normalize the stock price of the time-series to 100 at t=-1 and apply the following formula for t=0, 1, 2, ..., n:
Pt = Pt-1 * (1 rt), where Pt = Price in period t and rt = Return in period t, respectively.
Eventually, I would like to receive the following:
Out[3]:
DATE Stock Return Price
0 2020-01-01 A 0.05 105.00
1 2020-01-02 A 0.06 111.30
2 2020-01-03 A 0.12 124.66
3 2020-01-04 A 0.09 135.88
4 2020-01-05 A 0.07 145.39
5 2020-01-01 B 0.10 110.00
6 2020-01-02 B 0.20 132.00
7 2020-01-03 B 0.15 151.80
8 2020-01-04 B 0.12 170.02
9 2020-01-05 B 0.18 200.62
10 2020-01-01 C 0.05 105.00
11 2020-01-02 C 0.11 116.55
12 2020-01-03 C 0.18 137.53
13 2020-01-04 C 0.09 149.91
14 2020-01-05 C 0.22 182.89
For instance, at t=0 for stock A, the price would be: 100*(1 0.05) = 105. Similarly, for t=1, the price would be: 105*(1 0.06) = 111.30 etc. Seems quite straightforward, I know, but somehow I cannot figure how to properly set it with pandas. Is there a for loop required? Thanks for any suggestions!
CodePudding user response:
Seems like you'll need something iterative. Let's keep it simple with a for loop:
pt = [100]
for rt in df['Return'].tolist():
pt.append(pt[-1] * (1 rt))
df['Price'] = pt[1:]
df
DATE Stock Return Price
0 2020-01-01 A 0.05 105.000000
1 2020-01-02 A 0.06 111.300000
2 2020-01-03 A 0.12 124.656000
3 2020-01-04 A 0.09 135.875040
4 2020-01-05 A 0.07 145.386293
5 2020-01-01 B 0.10 159.924922
6 2020-01-02 B 0.20 191.909906
7 2020-01-03 B 0.15 220.696392
8 2020-01-04 B 0.12 247.179960
9 2020-01-05 B 0.18 291.672352
This is quite fast, but if you need something faster there is always the option of numba or cython.
To do this per group, we can wrap the loop into a function and use groupby.apply
:
def calculate_price(group):
pt = [100]
for rt in group['Return'].tolist():
pt.append(pt[-1] * (1 rt))
return pd.Series(pt[1:], index=group.index)
df['Price'] = df.groupby('Stock', group_keys=False).apply(calculate_price)
df
DATE Stock Return Price
0 2020-01-01 A 0.05 105.000000
1 2020-01-02 A 0.06 111.300000
2 2020-01-03 A 0.12 124.656000
3 2020-01-04 A 0.09 135.875040
4 2020-01-05 A 0.07 145.386293
5 2020-01-01 B 0.10 110.000000
6 2020-01-02 B 0.20 132.000000
7 2020-01-03 B 0.15 151.800000
8 2020-01-04 B 0.12 170.016000
9 2020-01-05 B 0.18 200.618880
10 2020-01-01 C 0.05 105.000000
11 2020-01-02 C 0.11 116.550000
12 2020-01-03 C 0.18 137.529000
13 2020-01-04 C 0.09 149.906610
14 2020-01-05 C 0.22 182.886064
CodePudding user response:
Try using pandas cumprod
and groupby
methods:
df['Price'] = (df.assign(Return = df.Return 1)
.groupby('Stock')['Return']
.cumprod()
.mul(100)
)
result:
DATE Stock Return Price
0 2020-01-01 A 0.05 105.000000
1 2020-01-02 A 0.06 111.300000
2 2020-01-03 A 0.12 124.656000
3 2020-01-04 A 0.09 135.875040
4 2020-01-05 A 0.07 145.386293
5 2020-01-01 B 0.10 110.000000
6 2020-01-02 B 0.20 132.000000
7 2020-01-03 B 0.15 151.800000
8 2020-01-04 B 0.12 170.016000
9 2020-01-05 B 0.18 200.618880
10 2020-01-01 C 0.05 105.000000
11 2020-01-02 C 0.11 116.550000
12 2020-01-03 C 0.18 137.529000
13 2020-01-04 C 0.09 149.906610
14 2020-01-05 C 0.22 182.886064