I have the following data set:
import pandas as pd
data = [['2020-01-01', 'A', 0.05], ['2020-01-02', 'A', 0.06], ['2020-01-03', 'A', 0.12], ['2020-01-04', 'A', 0.09], ['2020-01-05', 'A', 0.07], ['2020-01-01', 'B', 0.10], ['2020-01-02', 'B', 0.20], ['2020-01-03', 'B', 0.15], ['2020-01-04', 'B', 0.12], ['2020-01-05', 'B', 0.18], ['2020-01-01', 'C', 0.05], ['2020-01-02', 'C', 0.11], ['2020-01-03', 'C', 0.18], ['2020-01-04', 'C', 0.09], ['2020-01-05', 'C', 0.22]]
df = pd.DataFrame(data, columns = ['DATE', 'Stock', 'Return'])
df
Out[1]:
DATE Stock Return
0 2020-01-01 A 0.05
1 2020-01-02 A 0.06
2 2020-01-03 A 0.12
3 2020-01-04 A 0.09
4 2020-01-05 A 0.07
5 2020-01-01 B 0.10
6 2020-01-02 B 0.20
7 2020-01-03 B 0.15
8 2020-01-04 B 0.12
9 2020-01-05 B 0.18
10 2020-01-01 C 0.05
11 2020-01-02 C 0.11
12 2020-01-03 C 0.18
13 2020-01-04 C 0.09
14 2020-01-05 C 0.22
My objective is to normalize each stock's return to 100 at the beginning of the time series and then scale it in line with that stock's performance on subsequent days. I aim to obtain the following output (reflected in the column "Price"):
data2 = [['2020-01-01', 'A', 0.05, 100], ['2020-01-02', 'A', 0.06, 120], ['2020-01-03', 'A', 0.12, 240], ['2020-01-04', 'A', 0.09, 180], ['2020-01-05', 'A', 0.07, 140], ['2020-01-01', 'B', 0.10, 100], ['2020-01-02', 'B', 0.20, 200], ['2020-01-03', 'B', 0.15, 150], ['2020-01-04', 'B', 0.12, 120], ['2020-01-05', 'B', 0.18, 180], ['2020-01-01', 'C', 0.05, 100], ['2020-01-02', 'C', 0.11, 220], ['2020-01-03', 'C', 0.18, 360], ['2020-01-04', 'C', 0.09, 180], ['2020-01-05', 'C', 0.22, 440]]
df2 = pd.DataFrame(data2, columns = ['DATE', 'Stock', 'Return', 'Price'])
df2
Out[2]:
DATE Stock Return Price
0 2020-01-01 A 0.05 100
1 2020-01-02 A 0.06 120
2 2020-01-03 A 0.12 240
3 2020-01-04 A 0.09 180
4 2020-01-05 A 0.07 140
5 2020-01-01 B 0.10 100
6 2020-01-02 B 0.20 200
7 2020-01-03 B 0.15 150
8 2020-01-04 B 0.12 120
9 2020-01-05 B 0.18 180
10 2020-01-01 C 0.05 100
11 2020-01-02 C 0.11 220
12 2020-01-03 C 0.18 360
13 2020-01-04 C 0.09 180
14 2020-01-05 C 0.22 440
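To make the target explicit with one worked example: stock A's first return is 0.05, so its Price on 2020-01-02 is 0.06 / 0.05 * 100 = 120, and on 2020-01-03 it is 0.12 / 0.05 * 100 = 240.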
I am aware that I can reshape the data from long to wide format using df = df.reset_index().pivot_table(values='Return', index='DATE', columns='Stock')
and then normalize the returns using df = df.pct_change().fillna(0).add(1).cumprod().mul(100).reset_index(),
which yields the following output:
Out[3]:
Stock DATE A B C
0 2020-01-01 100.0 100.0 100.0
1 2020-01-02 120.0 200.0 220.0
2 2020-01-03 240.0 150.0 360.0
3 2020-01-04 180.0 120.0 180.0
4 2020-01-05 140.0 180.0 440.0
In this case, however, I want all stocks to be listed in one column, as initially shown. Is there a way to add the column "Price" and compute its values for each stock, i.e. for each unique value in the column "Stock"? Is a "for" loop required for this task? Thank you for any suggestions and advice!
CodePudding user response:
You can use groupby transform with 'first' to grab the first Return of each stock, then divide each row by it and multiply by 100:
df['Price'] = df['Return'].div(df['Return'].groupby(df['Stock']).transform('first'))*100
print(df)
DATE Stock Return Price
0 2020-01-01 A 0.05 100.0
1 2020-01-02 A 0.06 120.0
2 2020-01-03 A 0.12 240.0
3 2020-01-04 A 0.09 180.0
4 2020-01-05 A 0.07 140.0
5 2020-01-01 B 0.10 100.0
6 2020-01-02 B 0.20 200.0
7 2020-01-03 B 0.15 150.0
8 2020-01-04 B 0.12 120.0
9 2020-01-05 B 0.18 180.0
10 2020-01-01 C 0.05 100.0
11 2020-01-02 C 0.11 220.0
12 2020-01-03 C 0.18 360.0
13 2020-01-04 C 0.09 180.0
14 2020-01-05 C 0.22 440.0
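An equivalent spelling (just a stylistic variant, assuming the same df as in the question) groups the DataFrame directly instead of passing the Stock Series:

# same result: first Return per stock, then scale so the first row of each group is 100
df['Price'] = df['Return'].div(df.groupby('Stock')['Return'].transform('first')).mul(100)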
CodePudding user response:
In your case, use shift with cumprod inside a groupby transform:
def func(x):
    # ratio of each value to the previous one (first ratio set to 1), then cumulative product, scaled to start at 100
    return 100 * ((x / x.shift()).fillna(1)).cumprod()

df.groupby('Stock')['Return'].transform(func)
Out[138]:
0 100.0
1 120.0
2 240.0
3 180.0
4 140.0
5 100.0
6 200.0
7 150.0
8 120.0
9 180.0
10 100.0
11 220.0
12 360.0
13 180.0
14 440.0
Name: Return, dtype: float64
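If you want this as a column on the original frame (a small usage sketch, assuming the df and func defined above), assign the transform back:

df['Price'] = df.groupby('Stock')['Return'].transform(func)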