Home > database >  How to correctly estimate the percentage change of two columns considering different indexes in the
How to correctly estimate the percentage change of two columns considering different indexes in the

Time:07-21

Say I have the following df called df_trading_pair:

    Start Date           Open Price     High Price  Low Price   Close Price End Date
0   2022-07-20 08:00:00  0.19277        0.19324     0.19225     0.19324     2022-07-20 08:04:59.999
1   2022-07-20 08:05:00  0.19321        0.194       0.1932      0.19388     2022-07-20 08:09:59.999
2   2022-07-20 08:10:00  0.19387        0.195       0.19387     0.19489     2022-07-20 08:14:59.999
3   2022-07-20 08:15:00  0.19496        0.19628     0.19495     0.19626     2022-07-20 08:19:59.999
4   2022-07-20 08:20:00  0.19625        0.20406     0.19625     0.2035      2022-07-20 08:24:59.999

I have been trying to figure out a simple way to get the percentage change of the first 4 elements in Open Price column vs the last 4 elements in Close Price, so I end up getting the following output in a new_df:

 Close Price vs Open Price % change
0                            0.0057
1                            0.0087
2                            0.0123
3                            0.0438
dtype: float64

At first, I thought the following sentence should've worked perfectly, after all both arrays would have 4 elements and would contain exactly the values that I need:

new_df["Close Price vs Open Price % change"] = (df_trading_pair["Close Price"][1:]-df_trading_pair["Open Price"][:-1])/df_trading_pair["Open Price"][:-1]

However, that sentence ended up throwing this output:

 Close Price vs Open Price % change
0                               NaN
1                          0.003468
2                          0.005261
3                          0.006668
4                               NaN
dtype: float64

Which I don't get why, I also tried this other sentence:

new_df["Close Price vs Open Price % Change"] = [(y-x)/x*100 for x in df_trading_pair["Open Price"][:-1] for y in df_trading_pair["Close Price"][1:]]

Which in my opinion should also have done what I was looking for, but unfortunately it did not and threw the following error:

ValueError: Length of values (16) does not match length of index (5)

So, I would like to get some assistance here, what else could I do in order to get the desired output?

CodePudding user response:

You need to use shift, else pandas will realign your indices:

df['Close Price'].shift(-1).sub(df['Open Price']).div(df['Open Price'])[:-1]

Output:

0    0.005758
1    0.008695
2    0.012328
3    0.043804
dtype: float64

Your approach would work with numpy arrays as there is no index realignment:

c = df['Close Price'][1:].to_numpy()
o = df['Open Price'][:-1].to_numpy()

out = (c-o)/o

Output:

array([0.00575816, 0.0086952 , 0.01232785, 0.04380386])
  • Related