Home > Net >  Create new dataframe column from a shifted existing column
Create new dataframe column from a shifted existing column

Time:10-03

I have a dataframe with open, high, low, close prices of a stock. I want to add an additional column that has the percent change between today's open and yesterday's high. This is my current implementation, however, the resulting column contains percent changes between the current day's high and open.

df

  open  high  low  close
0 100   110   95   103
1 103   113   103   111
2 111   132   109   124
3 124   136   114   130

My attempt (incorrect):

df['prevhigh_curropen'] = (df['open'] - df['high']).shift(-1) / df['high'].shift(-1)

Output (incorrect):

  open  high  low  close  prevhigh_curropen
0 100   110   95   103    -0.091
1 103   113   103  111    -0.089
2 111   132   109  124    -0.159
3 124   136   114  130    -0.088

Desired output:

  open  high  low  close  prevhigh_curropen
0 100   110   95   103    nan
1 103   113   103  111    -0.064
2 111   132   109  124    -0.018
3 124   136   114  130    -0.061

Is there a non-iterative way to do this like I attempted above?

CodePudding user response:

Your formula is wrong, you have to use df['high'].shift():

df = pd.DataFrame({'open': range(1, 11), 'high': range(1, 11)})
df['prevhigh_curropen'] = df['open'].sub(df['high'].shift()) \
                                    .div(df['high'].shift()) \
                                    .mul(100)
>>> df
   open  high  prevhigh_curropen
0     1     1                NaN
1     2     2         100.000000
2     3     3          50.000000
3     4     4          33.333333
4     5     5          25.000000
5     6     6          20.000000
6     7     7          16.666667
7     8     8          14.285714
8     9     9          12.500000
9    10    10          11.111111

For your sample the output is:

>>> df
   open  high  low  close  prevhigh_curropen
0   100   110   95    103                NaN
1   103   113  103    111          -6.363636
2   111   132  109    124          -1.769912
3   124   136  114    130          -6.060606

The first value is NaN because we don't know the high value from the previous day.

CodePudding user response:

We can simplify the terms slightly from (a - b) / b to (a / b) - (b / b) to (a / b) - 1.

Mathematical Operators:

df['prevhigh_curropen'] = (df['open'] / df['high'].shift()) - 1

or with Series Methods:

df['prevhigh_curropen'] = df['open'].div(df['high'].shift()).sub(1)

*The benefit here is that we only need to shift once, and maintain 1 copy of df['high'].shift()

Resulting df:

   open  high  low  close  prevhigh_curropen
0   100   110   95    103                NaN
1   103   113  103    111          -0.063636
2   111   132  109    124          -0.017699
3   124   136  114    130          -0.060606

Setup Used:

import pandas as pd

df = pd.DataFrame({
    'open': [100, 103, 111, 124],
    'high': [110, 113, 132, 136],
    'low': [95, 103, 109, 114],
    'close': [103, 111, 124, 130]
})
  • Related