Home > Software design >  Pandas calculate x day high value
Pandas calculate x day high value

Time:11-28

For each row of a dataframe I want to calculate the x day high. See desired output and test code below:

Test data

def genMockDataFrame(days,startPrice,colName,startDate,seed=None): 
   
    periods = days*24
    np.random.seed(seed)
    steps = np.random.normal(loc=0, scale=0.0018, size=periods)
    steps[0]=0
    P = startPrice np.cumsum(steps)
    P = [round(i,4) for i in P]

    fxDF = pd.DataFrame({ 
        'ticker':np.repeat( [colName], periods ),
        'date':np.tile( pd.date_range(startDate, periods=periods, freq='H'), 1 ),
        'price':(P)})
    fxDF.index = pd.to_datetime(fxDF.date)
    fxDF = fxDF.price.resample('D').ohlc()
    fxDF.columns = [i.title() for i in fxDF.columns]
    return fxDF


df = genMockDataFrame(15,1.1904,'eurusd','19/3/2020',seed=157)

How would I create this new x_day_high column with this desired output?

            High    x_day_high
date        
2020-03-19  1.1937  0
2020-03-20  1.1879  0
2020-03-21  1.1767  0
2020-03-22  1.1721  0
2020-03-23  1.1819  2
2020-03-24  1.1928  4
2020-03-25  1.1939  6
2020-03-26  1.2141  7
2020-03-27  1.2144  8
2020-03-28  1.2099  0
2020-03-29  1.2033  0
2020-03-30  1.1887  0
2020-03-31  1.1972  1
2020-04-01  1.1997  2
2020-04-02  1.1924  0

CodePudding user response:

I think your x_day_high is incorrect for days 3-23 through 3-27, or at least it is calculated differently than the 3-31 and 4-01 days. For example, you put a 1 for 3-31, and it is greater than 1 day before it. 4-01 is greater than 2 days before it, which matches your desired output.

For 03-23, though, you have a 3, but it is only greater than 2 days before it, not 3. Same applies for the others in that block, they look off.

Below is the code that I was able to put together that gets the desired output with that change/caveat. I feel like this ought to be more straightforward, so there might be a better solution, but this might work.

for n in reversed(range(len(df['High']))):
    for i in reversed(range(n)):
            if df['High'][n] > df['High'][i]:
                df['x_day_high'][n] = n - i
            else: break

This loops through the dataframe from bottom to top. We take the bottom row n and then do a second reverse loop through everything above n. If the High of n is greater than the high of i, then we record the df['x_day_high'] as n-i. We then go to the next i and check this again, and we continually rewrite that value until we get to a row where df['High'][n] is not greater than df['High'][i], and then we break out of the i loop and go to the next n.

This gave me the below output.

date        Open    High    Low     Close   x_day_high
2020-03-19  1.1904  1.1937  1.1832  1.1832  0
2020-03-20  1.1843  1.1879  1.1769  1.1772  0
2020-03-21  1.1755  1.1767  1.1662  1.1672  0
2020-03-22  1.1686  1.1721  1.1611  1.1721  0
2020-03-23  1.1732  1.1819  1.169   1.1819  2
2020-03-24  1.1836  1.1928  1.1807  1.1922  4
2020-03-25  1.1939  1.1939  1.1864  1.1936  6
2020-03-26  1.1967  1.2141  1.1964  1.2114  7
2020-03-27  1.2118  1.2144  1.2039  1.2089  8
2020-03-28  1.208   1.2099  1.2018  1.2041  0
2020-03-29  1.2033  1.2033  1.1853  1.188   0
2020-03-30  1.1876  1.1887  1.1806  1.1879  0
2020-03-31  1.1921  1.1972  1.1873  1.1939  1
2020-04-01  1.1932  1.1997  1.1914  1.1914  2
2020-04-02  1.1902  1.1924  1.1781  1.1862  0
  • Related