For each row of a dataframe I want to calculate the x day high. See desired output and test code below:
Test data
def genMockDataFrame(days,startPrice,colName,startDate,seed=None):
periods = days*24
np.random.seed(seed)
steps = np.random.normal(loc=0, scale=0.0018, size=periods)
steps[0]=0
P = startPrice np.cumsum(steps)
P = [round(i,4) for i in P]
fxDF = pd.DataFrame({
'ticker':np.repeat( [colName], periods ),
'date':np.tile( pd.date_range(startDate, periods=periods, freq='H'), 1 ),
'price':(P)})
fxDF.index = pd.to_datetime(fxDF.date)
fxDF = fxDF.price.resample('D').ohlc()
fxDF.columns = [i.title() for i in fxDF.columns]
return fxDF
df = genMockDataFrame(15,1.1904,'eurusd','19/3/2020',seed=157)
How would I create this new x_day_high
column with this desired output?
High x_day_high
date
2020-03-19 1.1937 0
2020-03-20 1.1879 0
2020-03-21 1.1767 0
2020-03-22 1.1721 0
2020-03-23 1.1819 2
2020-03-24 1.1928 4
2020-03-25 1.1939 6
2020-03-26 1.2141 7
2020-03-27 1.2144 8
2020-03-28 1.2099 0
2020-03-29 1.2033 0
2020-03-30 1.1887 0
2020-03-31 1.1972 1
2020-04-01 1.1997 2
2020-04-02 1.1924 0
CodePudding user response:
I think your x_day_high
is incorrect for days 3-23 through 3-27, or at least it is calculated differently than the 3-31 and 4-01 days. For example, you put a 1 for 3-31, and it is greater than 1 day before it. 4-01 is greater than 2 days before it, which matches your desired output.
For 03-23, though, you have a 3, but it is only greater than 2 days before it, not 3. Same applies for the others in that block, they look off.
Below is the code that I was able to put together that gets the desired output with that change/caveat. I feel like this ought to be more straightforward, so there might be a better solution, but this might work.
for n in reversed(range(len(df['High']))):
for i in reversed(range(n)):
if df['High'][n] > df['High'][i]:
df['x_day_high'][n] = n - i
else: break
This loops through the dataframe from bottom to top. We take the bottom row n
and then do a second reverse loop through everything above n
. If the High of n
is greater than the high of i
, then we record the df['x_day_high']
as n-i
. We then go to the next i
and check this again, and we continually rewrite that value until we get to a row where df['High'][n]
is not greater than df['High'][i]
, and then we break out of the i
loop and go to the next n.
This gave me the below output.
date Open High Low Close x_day_high
2020-03-19 1.1904 1.1937 1.1832 1.1832 0
2020-03-20 1.1843 1.1879 1.1769 1.1772 0
2020-03-21 1.1755 1.1767 1.1662 1.1672 0
2020-03-22 1.1686 1.1721 1.1611 1.1721 0
2020-03-23 1.1732 1.1819 1.169 1.1819 2
2020-03-24 1.1836 1.1928 1.1807 1.1922 4
2020-03-25 1.1939 1.1939 1.1864 1.1936 6
2020-03-26 1.1967 1.2141 1.1964 1.2114 7
2020-03-27 1.2118 1.2144 1.2039 1.2089 8
2020-03-28 1.208 1.2099 1.2018 1.2041 0
2020-03-29 1.2033 1.2033 1.1853 1.188 0
2020-03-30 1.1876 1.1887 1.1806 1.1879 0
2020-03-31 1.1921 1.1972 1.1873 1.1939 1
2020-04-01 1.1932 1.1997 1.1914 1.1914 2
2020-04-02 1.1902 1.1924 1.1781 1.1862 0