I have an OHLC dataframe, e.g.:
index | open | close | high | low |
---|---|---|---|---|
2021-03-23 10:00:00 00:00 | 1421.100 | 1424.500 | 1427.720 | 1422.650 |
2021-03-23 11:00:00 00:00 | 1424.500 | 1421.480 | 1422.400 | 1411.890 |
2021-03-23 12:00:00 00:00 | 1421.480 | 1435.170 | 1443.980 | 1433.780 |
2021-03-23 13:00:00 00:00 | 1435.170 | 1440.860 | 1443.190 | 1437.590 |
2021-03-23 14:00:00 00:00 | 1440.860 | 1438.920 | 1443.570 | 1435.200 |
2021-03-23 15:00:00 00:00 | 1438.920 | 1435.990 | 1444.840 | 1435.060 |
2021-03-23 16:00:00 00:00 | 1435.990 | 1441.920 | 1446.610 | 1441.450 |
Now I want to find out, if the price will first increase or decrease e.g. for 1%. What I have so far is the following working code:
def check(x):
check = ohlc[ohlc.index > x.name]
price = ohlc.at[x.name, 'close']
high_thr = price * 1.01
low_thr = price * 0.99
high_indexes = check[check['high'] > high_thr]
low_indexes = check[check['low'] < low_thr]
if high_indexes.shape[0] > 0 and low_indexes.shape[0] > 0:
high = high_indexes.index[0]
low = low_indexes.index[0]
if high < low:
return 1
elif high > low:
return -1
else:
return 0
else:
return 0
ohlc['check'] = ohlc.apply(find_threshold, axis=1)
This is extremely slow for larger datasets. Is there any other better way than iterating over every row, slicing and finding all indexes to get the nearest one?
CodePudding user response:
I think the best way to do this is not too different from how you're doing it:
from datetime import timedelta
def check(x, change=0.01):
time = x.name
price = ohlc.loc[time, 'close']
while True:
if time not in ohlc.index: # If we reach the end
return 0
high = ohlc.loc[time, 'high']
low = ohlc.loc[time, 'low']
if high > (1.0 change) * price: # Upper thresh broken
return 1
elif low < 1.0 - change) * price: # Lower thresh broken
return -1
time = time timedelta(hours=1) # Time update
ohlc['check'] = ohlc.apply(check, axis=1)
If efficiency is what you're worreid about, applying this way is slightly more efficient because it only looks ahead as far as it needs to to break the threshold. Optionally, you could limit this to up to, say, 100 hours into the future by modifying the while loop, capping the number of checks per row to 100:
endtime = time timedelta(hours=100)
while time < endtime:
# etc