I have a DataFrame containing intraday, 5 min data, for 1 month, 23 days trading days. Around 1725 rows, so 75 rows per day. There is a signal column, that gives buy/sell signal based on ST for now, but I might use another indicator later on.
PHASE 2 (in the Code added below) is where the code has to be corrected/implemented. I started working on it, but been stuck due to my lack of pandas knowledge.
So I wanna create a new column, that only captures "High" column values, if the "High" value at any point of that same day is greater than the "High" value corresponding to the "buy" signal row.
And also, create a new column, that only captures "Low" column values, if the "Low" value at any point of that same day is Lower than the "Low" value corresponding to "buy" signal row. Condition 2 is only checked if condition 1 is satisfied.
These 2 columns may help in analyzing the data, or for calculation. Column creation is not important but getting the necessary values for performing the calculations is necessary.
Need a column that calculates returns.
3a. If Condition 1 didn't happen, return 0.
3b. If Condition 1 happened, but condition 2 didn't happen, return (last "Close" of day - "High" of "buy" signal row)
3c. If Condition 1 happened, AND condition 2 happened, return ("Low" of "buy" row - "High" of "buy" row)
Adding some example scenarios below -
Here the Higher High and Lower Low both happened in same row.
return = Condition 3c, i.e. ["Low" of "buy" row - "High" of "buy" row]
Another example below -
Here the Higher High happened first, then after a few rows, Lower Low happened.
return = Condition 3c, i.e. ["Low" of "buy" row - "High" of "buy" row]
In cases where Lower Low doesn't happen, we take the last "Close" of that day.
Return = Condition 3b, i.e. ["Close" at 3 pm of that day - "High" of "buy" signal row]
All of this above is for "buy" signal condition. I'll have to do it for "sell" as well. But if I can learn how to do it for one, I'll be able to do the other.
Code & Sample Data -
# -*- coding: utf-8 -*-
"""
Created on Mon Jun 15
"""
import datetime as dt
import yfinance as yf
import pandas as pd
import numpy as np
import pandas_ta as ta
import copy
tickers = ["LT.NS","BRITANNIA.NS","CIPLA.NS","HINDALCO.NS"]
ohlc_intraday = {}
today = dt.datetime.today()
end = today - dt.timedelta(1)
start = today - dt.timedelta(20)
# looping over tickers and creating a dataframe with ohlcv, intraday tf
for ticker in tickers:
temp = yf.download(ticker, period="1mo",interval = "5m")
temp.dropna(how="any",inplace=True)
# temp = temp[temp["Volume"] > 0]
ohlc_intraday[ticker] = temp
# ohlc_intraday[ticker].to_csv("supertrend_data.csv")
tickers = ohlc_intraday.keys()
# dont run above code again & again
# run below code
# Phase 1 - Data Collection & Signal Generation
ohlc_dict = copy.deepcopy(ohlc_intraday)
tickers_signal = {}
tickers_ret = {}
for ticker in tickers:
ohlc_dict[ticker]["datetime"] = pd.to_datetime(ohlc_dict[ticker].index)
ohlc_dict[ticker]["hour"] = ohlc_dict[ticker]["datetime"].dt.hour
ohlc_dict[ticker]["date"] = ohlc_dict[ticker]["datetime"].dt.date
unique_days = len(ohlc_dict[ticker]["date"].unique())
num_5min_candles = 75 # number of candles in intraday
ohlc_dict[ticker]["candle"] = np.array(range(len(ohlc_dict[ticker])))%(num_5min_candles) 1
# Supertrend indicator's value column
ohlc_dict[ticker]["ST"] = ta.supertrend(high=ohlc_dict[ticker]["High"],low=ohlc_dict[ticker]["Low"],
close=ohlc_dict[ticker]["Close"],length=7,
multiplier=3)["SUPERT_7_3.0"]
def st_levels(close,stlevel):
if close > stlevel:
return "buy"
elif close < stlevel:
return "sell"
# capture Supertrend buy/sell signals
ohlc_dict[ticker]["signal"] = np.vectorize(st_levels)(ohlc_dict[ticker]["Close"],ohlc_dict[ticker]["ST"])
# keep only first occurrence of signal, remove consecutive duplicated signals
ohlc_dict[ticker].loc[ohlc_dict[ticker]["signal"].eq(ohlc_dict[ticker]["signal"].shift()), "signal"] = "no"
# supertrend length; first 7 rows will contain "no" since ST period length is 7
ohlc_dict[ticker]["signal"][:7] = "no"
# capturing returns
tickers_signal[ticker] = ""
tickers_ret[ticker] = []
# ohlc_dict[ticker].to_csv("supertrend_data.csv")
# Phase 2 - Calculating returns for intraday -
for ticker in tickers:
print("Calculating returns for ", ticker)
# ticker = "CIPLA.NS"
for i in range(len(ohlc_dict[ticker])):
if tickers_signal[ticker] == "":
tickers_ret[ticker].append(0)
# check if signal candle High/Low is broken for Buy/Sell Scenario respectively
if ohlc_dict[ticker]["High"][i] >= ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["High"].values[0]:
tickers_signal[ticker] = "BUY"
elif ohlc_dict[ticker]["Low"][i] <= ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="sell"]["Low"].values[0]:
tickers_signal[ticker] = "SELL"
elif tickers_signal[ticker] == "BUY":
# calculating returns for StopLoss
if ohlc_dict[ticker]["Low"][i] < ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["Low"].values[0]: # and add that both candles be on same day
tickers_signal[ticker] = ""
tickers_ret[ticker].append(((ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["Low"].values[0] - ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["High"].values[0])/(ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["High"].values[0])) - 1)
# calculating returns for 70th candle close in intraday
else:
tickers_ret[ticker].append(((ohlc_dict[ticker][ohlc_dict[ticker]['candle']==70]["Close"].values[0] - ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["High"].values[0])/(ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["High"].values[0])) - 1)
elif tickers_signal[ticker] == "SELL":
# calculating returns for StopLoss
if ohlc_dict[ticker]["High"][i] > ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["High"].values[0]:
tickers_signal[ticker] = ""
tickers_ret[ticker].append(((ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["Low"].values[0] - ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["High"].values[0])/(ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["Low"].values[0])) - 1)
# calculating returns for 70th candle close in intraday
else:
tickers_ret[ticker].append(((ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["Low"].values[0] - ohlc_dict[ticker][ohlc_dict[ticker]['candle']==70]["Close"].values[0])/(ohlc_dict[ticker][ohlc_dict[ticker]['signal']=="buy"]["High"].values[0])) - 1)
ohlc_dict[ticker]["ret"] = np.array(tickers_ret[ticker])
Below data is for HINDALCO.NS stock -
Datetime | Open | High | Low | Close | Adj Close | Volume | datetime | hour | date | candle | ST | signal | remarks | Return Calculation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022-05-18 13:35:00 05:30 | 431.5499878 | 433.5 | 431.0499878 | 433.1000061 | 433.1000061 | 263811 | 2022-05-18 13:35:00 05:30 | 13 | 18-05-2022 | 53 | 427.2457627 | buy | ||
2022-05-18 13:40:00 05:30 | 433.2000122 | 436.7999878 | 433.1000061 | 435.5499878 | 435.5499878 | 667616 | 2022-05-18 13:40:00 05:30 | 13 | 18-05-2022 | 54 | 429.0532754 | no | Higher High made | |
2022-05-18 13:45:00 05:30 | 435.6000061 | 436.5 | 435.3500061 | 435.7000122 | 435.7000122 | 188908 | 2022-05-18 13:45:00 05:30 | 13 | 18-05-2022 | 55 | 430.3779006 | no | ||
2022-05-18 13:50:00 05:30 | 435.7000122 | 435.7999878 | 434.2999878 | 434.3500061 | 434.3500061 | 296853 | 2022-05-18 13:50:00 05:30 | 13 | 18-05-2022 | 56 | 430.3779006 | no | ||
2022-05-18 13:55:00 05:30 | 434.6000061 | 436.1499939 | 434.4500122 | 435.4500122 | 435.4500122 | 169486 | 2022-05-18 13:55:00 05:30 | 13 | 18-05-2022 | 57 | 430.3779006 | no | ||
2022-05-18 14:00:00 05:30 | 435.4500122 | 436.1000061 | 434.75 | 436.0499878 | 436.0499878 | 343782 | 2022-05-18 14:00:00 05:30 | 14 | 18-05-2022 | 58 | 430.3779006 | no | ||
2022-05-18 14:05:00 05:30 | 436.0499878 | 437.1000061 | 433.6499939 | 433.7999878 | 433.7999878 | 511022 | 2022-05-18 14:05:00 05:30 | 14 | 18-05-2022 | 59 | 430.3779006 | no | ||
2022-05-18 14:10:00 05:30 | 433.75 | 434 | 432.6499939 | 433.3999939 | 433.3999939 | 249525 | 2022-05-18 14:10:00 05:30 | 14 | 18-05-2022 | 60 | 430.3779006 | no | ||
2022-05-18 14:15:00 05:30 | 433.6000061 | 434.5 | 432.7999878 | 433.75 | 433.75 | 217272 | 2022-05-18 14:15:00 05:30 | 14 | 18-05-2022 | 61 | 430.3779006 | no | ||
2022-05-18 14:20:00 05:30 | 433.7999878 | 435.8500061 | 433.6000061 | 434.6499939 | 434.6499939 | 228628 | 2022-05-18 14:20:00 05:30 | 14 | 18-05-2022 | 62 | 430.3779006 | no | ||
2022-05-18 14:25:00 05:30 | 434.6499939 | 435.2000122 | 434.2000122 | 434.5499878 | 434.5499878 | 149256 | 2022-05-18 14:25:00 05:30 | 14 | 18-05-2022 | 63 | 430.3779006 | no | ||
2022-05-18 14:30:00 05:30 | 434.4500122 | 434.75 | 433.75 | 433.8999939 | 433.8999939 | 155492 | 2022-05-18 14:30:00 05:30 | 14 | 18-05-2022 | 64 | 430.3779006 | no | ||
2022-05-18 14:35:00 05:30 | 433.8999939 | 434.9500122 | 433.6499939 | 434.5 | 434.5 | 128515 | 2022-05-18 14:35:00 05:30 | 14 | 18-05-2022 | 65 | 430.3779006 | no | ||
2022-05-18 14:40:00 05:30 | 434.6000061 | 435 | 434.25 | 434.5 | 434.5 | 117541 | 2022-05-18 14:40:00 05:30 | 14 | 18-05-2022 | 66 | 430.3779006 | no | ||
2022-05-18 14:45:00 05:30 | 434.4500122 | 435.1000061 | 433.1000061 | 433.2999878 | 433.2999878 | 276280 | 2022-05-18 14:45:00 05:30 | 14 | 18-05-2022 | 67 | 430.3779006 | no | ||
2022-05-18 14:50:00 05:30 | 433.25 | 434.8999939 | 433.2000122 | 434.7000122 | 434.7000122 | 163797 | 2022-05-18 14:50:00 05:30 | 14 | 18-05-2022 | 68 | 430.3779006 | no | ||
2022-05-18 14:55:00 05:30 | 434.7000122 | 435.6499939 | 434.5499878 | 435.2000122 | 435.2000122 | 201351 | 2022-05-18 14:55:00 05:30 | 14 | 18-05-2022 | 69 | 430.5468637 | no | ||
2022-05-18 15:00:00 05:30 | 435.0499878 | 435.75 | 433.75 | 435.5 | 435.5 | 384571 | 2022-05-18 15:00:00 05:30 | 15 | 18-05-2022 | 70 | 430.5468637 | no | Lower Low not made, Exited at Market Close | Return = Close (15:00 pm) - High of 'buy' signal row (highlighted in yellow above) |
UPDATE - In case of "buy" signal happens, and Trade is live, waiting to exit in loss or profit, and if opposite signal "sell" appears, exit trade at that "sell" signal "Close" value.
If this is too complicated, just ignore the signal change happening on same day.
PHASE 2 is where the code has to be corrected/implemented. Checking if signal was generated, if Higher Highs / Lower Lows happened, or just exiting at Market Close (3 pm) and calculate returns accordingly.
Currently as per my code, I'm trying to take "Close" of 70th candle, which is around 15:05 pm. But please feel free to modify/improve it so the condition takes 15:00 pm Close value, either using candle count, or using datetime value.
I know this is a huge favor I'm asking, but I have been trying to get this done for over 2 weeks and couldn't find any help. Lot of tutorials and solutions are on basic, standardized methods. This is a bit unconventional, and so unable to solve it on my own.
CodePudding user response:
Here is one solution. The methodology is as follows:
- For each ticker in tickers (as your code includes)
- Group the dataframe by "date"
- Apply the function to this group
The calculations returned are what you specified in the conditions:
3a.
0
3b.
Last's/Sell's close - Buy's high
3c.
Buy's low - Buy's high
This is different to the calculations computed in your #Phase 2
code.
The final output of the function adds the "rets" value to the "sell" or "close" row (I'm not sure exactly what you wanted to do with this, but this is consistent with your sample data remark).
def phase_2(x):
# returns are 0 unless event occurs
x["ret"] = 0
# for each buy/sell or buy/close combination
for i in range(sum(x["signal"] == "buy")):
after_buy = x.loc[x[x["signal"] == "buy"].index[i]:]
# if sell exists after buy, then range up to sell row
if sum(after_buy["signal"] == "sell") > 0:
after_buy = after_buy.loc[:after_buy.query("signal == 'sell'").index.min()]
# elif hour 15 exists after buy, then range up to the first of these (15:00)
elif len(after_buy.query("hour == 15")) > 0:
after_buy = after_buy.loc[:after_buy.query("hour == 15").index.min()]
# else to end of day... (this would only occur if buy after 15:00 in day)
# check if there is a "buy" signal in that day
if len(after_buy) > 0:
# high (from "buy signal row")
high = after_buy["High"].iloc[0]
# low (from "buy signal row")
low = after_buy["Low"].iloc[0]
# close (last row)
close = after_buy["Close"].iloc[-1]
# higher high
after_higher_high = after_buy[after_buy["High"].cummax() > high]
# if rows exist, then condition 1 met
if len(after_higher_high) > 0:
# # HH value
# higher_high = after_higher_high["High"][0]
# lower low
lower_low = after_higher_high[after_higher_high["Low"].cummin() < low]
# if rows exist, then condition 2 met, return 3c.
if len(lower_low) > 0:
# # LL value
# lower_low = lower_low["Low"][0]
# return 3c.
# print(after_buy["datetime"][0], after_buy["datetime"][-1], low - high) # line to help understand which period is covered
x["ret"].loc[after_buy.index[-1]] = low - high # return to "sell" or "close" row of period
# if not, then condition 2 not met, return 3b.
else:
# return 3b.
# print(after_buy["datetime"][0], after_buy["datetime"][-1], close - high) # line to help understand which period is covered
x["ret"].loc[after_buy.index[-1]] = close - high # return to "sell" or "close" row of period
# if not, then condition 1 not met, return 3a.
else:
# return 3a.
# print(after_buy["datetime"][0], after_buy["datetime"][-1], 0) # line to help understand which period is covered
x["ret"].loc[after_buy.index[-1]] = 0 # return to "sell" or "close" row of period
return x
# for each ticker
for ticker in tickers:
# overwrite dataframe as new dataframe
ohlc_dict[ticker] = ohlc_dict[ticker].groupby(["date"]).apply(lambda x: phase_2(x))
There are a few items within the function that I have left (in case you needed them for anything):
- The HH value in the "higher high" section, and the LL value in the "lower low" section, which from what I understood of you question are not actually used.
- The print statements just before the "ret" values are added to the dataframe, which you can use to understand the periods used in the function, and the values returned.
The sample data's return from this calculation is 2 (435.5 (15:00 Close) - 433.5 (Buy High)
).
Hopefully the comments within the function are sufficient as an explanation, but if not let me know!
NOTE:
The code returns the whole dataframe, and then overwrites the dataframe. As the only "changed" column within the function is "rets"
, the last line of the function could change to return x["rets"], and the for loop for each ticker to
ohlc_dict[ticker]["rets"] = ...`