I am trying to build a strategy around intraday runners in the stock market, and I need to eliminate from my dataset any date from a symbol which occurs after the first day of a multi day runner.
However, only when that symbol runs multiple days. If it is only a day 1, then I need to keep it.
So let's assume a stock moves X% every day, for 3 days, I only want to keep the 1st day.
Once the run ends for that symbol I need to repeat the process (e.g keep only the first day from all other runs).
Say I already have a dataframe of tickers and dates which fit my gap criteria, like so:
symbol date
1 FOXO 2022-12-22 // day 1 - keep
2 FOXO 2022-12-23 // day 2 - remove
3 FOXO 2022-12-27 // day 3 - remove - we had trading breaks here for Christmas and weekends, therefore it's still considered day 3 //
4 FOXO 2022-12-29 // day 1 - keep
5 FOXO 2022-12-30 // day 2 - remove
6 FOXO 2023-01-03 // day 1 - keep
7 FOXO 2023-01-04 // day 2 - remove
8 FOXO 2023-01-05 // day 3 - remove
6 APPL 2023-01-03 // day 1 - keep
7 APPL 2023-01-04 // day 2 - remove
8 APPL 2023-01-05 // day 3 - remove
How can I achieve the desired result with pandas?
CodePudding user response:
In Short
You can use numpy.busday_count(start, end, holidays=[])
to do this. This will ignore the weekends and holidays.
import numpy as np
np.busday_count(
'2022-12-23',
'2022-12-27',
holidays=['2022-12-24', '2022-12-25', '2022-12-26']
)
In case you want to include the weekends, you should be able to specify that with weekmask
(see documentation).
Full Example
# Prepare Packages
import pandas as pd
import numpy as np
# Prepare Dataset
df = pd.DataFrame({
'symbol': ['FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'APPL', 'APPL', 'APPL'],
'date': ['2022-12-22', '2022-12-23', '2022-12-27', '2022-12-29', '2022-12-30', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-02', '2023-01-03', '2023-01-04']
})
# List Holidays
holidays = ['2022-12-24', '2022-12-25', '2022-12-26']
# Get the consecutive date
df['prev_date'] = df.groupby('symbol')['date'].shift(1)
# Get trading days difference (ignores weekends and holidays)
df['trading_days_diff'] = np.busday_count(
df['prev_date'].fillna(df['date']).to_list(),
df['date'].to_list(),
holidays=holidays
)
# Get rows to keep
df[ df['trading_days_diff']!=1 ]
To see the days
you could add the code below
df['run_id'] = (df['trading_days_diff']!=1).cumsum()
df['days'] = df.groupby('run_id').cumcount() 1
To get only runs with multiple days you could do the following:
df[
(df['run_id'].isin(df[ df['days']>1 ]['run_id']))
& (df['days'] == 1)
]
CodePudding user response:
I would define some function that takes in two dates (or series of dates handled element wise) that determines if they're consecutive. I am not sure how to consistently handle trading breaks so I'm using a "separated by at most 1 day" placeholder here:
def is_consecutive(start, end):
return end - start <= pd.Timedelta("1D")
def get_first_day_of_runs(x):
groups = (~is_consecutive(x.shift(), x)).cumsum()
return x.groupby(groups).min()
df.groupby("symbol")["date"].apply(get_first_day_of_runs)
Output:
symbol
APPL 1 2023-01-02
FOXO 1 2022-12-22
2 2022-12-27 # <<< You'll have to handle this one
3 2022-12-29
4 2023-01-02