Keep only day 1 from multi day runners - stock market, Python-CodePudding

I am trying to build a strategy around intraday runners in the stock market, and I need to eliminate from my dataset any date from a symbol which occurs after the first day of a multi day runner.

However, only when that symbol runs multiple days. If it is only a day 1, then I need to keep it.

So let's assume a stock moves X% every day, for 3 days, I only want to keep the 1st day.

Once the run ends for that symbol I need to repeat the process (e.g keep only the first day from all other runs).

Say I already have a dataframe of tickers and dates which fit my gap criteria, like so:

   symbol        date
1    FOXO  2022-12-22 // day 1 - keep
2    FOXO  2022-12-23 // day 2 - remove
3    FOXO  2022-12-27 // day 3 - remove - we had trading breaks here for Christmas and weekends, therefore it's still considered day 3 //
4    FOXO  2022-12-29 // day 1 - keep
5    FOXO  2022-12-30 // day 2 - remove
6    FOXO  2023-01-03 // day 1 - keep
7    FOXO  2023-01-04 // day 2 - remove
8    FOXO  2023-01-05 // day 3 - remove
6    APPL  2023-01-03 // day 1 - keep
7    APPL  2023-01-04 // day 2 - remove
8    APPL  2023-01-05 // day 3 - remove

How can I achieve the desired result with pandas?

CodePudding user response：

In Short

You can use numpy.busday_count(start, end, holidays=[]) to do this. This will ignore the weekends and holidays.

import numpy as np
np.busday_count(
    '2022-12-23',
    '2022-12-27',
    holidays=['2022-12-24', '2022-12-25', '2022-12-26']
)

In case you want to include the weekends, you should be able to specify that with weekmask (see documentation).

Full Example

# Prepare Packages
import pandas as pd
import numpy as np

# Prepare Dataset
df = pd.DataFrame({
    'symbol': ['FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'APPL', 'APPL', 'APPL'],
    'date': ['2022-12-22', '2022-12-23', '2022-12-27', '2022-12-29', '2022-12-30', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-02', '2023-01-03', '2023-01-04']
})

# List Holidays
holidays = ['2022-12-24', '2022-12-25', '2022-12-26']

# Get the consecutive date
df['prev_date'] = df.groupby('symbol')['date'].shift(1)

# Get trading days difference (ignores weekends and holidays)
df['trading_days_diff'] = np.busday_count(
    df['prev_date'].fillna(df['date']).to_list(),
    df['date'].to_list(),
    holidays=holidays
)

# Get rows to keep
df[ df['trading_days_diff']!=1 ]

To see the days you could add the code below

df['run_id'] = (df['trading_days_diff']!=1).cumsum()
df['days'] = df.groupby('run_id').cumcount() 1

To get only runs with multiple days you could do the following:

df[
    (df['run_id'].isin(df[ df['days']>1 ]['run_id']))
    & (df['days'] == 1)
]

CodePudding user response：

I would define some function that takes in two dates (or series of dates handled element wise) that determines if they're consecutive. I am not sure how to consistently handle trading breaks so I'm using a "separated by at most 1 day" placeholder here:

def is_consecutive(start, end):
    return end - start <= pd.Timedelta("1D")

def get_first_day_of_runs(x):
    groups = (~is_consecutive(x.shift(), x)).cumsum()
    return x.groupby(groups).min()

df.groupby("symbol")["date"].apply(get_first_day_of_runs)

Output:

symbol
APPL    1   2023-01-02
FOXO    1   2022-12-22
        2   2022-12-27  # <<< You'll have to handle this one
        3   2022-12-29
        4   2023-01-02