Home > database >  How to mask only the most recent date?
How to mask only the most recent date?

Time:07-01

Let's say I have 2 days worth of date times:

import pandas as pd
import numpy as np

index = pd.date_range("2020-01-01 00:00:00", "2020-01-03 00:00:00", freq="15T")

print(index)
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 00:15:00',
               '2020-01-01 00:30:00', '2020-01-01 00:45:00',
               '2020-01-01 01:00:00', '2020-01-01 01:15:00',
               '2020-01-01 01:30:00', '2020-01-01 01:45:00',
               '2020-01-01 02:00:00', '2020-01-01 02:15:00',
               ...
               '2020-01-02 21:45:00', '2020-01-02 22:00:00',
               '2020-01-02 22:15:00', '2020-01-02 22:30:00',
               '2020-01-02 22:45:00', '2020-01-02 23:00:00',
               '2020-01-02 23:15:00', '2020-01-02 23:30:00',
               '2020-01-02 23:45:00', '2020-01-03 00:00:00'],
              dtype='datetime64[ns]', length=193, freq='15T')

And then I declare a pandas array based on my index with each element set to False:

entries = pd.Series(False, index=index)

I then get the market times, and the open times:

market_hours_indices = entries.index.indexer_between_time('9:30', '16:00')
market_hours_index = entries.index[market_hours_indices]
open_hours_indices = entries.index.indexer_between_time('9:30', '10:30')
open_hours_index = entries.index[open_hours_indices]

And then mask my initial pandas array with my indexs:

entries = entries[market_hours_index]
entries[~entries.index.isin(open_hours_index)] = False
entries

This gives me:

2020-01-01 09:30:00     True
2020-01-01 09:45:00     True
2020-01-01 10:00:00     True
2020-01-01 10:15:00     True
2020-01-01 10:30:00     True
2020-01-01 10:45:00    False
2020-01-01 11:00:00    False
2020-01-01 11:15:00    False
2020-01-01 11:30:00    False
2020-01-01 11:45:00    False
2020-01-01 12:00:00    False
2020-01-01 12:15:00    False
2020-01-01 12:30:00    False
2020-01-01 12:45:00    False
2020-01-01 13:00:00    False
2020-01-01 13:15:00    False
2020-01-01 13:30:00    False
2020-01-01 13:45:00    False
2020-01-01 14:00:00    False
2020-01-01 14:15:00    False
2020-01-01 14:30:00    False
2020-01-01 14:45:00    False
2020-01-01 15:00:00    False
2020-01-01 15:15:00    False
2020-01-01 15:30:00    False
2020-01-01 15:45:00    False
2020-01-01 16:00:00    False
2020-01-02 09:30:00     True
2020-01-02 09:45:00     True
2020-01-02 10:00:00     True
2020-01-02 10:15:00     True
2020-01-02 10:30:00     True
2020-01-02 10:45:00    False
2020-01-02 11:00:00    False
2020-01-02 11:15:00    False
2020-01-02 11:30:00    False
2020-01-02 11:45:00    False
2020-01-02 12:00:00    False
2020-01-02 12:15:00    False
2020-01-02 12:30:00    False
2020-01-02 12:45:00    False
2020-01-02 13:00:00    False
2020-01-02 13:15:00    False
2020-01-02 13:30:00    False
2020-01-02 13:45:00    False
2020-01-02 14:00:00    False
2020-01-02 14:15:00    False
2020-01-02 14:30:00    False
2020-01-02 14:45:00    False
2020-01-02 15:00:00    False
2020-01-02 15:15:00    False
2020-01-02 15:30:00    False
2020-01-02 15:45:00    False
2020-01-02 16:00:00    False
dtype: bool

Which is almost what I'm aiming to do, but how do I only target the second day? So that my array looks like:

2020-01-01 09:30:00    False
2020-01-01 09:45:00    False
2020-01-01 10:00:00    False
2020-01-01 10:15:00    False
2020-01-01 10:30:00    False
2020-01-01 10:45:00    False
2020-01-01 11:00:00    False
2020-01-01 11:15:00    False
2020-01-01 11:30:00    False
2020-01-01 11:45:00    False
2020-01-01 12:00:00    False
2020-01-01 12:15:00    False
2020-01-01 12:30:00    False
2020-01-01 12:45:00    False
2020-01-01 13:00:00    False
2020-01-01 13:15:00    False
2020-01-01 13:30:00    False
2020-01-01 13:45:00    False
2020-01-01 14:00:00    False
2020-01-01 14:15:00    False
2020-01-01 14:30:00    False
2020-01-01 14:45:00    False
2020-01-01 15:00:00    False
2020-01-01 15:15:00    False
2020-01-01 15:30:00    False
2020-01-01 15:45:00    False
2020-01-01 16:00:00    False
2020-01-02 09:30:00     True
2020-01-02 09:45:00     True
2020-01-02 10:00:00     True
2020-01-02 10:15:00     True
2020-01-02 10:30:00     True
2020-01-02 10:45:00    False
2020-01-02 11:00:00    False
2020-01-02 11:15:00    False
2020-01-02 11:30:00    False
2020-01-02 11:45:00    False
2020-01-02 12:00:00    False
2020-01-02 12:15:00    False
2020-01-02 12:30:00    False
2020-01-02 12:45:00    False
2020-01-02 13:00:00    False
2020-01-02 13:15:00    False
2020-01-02 13:30:00    False
2020-01-02 13:45:00    False
2020-01-02 14:00:00    False
2020-01-02 14:15:00    False
2020-01-02 14:30:00    False
2020-01-02 14:45:00    False
2020-01-02 15:00:00    False
2020-01-02 15:15:00    False
2020-01-02 15:30:00    False
2020-01-02 15:45:00    False
2020-01-02 16:00:00    False
dtype: bool

What I've tried:

# get the days I'm interested in
df_all_days = df[(df['date'] >= previous_day) & (df['date'] <= date)]

# get the day I want to trade
df_current_day = df[df['date'] == date]
current_day_index = df_current_day.index
# create an empty numpy array 
entries = pd.Series(True, index=current_day_index)

# get the index of the current day between trading hours
market_hours_indices = entries.index.indexer_between_time('9:30', '16:00')
market_hours_index = entries.index[market_hours_indices]

# return False any dates and times which fall outside of our date
df_all_days['enter'] = df_all_days['enter'][~df_all_days['enter'].index.isin(market_hours_index)] = False

This returns (everything false):

2022-06-15 07:16:00    False
2022-06-15 09:17:00    False
2022-06-15 09:18:00    False
2022-06-15 09:19:00    False
...
2022-06-16 19:59:00    False
Name: enter, dtype: bool

CodePudding user response:

I figured it out:

df = pd.read_csv(fname, index_col='datetime', usecols=["open", "high", "low", "close", "volume", "datetime", "date", "time"], parse_dates=True)

# get the days I'm interested in
df_all_days = df[(df['date'] >= previous_day) & (df['date'] <= date)]

# get the day I want to trade
df_current_day = df_all_days[df_all_days['date'] == date]
# get the hours I want to trade
df_current_day_indices = df_current_day.index.indexer_between_time('09:30', '16:00')
current_day_index = df_current_day.index[df_current_day_indices]

# create fake entry logic
df_all_days['entries'] = np.where((0), True, False)
# only enter during the days and hours we wanna trade        
df_all_days['entries'] = df_all_days['entries'].index.isin(current_day_index)

with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    # both arrays are the same size
    print(df_all_days['entries'].size)
    print(df_all_days['close'].size)
    print(df_all_days['entries'])
  • Related