I have almost a year's worth of minute data for a particular stock ticker, which I have masked to return a Pandas series of booleans indicating whether one EMA has crossed another, e.g.:
print(entries)
print(type(entries))
print(f'len of entries {len(entries)}')
Output:
datetime
2021-05-03 08:00:00 False
2021-05-03 08:03:00 False
2021-05-03 08:04:00 False
2021-05-03 08:06:00 False
2021-05-03 08:08:00 False
2021-05-03 08:09:00 False
2021-05-03 08:11:00 False
2021-05-03 08:29:00 False
2021-05-03 08:34:00 False
2021-05-03 08:41:00 False
2021-05-03 08:52:00 False
2021-05-03 08:55:00 False
2021-05-03 09:07:00 False
2021-05-03 09:13:00 False
2021-05-03 09:14:00 False
2021-05-03 09:26:00 False
2021-05-03 09:27:00 False
2021-05-03 09:34:00 False
2021-05-03 09:44:00 False
2021-05-03 09:57:00 False
2021-05-03 10:14:00 False
2021-05-03 10:31:00 False
2021-05-03 10:38:00 False
2021-05-03 10:41:00 False
...
2022-03-02 23:49:00 False
dtype: bool
<class 'pandas.core.series.Series'>
len of entries 141615
I then take the day I'm interested in, restricted to the hours between 9:30 and 16:00 only:
display(df_market_hours)
print(type(df_market_hours))
print(f'len of df_market_hours {len(df_market_hours)}')
Output:
volume open close high low time date
datetime
2021-08-13 09:30:00 200.0 15.0000 15.0000 15.00 15.0000 09:30:00 2021-08-13
2021-08-13 10:01:00 310.0 15.0000 15.0000 15.00 15.0000 10:01:00 2021-08-13
2021-08-13 10:26:00 1031.0 14.9000 14.8900 14.90 14.8900 10:26:00 2021-08-13
2021-08-13 10:29:00 223.0 14.8900 14.8900 14.89 14.8900 10:29:00 2021-08-13
2021-08-13 11:00:00 285.0 14.8900 14.8900 14.89 14.8900 11:00:00 2021-08-13
... ... ... ... ... ... ... ...
2021-08-13 15:55:00 11489.0 18.0200 18.0100 18.04 17.9700 15:55:00 2021-08-13
2021-08-13 15:56:00 28828.0 18.0400 17.9700 18.18 17.9300 15:56:00 2021-08-13
2021-08-13 15:57:00 5720.0 17.9600 18.0412 18.05 17.9600 15:57:00 2021-08-13
2021-08-13 15:58:00 33329.0 18.0401 18.1800 18.31 18.0294 15:58:00 2021-08-13
2021-08-13 15:59:00 51872.0 18.1400 18.1650 18.18 18.0600 15:59:00 2021-08-13
184 rows × 7 columns
<class 'pandas.core.frame.DataFrame'>
len of df_market_hours 184
What I'm trying to do is return a new series where the entries have the same index and shape as df_market_hours. How do I achieve this?
What I've tried so far:
First, getting the index of the current session's market hours:
df_market_hours_idx = df_current_day.between_time('9:30', '16:00', include_end=False).index
df_market_hours = df_current_day.loc[df_market_hours_idx]
And then trying to create a mask with it:
df_copy = entries[entries == df_market_hours.index]
When I do that, I get the following error:
ValueError: ('Lengths must match to compare', (141615,), (54991,))
I think I need to use reindex and fill_value, as shown here, but I'm struggling to figure out how this applies to my use case.
CodePudding user response:
As I understand it, you want to filter the entries series by the index of df_market_hours:
# Pass the index itself to isin; .values.tolist() would turn the timestamps into plain integers
considered_indices = df_market_hours.index
mask = entries.index.isin(considered_indices)
df_copy = entries[mask]
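Note that an isin filter only keeps timestamps that exist in both objects, so the result can be shorter than df_market_hours. If the goal is a series with exactly the same index and length, reindex with fill_value is probably closer to what was asked. A minimal sketch with made-up data (the timestamps and the name market_index are assumptions for illustration, standing in for df_market_hours.index):

```python
import pandas as pd

# Hypothetical stand-in for `entries`: a boolean Series on a datetime index with gaps.
entries = pd.Series(
    [False, True, False, False],
    index=pd.to_datetime([
        "2021-08-13 09:29:00",
        "2021-08-13 09:30:00",
        "2021-08-13 10:01:00",
        "2021-08-14 09:30:00",
    ]),
)

# Hypothetical stand-in for `df_market_hours.index`; 10:26 is missing from `entries`.
market_index = pd.to_datetime([
    "2021-08-13 09:30:00",
    "2021-08-13 10:01:00",
    "2021-08-13 10:26:00",
])

# reindex keeps every label of the target index; labels missing from
# `entries` are filled with fill_value instead of being dropped.
aligned = entries.reindex(market_index, fill_value=False)

# isin keeps only labels present in both, so the result can be shorter.
filtered = entries[entries.index.isin(market_index)]

print(len(aligned), len(filtered))
```

With the real data this would be entries.reindex(df_market_hours.index, fill_value=False). Whether False is the right fill for minutes that have no entry is a judgment call that depends on how the signal is used.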
CodePudding user response:
I'm not sure I fully understand the question, but if you want to keep the index of the original pd.DataFrame while extracting a pd.Series, maybe you can use the index argument of the pd.Series() constructor.
EDIT:
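What you can do is create a list of the timestamps in entries, then filter with the isin() function to extract the relevant rows of df_market_hours. Since datetime is the index of df_market_hours rather than a column, the filter has to go through the index:
# list(entries) would give the boolean values, so take the index instead
lst_entries = list(entries.index)
masked_df_market_hours = df_market_hours[df_market_hours.index.isin(lst_entries)]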
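If what you ultimately want is the rows of df_market_hours where a cross actually happened, one possible approach is to align the boolean series to the frame's index first and then use it as a row mask. A rough sketch, assuming tiny made-up versions of both objects (the values and the names signal and crossed_rows are hypothetical):

```python
import pandas as pd

# Hypothetical miniature of df_market_hours, indexed by datetime.
df_market_hours = pd.DataFrame(
    {"close": [15.00, 14.89, 18.01]},
    index=pd.DatetimeIndex(
        ["2021-08-13 09:30:00", "2021-08-13 10:26:00", "2021-08-13 15:55:00"],
        name="datetime",
    ),
)

# Hypothetical miniature of entries; it covers other timestamps as well
# and has no value for 15:55.
entries = pd.Series(
    [False, True, True],
    index=pd.to_datetime([
        "2021-08-13 09:30:00",
        "2021-08-13 10:26:00",
        "2021-08-14 09:30:00",
    ]),
)

# Align the signal to the frame's index; timestamps without an entry count as False.
signal = entries.reindex(df_market_hours.index, fill_value=False)

# A boolean Series sharing the frame's index can be used directly as a row mask.
crossed_rows = df_market_hours[signal]
print(crossed_rows)
```

Here signal has one value per row of df_market_hours, so it can also be stored as a new column if that is more convenient.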