How to filter dataframe based on condition that index is between date intervals?

Time:12-18

I have 2 dataframes: df_dec_light and df_rally.

df_dec_light.head():
    log_return  month   year
1970-12-01  0.003092    12  1970
1970-12-02  0.011481    12  1970
1970-12-03  0.004736    12  1970
1970-12-04  0.006279    12  1970
1970-12-07  0.005351    12  1970
1970-12-08  -0.005239   12  1970
1970-12-09  0.000782    12  1970
1970-12-10  0.004235    12  1970
1970-12-11  0.003774    12  1970
1970-12-14  -0.005109   12  1970
df_rally.head():

rally_start rally_end
0   1970-12-18  1970-12-31
1   1971-12-17  1971-12-31
2   1972-12-15  1972-12-29
3   1973-12-21  1973-12-31
4   1974-12-20  1974-12-31

I need to filter df_dec_light so that only rows whose index falls between df_rally['rally_start'] and df_rally['rally_end'] (for any row of df_rally) are kept.

I've tried something like this: df_dec_light[(df_dec_light.index >= df_rally['rally_start']) & (df_dec_light.index <= df_rally['rally_end'])]

I was expecting to receive a filtered df_dec_light dataframe containing only the index dates that fall within the intervals between df_rally['rally_start'] and df_rally['rally_end']. Something like this:


    log_return  month   year
1970-12-18  0.001997    12  1970
1970-12-21  -0.003108   12  1970
1970-12-22  0.001111    12  1970
1970-12-23  0.000666    12  1970
1970-12-24  0.005644    12  1970
1970-12-28  0.005283    12  1970
1970-12-29  0.010810    12  1970
1970-12-30  0.002061    12  1970
1970-12-31  -0.001301   12  1970

I would really appreciate any help. Thanks!

CodePudding user response:

To solve this, we can first turn each range in df_rally into a pd.DatetimeIndex by calling pd.date_range on each row. This gives us a pd.Series, index_list, that holds one pd.DatetimeIndex per row of df_rally.
Because we later want to check whether the index of df_dec_light falls in any of the ranges, we combine all of these ranges into one. This is done with union.
We assert that the newly created pd.Series index_list is not empty and then select its first element. This element is the pd.DatetimeIndex on which we then call union with every other pd.DatetimeIndex.
We can now use pd.Index.isin to create a boolean array that marks whether each index date is found in the combined set of dates.

If we now apply this mask to df_dec_light it returns only the entries that are within one of the specified ranges of df_rally.

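import pandas as pd

# Build one DatetimeIndex of calendar days per rally window (endpoints included).
index_list = df_rally.apply(lambda x: pd.date_range(x['rally_start'], x['rally_end']), axis=1)
assert not index_list.empty
# Merge every range into a single DatetimeIndex.
all_ranges = index_list.iloc[0]
for rng in index_list.iloc[1:]:  # `rng` avoids shadowing the built-in `range`
    all_ranges = all_ranges.union(rng)
print(all_ranges)
# True where the index date falls inside any rally window.
mask = df_dec_light.index.isin(all_ranges)
print(df_dec_light[mask])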
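
For anyone who wants to try the union approach without the original data, here is a minimal end-to-end sketch. The two small frames are stand-ins built from a few rows shown in the question (they are an assumption, not the asker's full data), and the ranges are collected with a list comprehension instead of apply.

```python
import pandas as pd

# Sample frames shaped like the ones in the question (only a few rows, for illustration).
df_dec_light = pd.DataFrame(
    {"log_return": [0.003774, 0.001997, -0.003108, -0.001301]},
    index=pd.to_datetime(["1970-12-11", "1970-12-18", "1970-12-21", "1970-12-31"]),
)
df_rally = pd.DataFrame(
    {
        "rally_start": pd.to_datetime(["1970-12-18", "1971-12-17"]),
        "rally_end": pd.to_datetime(["1970-12-31", "1971-12-31"]),
    }
)

# One DatetimeIndex of calendar days per rally window; date_range includes both endpoints.
ranges = [
    pd.date_range(start, end)
    for start, end in zip(df_rally["rally_start"], df_rally["rally_end"])
]

# Fold all ranges into a single DatetimeIndex.
all_days = ranges[0]
for r in ranges[1:]:
    all_days = all_days.union(r)

# Keep only rows whose index date appears in one of the windows.
filtered = df_dec_light[df_dec_light.index.isin(all_days)]
print(filtered)
```

Note that this materialises every calendar day in every window, which is fine for a few dozen short windows but may be wasteful for long ones.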

CodePudding user response:

Let's create an IntervalIndex from the rally_start and rally_end columns of df_rally, map the intervals onto the index of df_dec_light, and use notna to check whether each index value is contained in any interval:

ix = pd.IntervalIndex.from_arrays(df_rally.rally_start, df_rally.rally_end, closed='both')
mask = df_dec_light.index.map(ix.to_series()).notna()

Then use the mask to filter the dataframe:

df_dec_light[mask]
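
A self-contained sketch of the interval idea follows. It uses a few sample rows taken from the question (an assumption, not the full data) and swaps the map/notna step for IntervalIndex.get_indexer, which returns -1 for dates that no interval contains. As far as I know, get_indexer raises an error when the intervals overlap, so this assumes the rally windows are disjoint, as they are in the question.

```python
import pandas as pd

# Sample frames shaped like the ones in the question (only a few rows, for illustration).
df_dec_light = pd.DataFrame(
    {"log_return": [0.003774, 0.001997, -0.003108, -0.001301]},
    index=pd.to_datetime(["1970-12-11", "1970-12-18", "1970-12-21", "1970-12-31"]),
)
df_rally = pd.DataFrame(
    {
        "rally_start": pd.to_datetime(["1970-12-18", "1971-12-17"]),
        "rally_end": pd.to_datetime(["1970-12-31", "1971-12-31"]),
    }
)

# closed="both" makes rally_start and rally_end themselves count as inside the window.
ix = pd.IntervalIndex.from_arrays(
    df_rally["rally_start"], df_rally["rally_end"], closed="both"
)

# Position of the interval containing each date, or -1 when no interval contains it.
positions = ix.get_indexer(df_dec_light.index)
mask = positions != -1

filtered = df_dec_light[mask]
print(filtered)
```

Unlike the day-by-day union, this never expands the windows into individual dates, so its cost does not grow with the length of each window.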