Home > Software design >  determine if datetime index is within a list of date ranges
determine if datetime index is within a list of date ranges

Time:11-19

i have the following code data...

import pandas as pd, numpy as np
from datetime import datetime
end_dt = datetime.today()
st_dt = (end_dt   pd.DateOffset(-10)).date()
df_index = pd.date_range(st_dt, end_dt)
df = pd.DataFrame(index=df_index, columns=['in_range'])

data = [pd.to_datetime(['2022-11-08','2022-11-10']), pd.to_datetime(['2022-11-13','2022-11-15'])]
dt_ranges = pd.DataFrame(data,columns={'st_dt':'datetimens[64]', 'end_dt': 'datetimens[64]'})

this produces the following two dataframes:
df:

            in_range
2022-11-08  NaN
2022-11-09  NaN
2022-11-10  NaN
2022-11-11  NaN
2022-11-12  NaN
2022-11-13  NaN
2022-11-14  NaN
2022-11-15  NaN
2022-11-16  NaN
2022-11-17  NaN
2022-11-18  NaN

and date_ranges:

    st_dt       end_dt
0   2022-11-08  2022-11-10
1   2022-11-13  2022-11-15

I would like to update the 'in_range' column to indicate if the index falls within any of the pairs of start and end dates of the 2nd dataframe. so i should end up with this:

            in_range
2022-11-08  True
2022-11-09  True
2022-11-10  True
2022-11-11  NaN
2022-11-12  NaN
2022-11-13  True
2022-11-14  True
2022-11-15  True
2022-11-16  NaN
2022-11-17  NaN
2022-11-18  NaN

I've gone down the path of trying to do this with using lambda and iteration. but to me that seems in efficient.

    def in_range(index_date, date_ranges):
        for r in date_ranges.values:
            if (r[0] >= index_date) & (r[1] <= index_date):
                return True
        return False

     df['in_range'] = df.reset_index().apply(lambda x: in_range(x.date, dt_ranges), axis=1)

the above sets in_range to NaNs always, despite the code returning the correct values. i suspect it's because i am resetting the index and so it can not align. Also, as mentioned - this solution probably is pretty inefficient

is there a more pythonic/pandemic way of doing this?

CodePudding user response:

Use merge_asof and boolean indexing:

s = df.index.to_series()
m = (pd.merge_asof(s.rename('st_dt'), dt_ranges)
     ['end_dt'].ge(s.to_numpy()).to_numpy()
     )

df.loc[m, 'in_range'] = True

NB. The intervals in dt_ranges should be non-overlapping.

Output:

           in_range
2022-11-08     True
2022-11-09     True
2022-11-10     True
2022-11-11      NaN
2022-11-12      NaN
2022-11-13     True
2022-11-14     True
2022-11-15     True
2022-11-16      NaN
2022-11-17      NaN
2022-11-18      NaN
  • Related