i have the following code data...
import pandas as pd, numpy as np
from datetime import datetime
end_dt = datetime.today()
st_dt = (end_dt pd.DateOffset(-10)).date()
df_index = pd.date_range(st_dt, end_dt)
df = pd.DataFrame(index=df_index, columns=['in_range'])
data = [pd.to_datetime(['2022-11-08','2022-11-10']), pd.to_datetime(['2022-11-13','2022-11-15'])]
dt_ranges = pd.DataFrame(data,columns={'st_dt':'datetimens[64]', 'end_dt': 'datetimens[64]'})
this produces the following two dataframes:
df:
in_range
2022-11-08 NaN
2022-11-09 NaN
2022-11-10 NaN
2022-11-11 NaN
2022-11-12 NaN
2022-11-13 NaN
2022-11-14 NaN
2022-11-15 NaN
2022-11-16 NaN
2022-11-17 NaN
2022-11-18 NaN
and date_ranges:
st_dt end_dt
0 2022-11-08 2022-11-10
1 2022-11-13 2022-11-15
I would like to update the 'in_range' column to indicate if the index falls within any of the pairs of start and end dates of the 2nd dataframe. so i should end up with this:
in_range
2022-11-08 True
2022-11-09 True
2022-11-10 True
2022-11-11 NaN
2022-11-12 NaN
2022-11-13 True
2022-11-14 True
2022-11-15 True
2022-11-16 NaN
2022-11-17 NaN
2022-11-18 NaN
I've gone down the path of trying to do this with using lambda and iteration. but to me that seems in efficient.
def in_range(index_date, date_ranges):
for r in date_ranges.values:
if (r[0] >= index_date) & (r[1] <= index_date):
return True
return False
df['in_range'] = df.reset_index().apply(lambda x: in_range(x.date, dt_ranges), axis=1)
the above sets in_range to NaNs always, despite the code returning the correct values. i suspect it's because i am resetting the index and so it can not align. Also, as mentioned - this solution probably is pretty inefficient
is there a more pythonic/pandemic way of doing this?
CodePudding user response:
Use merge_asof
and boolean indexing:
s = df.index.to_series()
m = (pd.merge_asof(s.rename('st_dt'), dt_ranges)
['end_dt'].ge(s.to_numpy()).to_numpy()
)
df.loc[m, 'in_range'] = True
NB. The intervals in dt_ranges
should be non-overlapping.
Output:
in_range
2022-11-08 True
2022-11-09 True
2022-11-10 True
2022-11-11 NaN
2022-11-12 NaN
2022-11-13 True
2022-11-14 True
2022-11-15 True
2022-11-16 NaN
2022-11-17 NaN
2022-11-18 NaN