Home > Net >  Pandas: Identify Date ranges that contain date 'x' in list
Pandas: Identify Date ranges that contain date 'x' in list

Time:11-10

The code maybe explains the question better, but I have a list of date ranges (month start, month end for example) in a dataframe, and then a list of birthdays.

I'm simply trying to create a dataframe column that contains the birthdate days (or a list of them) if its in between a StartDate and EndDate.

Sample code:

import pandas as pd
import datetime as dt
end = [x for x in pd.date_range(dt.date(2022,1,1),periods=12, freq=pd.offsets.MonthEnd(1)).values]
start = [x for x in pd.date_range(dt.date(2022,1,1),periods=12, freq=pd.offsets.MonthBegin(1)).values]

dates = [(x, y) for x, y in zip(start, end)]

df = pd.DataFrame(dates, columns = ['StartDate', 'EndDate'])

birthdays = [dt.date(2022, 6, 17), dt.date(2022, 10, 7),dt.date(2022, 10, 12), dt.date(2022, 11 ,15)]

Expected solution (in dataframe terms! :) )

from numpy import nan
solution = [nan, nan, nan, nan, nan, [dt.date(2022, 6, 17)], nan, nan, nan, [dt.date(2022, 10, 7),dt.date(2022, 10, 12)], [dt.date(2022, 11 ,15)], nan]
df['Solution'] = solution

solution

Is there a way to do that kind of lookup, without looping through each date in birthdays?

Thanks in advance!

CodePudding user response:

You can use a merge_asof:

df['Solution'] = (df['StartDate'].map(
  pd.merge_asof(pd.to_datetime(pd.Series(birthdays, name='Solution')),
               df,
               left_on='Solution', right_on='StartDate'
              )
   .groupby('StartDate')['Solution'].agg(list)
))

output:

    StartDate    EndDate                                    Solution
0  2022-01-01 2022-01-31                                         NaN
1  2022-02-01 2022-02-28                                         NaN
2  2022-03-01 2022-03-31                                         NaN
3  2022-04-01 2022-04-30                                         NaN
4  2022-05-01 2022-05-31                                         NaN
5  2022-06-01 2022-06-30                       [2022-06-17 00:00:00]
6  2022-07-01 2022-07-31                                         NaN
7  2022-08-01 2022-08-31                                         NaN
8  2022-09-01 2022-09-30                                         NaN
9  2022-10-01 2022-10-31  [2022-10-07 00:00:00, 2022-10-12 00:00:00]
10 2022-11-01 2022-11-30                       [2022-11-15 00:00:00]
11 2022-12-01 2022-12-31                                         NaN
  • Related