The code maybe explains the question better, but I have a list of date ranges (month start, month end for example) in a dataframe, and then a list of birthdays.
I'm simply trying to create a dataframe column that contains the birthdate days (or a list of them) if its in between a StartDate and EndDate.
Sample code:
import pandas as pd
import datetime as dt
end = [x for x in pd.date_range(dt.date(2022,1,1),periods=12, freq=pd.offsets.MonthEnd(1)).values]
start = [x for x in pd.date_range(dt.date(2022,1,1),periods=12, freq=pd.offsets.MonthBegin(1)).values]
dates = [(x, y) for x, y in zip(start, end)]
df = pd.DataFrame(dates, columns = ['StartDate', 'EndDate'])
birthdays = [dt.date(2022, 6, 17), dt.date(2022, 10, 7),dt.date(2022, 10, 12), dt.date(2022, 11 ,15)]
Expected solution (in dataframe terms! :) )
from numpy import nan
solution = [nan, nan, nan, nan, nan, [dt.date(2022, 6, 17)], nan, nan, nan, [dt.date(2022, 10, 7),dt.date(2022, 10, 12)], [dt.date(2022, 11 ,15)], nan]
df['Solution'] = solution
Is there a way to do that kind of lookup, without looping through each date in birthdays?
Thanks in advance!
CodePudding user response:
You can use a merge_asof
:
df['Solution'] = (df['StartDate'].map(
pd.merge_asof(pd.to_datetime(pd.Series(birthdays, name='Solution')),
df,
left_on='Solution', right_on='StartDate'
)
.groupby('StartDate')['Solution'].agg(list)
))
output:
StartDate EndDate Solution
0 2022-01-01 2022-01-31 NaN
1 2022-02-01 2022-02-28 NaN
2 2022-03-01 2022-03-31 NaN
3 2022-04-01 2022-04-30 NaN
4 2022-05-01 2022-05-31 NaN
5 2022-06-01 2022-06-30 [2022-06-17 00:00:00]
6 2022-07-01 2022-07-31 NaN
7 2022-08-01 2022-08-31 NaN
8 2022-09-01 2022-09-30 NaN
9 2022-10-01 2022-10-31 [2022-10-07 00:00:00, 2022-10-12 00:00:00]
10 2022-11-01 2022-11-30 [2022-11-15 00:00:00]
11 2022-12-01 2022-12-31 NaN