I have a dataframe of stating date and another list of dates. For each date within start_date df, I want the find the minimum date that's greater than it from the date_list df.
Desired output shown below.
start_date=pd.DataFrame( ['2014-04-05',
'2014-04-06',
'2014-04-07',
'2014-04-08',
'2014-04-09'],columns=['start_date'])
start_date['start_date']=pd.to_datetime(start_date['start_date'])
date_list=pd.DataFrame( [
'2014-04-06',
'2014-04-07',
'2014-04-08',
'2014-04-09','2014-04-20'],columns=['date_list'])
date_list['date_list']=pd.to_datetime(date_list['date_list'])
desired_output= pd.DataFrame( {'start_date':['2014-04-05',
'2014-04-06',
'2014-04-07',
'2014-04-08',
'2014-04-09'],
'end_date':['2014-04-06',
'2014-04-07',
'2014-04-08',
'2014-04-09','2014-04-20']})
CodePudding user response:
One way is to just take all the dates that are larger than start date, then take the minimum of those:
out=start_date.copy()
out['end_date']=out['start_date'].apply(lambda x: date_list[date_list['date_list'].gt(x), 'date_list'].min())
OUTPUT
start_date end_date
0 2014-04-05 2014-04-06
1 2014-04-06 2014-04-07
2 2014-04-07 2014-04-08
3 2014-04-08 2014-04-09
4 2014-04-09 2014-04-20
For convenience, you can just use the underlying numpy array or create numpy array for the date list, then apply masking minimum:
arr = date_list['date_list'].values
out['end_date']=out['start_date'].apply(lambda x: arr[arr>x].min())