Home > Back-end >  Find minimum date that's greater than dates within a column
Find minimum date that's greater than dates within a column

Time:10-12

I have a dataframe of stating date and another list of dates. For each date within start_date df, I want the find the minimum date that's greater than it from the date_list df.

Desired output shown below.

start_date=pd.DataFrame( ['2014-04-05',
        '2014-04-06',
        '2014-04-07',
        '2014-04-08',
        '2014-04-09'],columns=['start_date'])

start_date['start_date']=pd.to_datetime(start_date['start_date'])

date_list=pd.DataFrame( [
        '2014-04-06',
        '2014-04-07',
        '2014-04-08',
        '2014-04-09','2014-04-20'],columns=['date_list'])
date_list['date_list']=pd.to_datetime(date_list['date_list'])

desired_output= pd.DataFrame( {'start_date':['2014-04-05',
        '2014-04-06',
        '2014-04-07',
        '2014-04-08',
        '2014-04-09'],
        'end_date':['2014-04-06',
        '2014-04-07',
        '2014-04-08',
        '2014-04-09','2014-04-20']})

CodePudding user response:

One way is to just take all the dates that are larger than start date, then take the minimum of those:

out=start_date.copy()
out['end_date']=out['start_date'].apply(lambda x: date_list[date_list['date_list'].gt(x), 'date_list'].min())

OUTPUT

  start_date   end_date
0 2014-04-05 2014-04-06
1 2014-04-06 2014-04-07
2 2014-04-07 2014-04-08
3 2014-04-08 2014-04-09
4 2014-04-09 2014-04-20

For convenience, you can just use the underlying numpy array or create numpy array for the date list, then apply masking minimum:

arr = date_list['date_list'].values
out['end_date']=out['start_date'].apply(lambda x: arr[arr>x].min())
  • Related