Home > Software engineering >  min function with key parameter to search through dates - is there a faster way?
min function with key parameter to search through dates - is there a faster way?

Time:12-15

I am using the following code to search through dates, in order to find the nearest previous date in a list of dates :

def nearest_previous_date(list_of_dates, pivot_date):
    """ Helper function to find the nearest previous date in a list of dates

    Args:
        list_of_dates (list): list of datetime objects
        pivot_date (datetime): reference date
    
    Returns:
        (datetime): datetime immediately before or equal to reference date, if none satisfy criteria returns
        first date in list
    """
    return min(list_of_dates, key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"))

I need to call this function quite a lot so I would like it to be as efficient as possible and at the moment it is taking something like 200 microseconds to search through a list of 23 dates and find the relevant one. It does not sound like a lot but this does not scale well. Is there a way to make this function more efficient?

Here is an example

pivot_date = datetime(day=21, month=7, year=2019)
list_of_dates
DatetimeIndex(['2015-06-30', '2015-09-30', '2015-12-31', '2016-03-31',
           '2016-06-30', '2016-09-30', '2016-12-30', '2017-03-31',
           '2017-06-30', '2017-09-29', '2017-12-29', '2018-03-30',
           '2018-06-30', '2018-10-01', '2019-01-01', '2019-03-29',
           '2019-07-01', '2019-10-01', '2019-12-31', '2020-03-31',
           '2020-06-30', '2020-09-30', '2020-12-31'],
          dtype='datetime64[ns]', name='effectiveDate', freq=None)

%%timeit
min(list_of_dates, key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"))

191 µs ± 5.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

CodePudding user response:

Solution proposed by @jasonharper

def nearest_previous_date_NEW(list_of_dates, pivot_date):
    """ Helper function to find the nearest previous date in a list of dates
    Important: assumes list_of_dates is sorted ascending

    Args:
       list_of_dates (list): list of datetime objects
       pivot_date (datetime): reference date
    
    Returns:
        (datetime): datetime immediately before or equal to reference date, if 
        none satisfy criteria returns first date in list
    """
    return list_of_dates[max(0, bisect.bisect_left(list_of_dates, pivot_date)-1)]

Much faster indeed:

47.4 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

CodePudding user response:

Since datetime objects can be ordered, the nearest date before the reference date is indeed the "maximum" date preceding the reference date:

def nearest_previous_date(list_of_dates, pivot_date):
    return max((date for date in list_of_dates if date <= pivot_date), default=list_of_dates[0])

If the list is assumed to be sorted, then one can adopt binary search, which is more scalable:

from bisect import bisect

def nearest_previous_date(list_of_dates, pivot_date):
    return list_of_dates[max(bisect(list_of_dates, pivot_date) - 1, 0)]
  • Related