I am using the following code to search through dates, in order to find the nearest previous date in a list of dates :
def nearest_previous_date(list_of_dates, pivot_date):
""" Helper function to find the nearest previous date in a list of dates
Args:
list_of_dates (list): list of datetime objects
pivot_date (datetime): reference date
Returns:
(datetime): datetime immediately before or equal to reference date, if none satisfy criteria returns
first date in list
"""
return min(list_of_dates, key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"))
I need to call this function quite a lot so I would like it to be as efficient as possible and at the moment it is taking something like 200 microseconds to search through a list of 23 dates and find the relevant one. It does not sound like a lot but this does not scale well. Is there a way to make this function more efficient?
Here is an example
pivot_date = datetime(day=21, month=7, year=2019)
list_of_dates
DatetimeIndex(['2015-06-30', '2015-09-30', '2015-12-31', '2016-03-31',
'2016-06-30', '2016-09-30', '2016-12-30', '2017-03-31',
'2017-06-30', '2017-09-29', '2017-12-29', '2018-03-30',
'2018-06-30', '2018-10-01', '2019-01-01', '2019-03-29',
'2019-07-01', '2019-10-01', '2019-12-31', '2020-03-31',
'2020-06-30', '2020-09-30', '2020-12-31'],
dtype='datetime64[ns]', name='effectiveDate', freq=None)
%%timeit
min(list_of_dates, key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"))
191 µs ± 5.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
CodePudding user response:
Solution proposed by @jasonharper
def nearest_previous_date_NEW(list_of_dates, pivot_date):
""" Helper function to find the nearest previous date in a list of dates
Important: assumes list_of_dates is sorted ascending
Args:
list_of_dates (list): list of datetime objects
pivot_date (datetime): reference date
Returns:
(datetime): datetime immediately before or equal to reference date, if
none satisfy criteria returns first date in list
"""
return list_of_dates[max(0, bisect.bisect_left(list_of_dates, pivot_date)-1)]
Much faster indeed:
47.4 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
CodePudding user response:
Since datetime
objects can be ordered, the nearest date before the reference date is indeed the "maximum" date preceding the reference date:
def nearest_previous_date(list_of_dates, pivot_date):
return max((date for date in list_of_dates if date <= pivot_date), default=list_of_dates[0])
If the list is assumed to be sorted, then one can adopt binary search, which is more scalable:
from bisect import bisect
def nearest_previous_date(list_of_dates, pivot_date):
return list_of_dates[max(bisect(list_of_dates, pivot_date) - 1, 0)]