I have a dictionary whose values are lists of dates. For each key, if the list contains dates within 30 days (or, more generally, n days) of each other, I only want to keep the last listed of those dates.
For example, I have a dictionary as follows:
a : ['2020-12-11']
b : ['2020-05-30', '2022-09-11']
c : ['2021-07-28', '2022-07-20', '2022-07-07', '2022-06-30']
d : ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20']
Now, since I only want to take the last listed date within the 30 day period for each value, I'm trying to get the following output:
a : ['2020-12-11']
b : ['2020-05-30', '2022-09-11']
c : ['2021-07-28', '2022-06-30']
d : ['2022-04-15', '2022-08-20']
The output value does not change for a or b since there are no dates within 30 days of each other. The output value for c changes since 2022-07-20, 2022-07-07, and 2022-06-30 are within 30 days of each other, and since 2022-06-30 is listed last it is kept in the list (despite 2022-07-20 being a later date). For d, the same thing should happen: 2022-04-17, 2022-04-18, and 2022-04-15 are all within 30 days of each other, but 2022-04-15 is listed last so we want that, while 2022-08-20 is more than 30 days away so it's kept too.
If anyone knows how to help me with this problem I would greatly appreciate it since I'm not sure how to approach it, especially in a computationally fast way. Moreover, how can this be generalized to make it 60 days, 100 days etc.? Thank you!
CodePudding user response:
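Loop through the dict, then through each inner list. Convert each value to a datetime and compare it with the last date kept so far for that key: if the two are within days_gap days of each other, the new date replaces the kept one; otherwise it is appended. The days_gap parameter covers the 60-day or 100-day cases as well.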
from datetime import datetime, timedelta

test_dict = {
    "a": ['2020-12-11'],
    "b": ['2020-05-30', '2022-09-11'],
    "c": ['2021-07-28', '2022-07-20', '2022-07-07', '2022-06-30'],
    "d": ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20'],
}

def only_leave_last_date(d: dict, days_gap: int) -> dict:
    new_dict = {}
    for key, value in d.items():
        if not value:
            # Empty list: nothing to compare
            new_value_str = []
        else:
            # Track kept dates both as datetimes (for comparison) and as strings (for output)
            new_value_dt = [datetime.strptime(value[0], '%Y-%m-%d')]
            new_value_str = [value[0]]
            for date_str in value[1:]:
                date = datetime.strptime(date_str, '%Y-%m-%d')
                if abs(date - new_value_dt[-1]) <= timedelta(days=days_gap):
                    # Within the gap of the last kept date: the later-listed date replaces it
                    new_value_dt[-1] = date
                    new_value_str[-1] = date_str
                else:
                    # Outside the gap: start a new group
                    new_value_dt.append(date)
                    new_value_str.append(date_str)
        new_dict[key] = new_value_str
    return new_dict

only_leave_last_date(test_dict, days_gap=30)
Output:
{'a': ['2020-12-11'],
'b': ['2020-05-30', '2022-09-11'],
'c': ['2021-07-28', '2022-06-30'],
'd': ['2022-04-15', '2022-08-20']}
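To try other window sizes (60 days, 100 days, and so on), the same idea can be sketched as a per-list helper. This is only a minimal sketch: the name keep_last_in_window is made up for illustration, and it assumes the dates are ISO-formatted strings (YYYY-MM-DD) so that date.fromisoformat (Python 3.7+) can parse them. Like the function above, it compares each date only with the last kept date, so a chain of dates that are each within n days of the previous one collapses into a single entry.

```python
from datetime import date

def keep_last_in_window(dates, days_gap):
    """Collapse dates lying within days_gap days of the last kept date,
    keeping whichever was listed last. (Hypothetical helper name.)"""
    kept = []
    for s in dates:
        # Compare against the most recently kept date only
        if kept and abs((date.fromisoformat(s) - date.fromisoformat(kept[-1])).days) <= days_gap:
            kept[-1] = s      # same window: the later-listed date wins
        else:
            kept.append(s)    # outside the window: start a new group
    return kept

test_dict = {
    "a": ['2020-12-11'],
    "b": ['2020-05-30', '2022-09-11'],
    "c": ['2021-07-28', '2022-07-20', '2022-07-07', '2022-06-30'],
    "d": ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20'],
}

# Try several window sizes on the same data
for n in (30, 60, 130):
    print(n, {k: keep_last_in_window(v, n) for k, v in test_dict.items()})
```

With this sample data, 30 and 60 days give the same result. With 130 days, d collapses to ['2022-08-20'], because 2022-04-15 and 2022-08-20 are 127 days apart.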