Removing dates within 'n' days of each other in Dictionaries

Time:11-01

I have a dictionary whose values are lists of dates. For each key, if the list contains dates within 30 days (or, more generally, n days) of each other, I only want to keep the last listed date.

For example, I have a dictionary as follows:

a : ['2020-12-11'] 
b : ['2020-05-30', '2022-09-11'] 
c : ['2021-07-28', '2022-07-20', '2022-07-07', '2022-06-30'] 
d : ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20']

Now, since I only want to take the last listed date within the 30 day period for each value, I'm trying to get the following output:

a : ['2020-12-11'] 
b : ['2020-05-30', '2022-09-11'] 
c : ['2021-07-28', '2022-06-30'] 
d : ['2022-04-15', '2022-08-20']

The output value does not change for a or b, since no dates are within 30 days of each other. The output value for c changes because 2022-07-20, 2022-07-07, and 2022-06-30 are within 30 days of each other, and since 2022-06-30 is listed last, it is kept (despite 2022-07-20 being the later date). The same thing should happen for d: 2022-04-17, 2022-04-18, and 2022-04-15 are all within 30 days of each other, but 2022-04-15 is listed last, so we want that one, while 2022-08-20 is more than 30 days away, so it is kept too.
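As a quick check of the reasoning for c, the gaps between consecutive dates can be computed with the standard library. This is only an illustrative sketch; the names `parsed` and `gaps` are not part of the question.

```python
from datetime import date

c = ['2021-07-28', '2022-07-20', '2022-07-07', '2022-06-30']

# Parse the ISO-formatted strings into date objects
parsed = [date.fromisoformat(s) for s in c]

# Absolute gap in days between each date and the one listed after it
gaps = [abs((later - earlier).days) for earlier, later in zip(parsed, parsed[1:])]
print(gaps)  # [357, 13, 7]
```

Only the first gap exceeds 30 days, which is why 2021-07-28 stays on its own and the remaining three dates collapse into the last one listed.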

If anyone knows how to help me with this problem I would greatly appreciate it since I'm not sure how to approach it, especially in a computationally fast way. Moreover, how can this be generalized to make it 60 days, 100 days etc.? Thank you!

CodePudding user response:

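Loop through the dict, then through each inner list: convert each value to a datetime and compare it with the last date kept so far. If the two are within n days of each other, the new date replaces the kept one; otherwise it is appended as a new entry.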

test_dict = {
    "a" : ['2020-12-11'],
    "b" : ['2020-05-30', '2022-09-11'] ,
    "c" : ['2021-07-28', '2022-07-20', '2022-07-07', '2022-06-30'] ,
    "d" : ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20'],
}


from datetime import datetime, timedelta


def only_leave_last_date(d: dict, days_gap: int) -> dict:
    new_dict = {}
    for key, value in d.items():
        if not value:
            # Empty list: nothing to filter
            new_value_str = []
        else:
            # Start with the first date; keep parsed and string forms in parallel
            new_value_dt = [datetime.strptime(value[0], '%Y-%m-%d')]
            new_value_str = [value[0]]
            for date_str in value[1:]:
                date = datetime.strptime(date_str, '%Y-%m-%d')
                # Within days_gap of the last kept date: overwrite it,
                # so the later-listed date wins
                if abs(date - new_value_dt[-1]) <= timedelta(days=days_gap):
                    new_value_dt[-1] = date
                    new_value_str[-1] = date_str
                else:
                    # Too far from the last kept date: start a new group
                    new_value_dt.append(date)
                    new_value_str.append(date_str)
        new_dict[key] = new_value_str
    return new_dict

only_leave_last_date(test_dict, days_gap=30)

Output:

{'a': ['2020-12-11'],
 'b': ['2020-05-30', '2022-09-11'],
 'c': ['2021-07-28', '2022-06-30'],
 'd': ['2022-04-15', '2022-08-20']}
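To generalize to 60 days, 100 days and so on, only the gap argument needs to change. The sketch below restates the same rule in a compact, self-contained form so that it can be run for several values of n; `keep_last_in_window` is a hypothetical helper name, and the `chain` list is made-up data used to show one property of comparing against the last kept date.

```python
from datetime import date, timedelta


def keep_last_in_window(dates, n):
    """Hypothetical helper: apply the rule above to one list of ISO date strings."""
    kept = []
    for s in dates:
        current = date.fromisoformat(s)
        # Overwrite the last kept date if the new one is within n days of it
        if kept and abs(current - date.fromisoformat(kept[-1])) <= timedelta(days=n):
            kept[-1] = s
        else:
            kept.append(s)
    return kept


test_dict = {
    "a": ['2020-12-11'],
    "b": ['2020-05-30', '2022-09-11'],
    "c": ['2021-07-28', '2022-07-20', '2022-07-07', '2022-06-30'],
    "d": ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20'],
}

for n in (30, 60, 100, 130):
    result = {k: keep_last_in_window(v, n) for k, v in test_dict.items()}
    print(n, result)

# Chained dates (made-up data): each is within 30 days of the previous one,
# so all collapse into the last, even though the first and last are 48 days apart.
chain = ['2022-01-01', '2022-01-25', '2022-02-18']
print(keep_last_in_window(chain, 30))  # ['2022-02-18']
```

For this sample data, n = 30, 60 and 100 all give the same result, because the smallest remaining gap is 127 days (2022-04-15 to 2022-08-20); with n = 130, d collapses to ['2022-08-20']. Each list is processed in a single pass, so the cost grows linearly with the number of dates. If chained dates should not merge as in the last example, the comparison would need to be made against the first date of each group instead, which is a different rule from the one in the answer.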