I have a dictionary whose values are lists of dates. For each key, if the list contains dates within one month of each other, I only want to keep the last listed date within that one-month period.
For example, I have a dictionary as follows:
a : ['2020-12-11']
b : ['2020-05-30', '2022-09-11']
c : ['2021-07-28', '2022-07-20', '2022-07-07']
d : ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20']
Since I only want to keep the last listed date within each one-month period for each value, I'm trying to get the following output:
a : ['2020-12-11']
b : ['2020-05-30', '2022-09-11']
c : ['2021-07-28', '2022-07-07']
d : ['2022-04-15', '2022-08-20']
The output does not change for a or b, since neither has dates within one month of each other. The output for c changes because 2022-07-20 and 2022-07-07 are within one month of each other, and since 2022-07-07 is listed last, it is kept (despite 2022-07-20 being the later date). The same thing should happen for d: 2022-04-17, 2022-04-18, and 2022-04-15 are all within one month of each other, but 2022-04-15 is listed last, so we want that one, while 2022-08-20 is more than a month away, so it's kept too.
If anyone knows how to help me with this problem I would greatly appreciate it, since I'm not sure how to approach this problem, especially in a computationally fast way. Thank you!
CodePudding user response:
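Put each list's dates into a dict keyed by month (i.e. the first seven characters of the string, 'YYYY-MM'), then pull the values back out of the dicts to make the new lists. Note that this groups by calendar month rather than by a rolling one-month window, so dates such as '2022-04-30' and '2022-05-01' would both be kept.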
>>> dates = {
...     'a': ['2020-12-11'],
...     'b': ['2020-05-30', '2022-09-11'],
...     'c': ['2021-07-28', '2022-07-20', '2022-07-07'],
...     'd': ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20'],
... }
>>> from pprint import pprint
>>> pprint({k: list({d[:7]: d for d in v}.values()) for k, v in dates.items()})
{'a': ['2020-12-11'],
 'b': ['2020-05-30', '2022-09-11'],
 'c': ['2021-07-28', '2022-07-07'],
 'd': ['2022-04-15', '2022-08-20']}
Conveniently, shoving a list of colliding items into a dict leaves you with the last item, which is exactly what you want; if you didn't want that, you could sort each list before putting it into the dict.
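The sorting variant mentioned above could look roughly like this. It is only a sketch: it assumes the dates are ISO-formatted strings (so plain string sorting is chronological) and that the goal is to keep the latest date in each calendar month rather than the last listed one. The name latest_per_month is illustrative and does not come from the question.

```python
dates = {
    'a': ['2020-12-11'],
    'b': ['2020-05-30', '2022-09-11'],
    'c': ['2021-07-28', '2022-07-20', '2022-07-07'],
    'd': ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20'],
}

# Sorting first means the chronologically latest date in each month
# is inserted last, so it is the one that survives the key collision.
latest_per_month = {
    key: list({d[:7]: d for d in sorted(values)}.values())
    for key, values in dates.items()
}

print(latest_per_month)
```

With the sample data, 'c' then keeps '2022-07-20' and 'd' keeps '2022-04-18', which differs from the output requested in the question.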
CodePudding user response:
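You can use groupby from itertools. Note that groupby only groups consecutive items with the same key, so this relies on dates from the same month being adjacent in each list, as they are in the example:
from itertools import groupby

input_dict = {
    "a": ['2020-12-11'],
    "b": ['2020-05-30', '2022-09-11'],
    "c": ['2021-07-28', '2022-07-20', '2022-07-07'],
    "d": ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20'],
}

# Group each list by its 'YYYY-MM' prefix and keep the last date of every group.
new_dict = {
    k: [list(g)[-1] for _, g in groupby(v, key=lambda x: x[:7])]
    for k, v in input_dict.items()
}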
New dict is equal to:
{'a': ['2020-12-11'],
'b': ['2020-05-30', '2022-09-11'],
'c': ['2021-07-28', '2022-07-07'],
'd': ['2022-04-15', '2022-08-20']}
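Both answers group by calendar month. If "within one month" is instead meant as a rolling window, one possible approach is sketched below. It rests on several assumptions not stated in the question: a month is treated as 30 days, a cluster is every date within that window of the cluster's earliest date, and the output is ordered chronologically by cluster. The names keep_last_listed and window are hypothetical.

```python
from datetime import date, timedelta


def keep_last_listed(dates, window=timedelta(days=30)):
    """Keep, per cluster of nearby dates, the one listed last in `dates`."""
    # Pair each parsed date with its position in the original list,
    # then walk through the dates in chronological order.
    indexed = sorted((date.fromisoformat(s), i) for i, s in enumerate(dates))
    kept = []  # original index of the date to keep, one entry per cluster
    cluster_start = None
    for day, i in indexed:
        if cluster_start is None or day - cluster_start > window:
            # Too far from the current cluster's first date: start a new cluster.
            cluster_start = day
            kept.append(i)
        else:
            # Same cluster: prefer whichever date appears later in the input.
            kept[-1] = max(kept[-1], i)
    return [dates[i] for i in kept]


input_dict = {
    "a": ['2020-12-11'],
    "b": ['2020-05-30', '2022-09-11'],
    "c": ['2021-07-28', '2022-07-20', '2022-07-07'],
    "d": ['2022-04-17', '2022-04-18', '2022-04-15', '2022-08-20'],
}

windowed = {k: keep_last_listed(v) for k, v in input_dict.items()}
print(windowed)
```

On the sample data this gives the same result as the calendar-month versions, but a pair such as '2022-04-30' and '2022-05-01' collapses to the last listed date here, whereas grouping on the first seven characters would keep both.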