I want to delete items from a dict where the value is a number and the key is a datetime. Condition: if the value is the same, I want to delete all other items with the same value within 1 h.
# delete duplicates within 60 min
for formatted_key in list(formatted_dict.keys()):
    for temp_key in list(formatted_dict.keys()):
        print("formatted_key:", formatted_key)
        print("temp_key:", temp_key)
        if(formatted_dict[formatted_key]==formatted_dict[temp_key]):
            if(temp_key!=formatted_key):
                td=timedelta(minutes = 60)
                new_key=formatted_key + td
                if (new_key>temp_key):
                    del formatted_dict[temp_key]
                    print("key to delet:", temp_key)
                    for key,value in formatted_dict.items():
                        print(key,value)
The output I get up to the error:
**test data dict:**
2022-10-25 08:14:08.820000 var301533
2022-10-25 08:16:12.286000 var301533
2022-10-25 08:17:05.067000 var003907
2022-10-25 08:19:04.422000 var003907
2022-10-25 08:20:05.021000 var301504
2022-10-25 08:23:04.526000 var301504
2022-10-25 08:23:14.204000 var301504
**the for loops:**
formatted_key: 2022-10-25 08:14:08.820000
temp_key: 2022-10-25 08:14:08.820000
formatted_key: 2022-10-25 08:14:08.820000
temp_key: 2022-10-25 08:16:12.286000
key to delet: 2022-10-25 08:16:12.286000
**key and value get deleted**
2022-10-25 08:14:08.820000 var301533
2022-10-25 08:17:05.067000 var003907
2022-10-25 08:19:04.422000 var003907
2022-10-25 08:20:05.021000 var301504
2022-10-25 08:23:04.526000 var301504
2022-10-25 08:23:14.204000 var301504
formatted_key: 2022-10-25 08:14:08.820000
temp_key: 2022-10-25 08:17:05.067000
formatted_key: 2022-10-25 08:14:08.820000
temp_key: 2022-10-25 08:19:04.422000
formatted_key: 2022-10-25 08:14:08.820000
temp_key: 2022-10-25 08:20:05.021000
formatted_key: 2022-10-25 08:14:08.820000
temp_key: 2022-10-25 08:23:04.526000
formatted_key: 2022-10-25 08:14:08.820000
temp_key: 2022-10-25 08:23:14.204000
formatted_key: 2022-10-25 08:16:12.286000
temp_key: 2022-10-25 08:14:08.820000
The right key/value pair gets deleted, as you can see in the output, but the problem is that the outer for loop still receives the deleted key and then cannot find its value.
I get a KeyError, but I don't have a solution to the problem:
if(formatted_dict[formatted_key]==formatted_dict[temp_key]):
KeyError: datetime.datetime(2022, 10, 25, 8, 16, 12, 286000)
Expected output:
**final dict:**
2022-10-25 08:14:08.820000 var301533
2022-10-25 08:17:05.067000 var003907
2022-10-25 08:20:05.021000 var301504
CodePudding user response:
A quick change that avoids the KeyError (not recommended; it would be better to rewrite your code using another approach) is to check that both keys still exist before comparing:
if formatted_key in formatted_dict and temp_key in formatted_dict and formatted_dict[formatted_key] == formatted_dict[temp_key]:
and
if temp_key in formatted_dict: del formatted_dict[temp_key]
(Another way to avoid the dictionary KeyError is Python's try: ... except ...: construct.)
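The KeyError arises because `list(formatted_dict.keys())` is a snapshot taken before any deletion, so the outer loop later reaches keys that no longer exist. A minimal sketch of the guarded loop, assuming the keys are `datetime` objects as in the question. The name `window` and the extra lower bound `formatted_key < temp_key` (so that only later entries are removed) are assumptions for illustration, not part of the original code:

```python
from datetime import datetime, timedelta

# Test data from the question, with datetime keys
formatted_dict = {
    datetime(2022, 10, 25, 8, 14, 8, 820000): "var301533",
    datetime(2022, 10, 25, 8, 16, 12, 286000): "var301533",
    datetime(2022, 10, 25, 8, 17, 5, 67000): "var003907",
    datetime(2022, 10, 25, 8, 19, 4, 422000): "var003907",
    datetime(2022, 10, 25, 8, 20, 5, 21000): "var301504",
    datetime(2022, 10, 25, 8, 23, 4, 526000): "var301504",
    datetime(2022, 10, 25, 8, 23, 14, 204000): "var301504",
}

window = timedelta(minutes=60)  # hypothetical name for the 60 min limit

for formatted_key in list(formatted_dict):
    # The snapshot may still contain keys deleted in an earlier pass
    if formatted_key not in formatted_dict:
        continue
    for temp_key in list(formatted_dict):
        if temp_key == formatted_key:
            continue
        same_value = formatted_dict[temp_key] == formatted_dict[formatted_key]
        # Assumption: only later entries inside the window are removed
        if same_value and formatted_key < temp_key < formatted_key + window:
            del formatted_dict[temp_key]

for key, value in formatted_dict.items():
    print(key, value)
```

With the question's test data this should leave one entry per value, matching the expected output. The nested loops are still quadratic in the number of entries, which is one reason a different approach is preferable.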
CodePudding user response:
One approach would be to look at the "unique" values and create a new dictionary with those instead.
from datetime import datetime

def formatTime(strDt: str) -> datetime:
    return datetime.strptime(strDt, "%Y-%m-%d %H:%M:%S.%f")

formatted_dict = {
    "2022-10-25 08:14:08.820000": "var301533",
    "2022-10-25 08:16:12.286000": "var301533",
    "2022-10-25 08:17:05.067000": "var003907",
    "2022-10-25 08:19:04.422000": "var003907",
    "2022-10-25 08:20:05.021000": "var301504",
    "2022-10-25 09:23:04.526000": "var301504",
    "2022-10-25 09:23:14.204000": "var301504",
    "2022-10-25 10:23:04.526000": "var301504",
    "2022-10-25 10:23:14.204000": "var301504"
}

# create a new dictionary for the "unique" values,
# and add whatever is the first value from the initial dictionary as our starting point
new_dict = {
    list(formatted_dict)[0]: formatted_dict[list(formatted_dict)[0]],
}

# iterate to keep the "uniques" every 60 mins
for key in formatted_dict:
    # we look at the last value we added
    if (formatTime(key) - formatTime(list(new_dict)[-1])).total_seconds() > 3600:
        new_dict[key] = formatted_dict[key]

print(new_dict)
If your dictionary keys are already datetime objects, remove the formatTime calls to fit your needs.
Hope this helps.
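Note that the loop above compares timestamps only and never looks at the values, so with the sample data the var003907 entries are dropped altogether. A minimal sketch of a per-value variant, assuming string keys in the format shown above. The names `drop_repeats`, `last_kept`, `window` and `FMT` are made up for illustration, and keeping an entry that is exactly one hour after the last kept one is an assumption about what "within 1 h" means:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"  # timestamp format used by the string keys

def drop_repeats(entries, window=timedelta(hours=1)):
    """Build a new dict that keeps, per value, only entries at least
    `window` after the last kept entry with that value."""
    last_kept = {}  # value -> datetime of the most recent kept entry
    result = {}
    # Process keys in chronological order regardless of insertion order
    for key in sorted(entries, key=lambda k: datetime.strptime(k, FMT)):
        ts = datetime.strptime(key, FMT)
        value = entries[key]
        previous = last_kept.get(value)
        if previous is None or ts - previous >= window:
            result[key] = value
            last_kept[value] = ts
    return result

# Test data from the question
sample = {
    "2022-10-25 08:14:08.820000": "var301533",
    "2022-10-25 08:16:12.286000": "var301533",
    "2022-10-25 08:17:05.067000": "var003907",
    "2022-10-25 08:19:04.422000": "var003907",
    "2022-10-25 08:20:05.021000": "var301504",
    "2022-10-25 08:23:04.526000": "var301504",
    "2022-10-25 08:23:14.204000": "var301504",
}

new_dict = drop_repeats(sample)
print(new_dict)
```

Because the comparison is made against the last kept entry for each value rather than the last entry overall, interleaved values do not interfere with each other.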
CodePudding user response:
Start by creating a new dictionary which is keyed on the values from the input dictionary. Each value in the new dictionary is a list of tuples, each pairing a date/time string with the corresponding datetime object so that we can perform arithmetic with them.
If the date/time keys in the input dictionary are not in ascending order then we need to sort them.
Next we iterate over the new dictionary, comparing each entry with the one before it to determine which keys from the original dictionary need to be deleted.
Import of the json module is only for presentation purposes.
from datetime import datetime, timedelta
import json

formatted_dict = {
    '2022-10-25 08:14:08.820000': 'var301533',
    '2022-10-25 08:16:12.286000': 'var301533',
    '2022-10-25 08:17:05.067000': 'var003907',
    '2022-10-25 08:19:04.422000': 'var003907',
    '2022-10-25 08:20:05.021000': 'var301504',
    '2022-10-25 08:23:04.526000': 'var301504',
    '2022-10-25 08:23:14.204000': 'var301504'
}

dict_by_value = {}

for k, v in formatted_dict.items():
    dt = datetime.strptime(k, '%Y-%m-%d %H:%M:%S.%f')
    dict_by_value.setdefault(v, []).append((k, dt))

for v in dict_by_value.values():
    v.sort()

for k, v in dict_by_value.items():
    for (_, t1), (d2, t2) in zip(v, v[1:]):
        if t2 - t1 < timedelta(hours=1):
            del formatted_dict[d2]

print(json.dumps(formatted_dict, indent=2))
Output:
{
  "2022-10-25 08:14:08.820000": "var301533",
  "2022-10-25 08:17:05.067000": "var003907",
  "2022-10-25 08:20:05.021000": "var301504"
}