I have a dictionary with string keys and numeric values, like this:
d = {'key1': 0.5, 'key2': 0.2, 'key3': 0.3, 'key4': 0.9, 'key5': 0.94, ...}
What I would like to do is
- bin the values (0.5, 0.2, ....) based on a fixed interval, say every 0.2 increment
- produce another dictionary that allows me to look up the bin that a key resides in
Namely, the final dictionary should look like
d = {'key1': 3, 'key2': 1, 'key3': 2, 'key4': 5, 'key5': 5, ...}
The dictionary is very big, probably over 500k entries. What is the most efficient way of doing this?
Thanks
CodePudding user response:
I hope I've understood your question right:
d = {"key1": 0.5, "key2": 0.2, "key3": 0.3, "key4": 0.9, "key5": 0.94}
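# Subtract a small offset so a value exactly on a boundary (e.g. 0.2) stays in the lower bin; bins are 1-based.
d = {k: int((v - 0.001) // 0.2) + 1 for k, v in d.items()}
# Build a readable label for every bin from 1 up to the highest bin in use.
bins = [
    f"{round(0.2 * i, 2)}-{round(0.2 * (i + 1), 2)}" for i in range(max(d.values()))
]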
print(d)
print(bins)
Prints the modified dictionary and the bins (1-based):
{'key1': 3, 'key2': 1, 'key3': 2, 'key4': 5, 'key5': 5}
['0.0-0.2', '0.2-0.4', '0.4-0.6', '0.6-0.8', '0.8-1.0']
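One caveat: the `- 0.001` offset also pulls any value less than 0.001 above a boundary (e.g. 0.2005) into the lower bin. If that matters, a possible alternative is to look each value up against explicit right-inclusive bin edges with the standard-library `bisect` module. The sketch below assumes the values lie in (0, 1]; the helper name `bin_values` and its `width` and `upper` parameters are made up for illustration and are not part of the question.

```python
from bisect import bisect_left


def bin_values(d, width=0.2, upper=1.0):
    """Map each key to a 1-based bin index using right-inclusive bins.

    Assumption: values fall in (0, upper]. A value <= 0 lands in bin 1,
    and a value above `upper` gets index n_bins + 1.
    """
    n_bins = round(upper / width)
    # Upper edge of each bin: 0.2, 0.4, ..., 1.0 (rounded to hide float noise).
    edges = [round(width * i, 10) for i in range(1, n_bins + 1)]
    # bisect_left finds the first edge >= v, so v == edge stays in that bin.
    return {k: bisect_left(edges, v) + 1 for k, v in d.items()}


d = {"key1": 0.5, "key2": 0.2, "key3": 0.3, "key4": 0.9, "key5": 0.94}
binned = bin_values(d)
print(binned)  # {'key1': 3, 'key2': 1, 'key3': 2, 'key4': 5, 'key5': 5}
```

This is still a single pass over the dictionary, with one binary search per value over a handful of edges, so it should be workable for 500k entries.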