keys = ['a', 'a' ,'a' ,'b' ,'b' ,'c']
values = [2, 4, 6, 6, 4 ,3]
Here it is guaranteed that len(keys)==len(values). You can also assume that the keys are sorted. I would like to create a dictionary where each key maps to the average of its old values. If I do
x = dict(zip(keys, values)) # {'a': 6, 'b': 4, 'c': 3}
Here each key keeps only the last value seen for it, not the average of its old values. I am expecting something like
{'a': 4, 'b': 5, 'c': 3}
I can do this by summing the old values for each key and dividing each sum by the number of occurrences of that key, but I think there might be a more elegant solution. Any ideas would be appreciated!
Edit: By average values, I meant this: b occurred twice in keys, and the values were 6 and 4. In the new dictionary, it will have the value 5.
CodePudding user response:
I think the cleanest solution is what you suggested: group the values by key, then divide each group's sum by its length. A DataFrame-based solution might be quicker, but I don't think this use case is enough to justify an additional external library.
from collections import defaultdict
keys = ['a', 'a' ,'a' ,'b' ,'b' ,'c']
values = [2, 4, 6, 6, 4 ,3]
groups = defaultdict(list)
for k, v in zip(keys, values):
    groups[k].append(v)
avgs = {k:sum(v)/len(v) for k, v in groups.items()}
print(avgs) # {'a': 4.0, 'b': 5.0, 'c': 3.0}
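If you would rather not keep every value in a list, the same idea can be sketched with a running sum and a running count per key. This is only a variation on the code above; the names totals, counts and running_avgs are mine, not from the question.

```python
from collections import defaultdict

keys = ['a', 'a', 'a', 'b', 'b', 'c']
values = [2, 4, 6, 6, 4, 3]

totals = defaultdict(int)  # running sum of values per key
counts = defaultdict(int)  # number of occurrences per key
for k, v in zip(keys, values):
    totals[k] += v
    counts[k] += 1

# Divide each key's sum by its count to get the average
running_avgs = {k: totals[k] / counts[k] for k in totals}
print(running_avgs)  # {'a': 4.0, 'b': 5.0, 'c': 3.0}
```

This does not depend on the keys being sorted, and it only stores two numbers per key.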
Pandas solution for reference:
import pandas
keys = ['a', 'a' ,'a' ,'b' ,'b' ,'c']
values = [2, 4, 6, 6, 4 ,3]
df = pandas.DataFrame(zip(keys, values))
print(df.groupby(0).mean())
CodePudding user response:
You can use itertools.groupby if the keys are already sorted, as they are in your sample input:
from itertools import groupby
from statistics import mean
from operator import itemgetter
keys = ['a', 'a' ,'a' ,'b' ,'b' ,'c']
values = [2, 4, 6, 6, 4 ,3]
{k: mean(map(itemgetter(1), g)) for k, g in groupby(zip(keys, values), itemgetter(0))}
This returns:
{'a': 4, 'b': 5, 'c': 3}
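Note that groupby only groups consecutive equal keys, so this relies on the sorted input. If the keys were not sorted, one option would be to sort the pairs by key first. A rough sketch, where the shuffled input is made up for illustration and is not from the question:

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Hypothetical unsorted input (same pairs as the question, shuffled)
unsorted_keys = ['b', 'a', 'c', 'a', 'b', 'a']
unsorted_values = [6, 2, 3, 4, 4, 6]

# Sort the (key, value) pairs by key so equal keys become adjacent
pairs = sorted(zip(unsorted_keys, unsorted_values), key=itemgetter(0))

# Group adjacent pairs by key and average the values in each group
sorted_avgs = {k: mean(map(itemgetter(1), g))
               for k, g in groupby(pairs, key=itemgetter(0))}
print(sorted_avgs)
```

The sort adds O(n log n) work, so for unsorted data a dictionary-based grouping may be the simpler choice.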