Make dict by averaging values in Python

keys = ['a', 'a', 'a', 'b', 'b', 'c']
values = [2, 4, 6, 6, 4, 3]

It is guaranteed that len(keys) == len(values), and you can assume that the keys are sorted. I would like to create a dictionary in which each key maps to the average of its old values. If I do

x = dict(zip(keys, values)) # {'a': 6, 'b': 4, 'c': 3}

each key just keeps its last value, not the average of its old values. I am expecting something like

{'a': 4, 'b': 5, 'c': 3}

I can do this by summing the old values for each key and dividing by the number of times that key occurs, but I think there might be a more elegant solution. Any ideas would be appreciated!

Edit: By average values, I meant this: b occurred twice in keys, and the values were 6 and 4. In the new dictionary, it will have the value 5.

CodePudding user response：

I think the cleanest solution is what you suggested: group the values by key, then sum each group and divide by its length. A DataFrame-based solution might be quicker, but I don't think this use case justifies pulling in an additional external library.

from collections import defaultdict

keys = ['a', 'a', 'a', 'b', 'b', 'c']
values = [2, 4, 6, 6, 4, 3]

groups = defaultdict(list)

for k, v in zip(keys, values):
    groups[k].append(v)

avgs = {k: sum(v) / len(v) for k, v in groups.items()}

print(avgs) # {'a': 4.0, 'b': 5.0, 'c': 3.0}
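
If you would rather not store every value per key, the same sum-and-divide idea can be sketched with running totals and counts instead of lists. This is only a minimal sketch; the function name average_by_key is made up for illustration and is not from any library.

```python
from collections import defaultdict

def average_by_key(keys, values):
    """Average the values that share a key (hypothetical helper name)."""
    totals = defaultdict(float)  # running sum per key
    counts = defaultdict(int)    # number of values seen per key
    for k, v in zip(keys, values):
        totals[k] += v
        counts[k] += 1
    return {k: totals[k] / counts[k] for k in totals}

keys = ['a', 'a', 'a', 'b', 'b', 'c']
values = [2, 4, 6, 6, 4, 3]
print(average_by_key(keys, values))  # {'a': 4.0, 'b': 5.0, 'c': 3.0}
```

This version does not need the keys to be sorted, since each key is looked up in the dictionaries directly.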

Pandas solution for reference:

import pandas

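keys = ['a', 'a', 'a', 'b', 'b', 'c']
values = [2, 4, 6, 6, 4, 3]

# Named columns make the groupby easier to read than positional 0/1
df = pandas.DataFrame({'key': keys, 'value': values})

# Mean of 'value' per key, converted back to a plain dict
print(df.groupby('key')['value'].mean().to_dict()) # {'a': 4.0, 'b': 5.0, 'c': 3.0}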

CodePudding user response：

You can use itertools.groupby if the keys are already sorted as they are in your sample input:

from itertools import groupby
from statistics import mean
from operator import itemgetter

keys = ['a', 'a', 'a', 'b', 'b', 'c']
values = [2, 4, 6, 6, 4, 3]

{k: mean(map(itemgetter(1), g)) for k, g in groupby(zip(keys, values), itemgetter(0))}

This returns:

{'a': 4, 'b': 5, 'c': 3}
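
Keep in mind that groupby only merges adjacent equal keys, so with unsorted input a repeated key starts a new group, and the dict comprehension keeps only the last group for that key. Here is a rough sketch of the difference, assuming the pairs can simply be sorted by key first (the helper name grouped_means is hypothetical):

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

def grouped_means(pairs):
    """Average the second item of each pair per run of equal first items."""
    return {k: mean(v for _, v in g) for k, g in groupby(pairs, key=itemgetter(0))}

# Same data as above, but with the keys deliberately out of order
keys = ['a', 'b', 'a', 'c', 'b', 'a']
values = [2, 6, 4, 3, 4, 6]
pairs = list(zip(keys, values))

unsorted_result = grouped_means(pairs)  # later runs of a key overwrite earlier ones
sorted_result = grouped_means(sorted(pairs, key=itemgetter(0)))  # sort by key first

print(unsorted_result)  # {'a': 6, 'b': 4, 'c': 3}
print(sorted_result)    # {'a': 4, 'b': 5, 'c': 3}
```

Sorting costs O(n log n), so if the keys are not already sorted, a dictionary-based grouping is probably the simpler choice.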