Combine duplicate keys in a dictionary and average their values

I have a list of key-value pairs, defined as follows:

movies=[('x',57),('y', 23),('z', 12), ('y', 10), ('x',22),('y',12)]

As you can see, some keys appear more than once. I therefore combined them, and the resulting dictionary looks like this:

{'x': [57, 22], 'y': [23,10,12], 'z': [12]}

Whenever a key has two or more values, I want its value in the final dictionary to be the average of those values. Note that the averaging should take place whenever two or more values are associated with a single key.

Hence the final dictionary should be as follows:

{'x': [39.5], 'y': [15], 'z': [12]}

Where:

key x = (57 + 22) / 2 = 39.5
key y = (23 + 10 + 12) / 3 = 15
key z = 12 (note that in this case the value stays the same, as there is a single occurrence of key z)

CodePudding user response：

Traverse the data once, using the character as the key and collecting its values in a list.

Then traverse the result again to calculate the average of each list of values:

>>> from collections import defaultdict as dd
>>> movies=[('x',57),('y', 23),('z', 12), ('y', 10), ('x',22),('y',12)]
>>> x = dd(list)
>>> 
>>> for a, v in movies:
...     x[a].append(v)
... 
>>> for i, v in x.items():
...     x[i]=[sum(v)/len(v)]
... 
>>> x
defaultdict(<class 'list'>, {'x': [39.5], 'y': [15.0], 'z': [12.0]})
>>>

CodePudding user response：

Not functionally different to other answers but this has no reliance on additional module imports:

movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]

md = dict()

for k, v in movies:
    md.setdefault(k, []).append(v)

md = {k: [sum(v) / len(v)] for k, v in md.items()}

print(md)

Output:

{'x': [39.5], 'y': [15.0], 'z': [12.0]}

The output is as required in the OP's question, but surely keeping the mean in a list is unnecessary.
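
Following up on that last remark, here is a minimal sketch that stores each mean as a plain number instead of a one-element list. The names `grouped` and `means` are only illustrative and do not come from the question.

```python
movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]

# Group the values by key, in the same way as the snippet above
grouped = {}
for k, v in movies:
    grouped.setdefault(k, []).append(v)

# Store the mean directly as a float rather than wrapping it in a list
means = {k: sum(v) / len(v) for k, v in grouped.items()}

print(means)  # {'x': 39.5, 'y': 15.0, 'z': 12.0}
```

Whether this is preferable depends on what consumes the result; the list-wrapped form is what the question literally asks for.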

CodePudding user response：

I do it this way, and it works (though there may be a better way):

my_dic = {'x': [57, 22], 'y': [23,10,12], 'z': [12]}
for key in my_dic:
    if len(my_dic[key]) > 1:
        my_dic[key] = [sum(my_dic[key]) / len(my_dic[key])]
    
print(my_dic)

CodePudding user response：

You can use groupby from itertools for that:

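from itertools import groupby
from statistics import mean

movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]

# groupby only groups adjacent items, so sort by key first
result = {
    key: [mean(value for _, value in group)]
    for key, group in groupby(sorted(movies, key=lambda x: x[0]), key=lambda x: x[0])
}

print(result)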

Output:

{'x': [39.5], 'y': [15], 'z': [12]}
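
The `sorted` call in the snippet above matters because `groupby` only merges consecutive items that share a key. The sketch below illustrates this with the same data; the variable names `unsorted_keys` and `sorted_keys` are made up for the illustration.

```python
from itertools import groupby

movies = [('x', 57), ('y', 23), ('z', 12), ('y', 10), ('x', 22), ('y', 12)]

# Without sorting, groupby starts a new group every time the key changes
unsorted_keys = [key for key, _ in groupby(movies, key=lambda x: x[0])]

# After sorting by key, each key forms exactly one group
sorted_keys = [
    key for key, _ in groupby(sorted(movies, key=lambda x: x[0]), key=lambda x: x[0])
]

print(unsorted_keys)  # ['x', 'y', 'z', 'y', 'x', 'y']
print(sorted_keys)    # ['x', 'y', 'z']
```

Inside a dict comprehension, a later group for the same key would silently overwrite the earlier one, so skipping the sort would give wrong averages for this input.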