Find average of certain values in dictionary for each key in dictionary

Time:10-01

I have a dictionary whose keys are string ids and whose values are lists of lists. In each inner list, the first element is a book genre and the second is the book's rating. It looks like this:

{'id1': [['Horror', 4.0], ['Sci-Fi', 9.5], ['Horror', 9.0]],
'id2': [['Thriller', 2.3], ['Horror', 6.2], ['Thriller', 3.9]]}

What I want to do is average the ratings per genre within each id, so that I end up with a dictionary like this:

{'id1': [['Horror', 6.5], ['Sci-Fi', 9.5]],
'id2': [['Thriller', 3.1], ['Horror', 6.2]]}

What I've been trying to do is this:

# existing_dictionary is the dictionary above
dict = {}
for bookAndRate in existing_dictionary.items():
    for bookGenrePlusRating in bookAndRate[1]:  # bookAndRate prints out [['Comedy', 4.0], ['Comedy', 4.9], ['Adventure', 7.8]]
        # bookGenrePlusRating prints out ['Comedy', 4.0], then on a separate line, ['Comedy', 4.9], then on a separate line ['Adventure', 7.8]
        if bookGenrePlusRating[0] in dict.values():
            dict[id[0]][1] = bookGenrePlusRating[1]
        else:
            dict[id[0]] = [bookGenrePlusRating[0], bookGenrePlusRating[1]]

But this only keeps the last element for each id, so I end up with:

{'id1': ['Horror', 9.0],
'id2': ['Thriller', 3.9]}

CodePudding user response:

Try grouping the ratings by genre for each id first, then averaging each group:

from statistics import mean

dct = {
    "id1": [["Horror", 4.0], ["Sci-Fi", 9.5], ["Horror", 9.0]],
    "id2": [["Thriller", 2.3], ["Horror", 6.2], ["Thriller", 3.9]],
}

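# Group the ratings by genre within each id: {id: {genre: [ratings]}}
out = {}
for k, v in dct.items():
    for genre, rating in v:
        out.setdefault(k, {}).setdefault(genre, []).append(rating)

# Replace each list of ratings with its mean, restoring the [genre, rating] layout
out = {k: [[kk, mean(vv)] for kk, vv in v.items()] for k, v in out.items()}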
print(out)

Prints:

{
    "id1": [["Horror", 6.5], ["Sci-Fi", 9.5]],
    "id2": [["Thriller", 3.0999999999999996], ["Horror", 6.2]],
}
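The 3.0999999999999996 for "Thriller" is not a bug in the averaging. Neither 2.3 nor 3.9 can be stored exactly as a binary float, so their mean lands very slightly away from 3.1. A minimal sketch to see this, using the standard-library decimal module only to display the values that are actually stored:

```python
from decimal import Decimal
from statistics import mean

ratings = [2.3, 3.9]  # the two "Thriller" ratings for id2

# Decimal(float) shows the exact binary value stored for each rating
for r in ratings:
    print(r, "is stored as", Decimal(r))

avg = mean(ratings)
print("mean:", avg)
print("rounded to 2 places:", round(avg, 2))
```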

If you want to round the floats:

out = {
    k: [[kk, round(mean(vv), 2)] for kk, vv in v.items()]
    for k, v in out.items()
}
print(out)

Prints:

{
    "id1": [["Horror", 6.5], ["Sci-Fi", 9.5]],
    "id2": [["Thriller", 3.1], ["Horror", 6.2]],
}
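
As an alternative sketch of the same group-then-average idea, collections.defaultdict can stand in for the nested setdefault calls, and the logic can be wrapped in a function. The function name average_by_genre and the ndigits parameter are illustrative assumptions, not part of the original answer:

```python
from collections import defaultdict
from statistics import mean


def average_by_genre(ratings_by_id, ndigits=2):
    """Return {id: [[genre, mean rating], ...]} (hypothetical helper)."""
    result = {}
    for book_id, pairs in ratings_by_id.items():
        grouped = defaultdict(list)  # genre -> list of ratings for this id
        for genre, rating in pairs:
            grouped[genre].append(rating)
        # dicts preserve insertion order, so genres stay in first-seen order
        result[book_id] = [
            [genre, round(mean(values), ndigits)]
            for genre, values in grouped.items()
        ]
    return result


dct = {
    "id1": [["Horror", 4.0], ["Sci-Fi", 9.5], ["Horror", 9.0]],
    "id2": [["Thriller", 2.3], ["Horror", 6.2], ["Thriller", 3.9]],
}
print(average_by_genre(dct))
```

Building a fresh defaultdict per id keeps the grouping local to that id, so a genre such as "Horror" that appears under both ids is averaged separately for each.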