I have a dictionary whose keys are string ids and whose values are lists of lists. In each inner list, the first element is a book genre and the second element is the book's rating. It looks like this:
{'id1': [['Horror', 4.0], ['Sci-Fi', 9.5], ['Horror', 9.0]],
'id2': [['Thriller', 2.3], ['Horror', 6.2], ['Thriller', 3.9]]}
What I want to do is average out the rating for each genre for each id. So in the end, I want a dictionary that is like this:
{'id1': [['Horror', 6.5], ['Sci-Fi', 9.5]],
'id2': [['Thriller', 3.1], ['Horror', 6.2]]}
What I've been trying to do is this:
#existing_dictionary is the dictionary above
dict = {}
for bookAndRate in existing_dictionary.items():
    for bookGenrePlusRating in bookAndRate[1]: #bookAndRate prints out [['Comedy', 4.0], ['Comedy', 4.9], ['Adventure', 7.8]]
        #bookGenrePlusRating prints out ['Comedy', 4.0], then on a separate line, ['Comedy', 4.9], then on a separate line ['Adventure', 7.8]
        if bookGenrePlusRating[0] in dict.values():
            dict[id[0]][1] = bookGenrePlusRating[1]
        else:
            dict[id[0]] = [bookGenrePlusRating[0], bookGenrePlusRating[1]]
But this just gives me the last element in each id. So I end up getting
{'id1': ['Horror', 9.0],
'id2': ['Thriller', 3.9]}
CodePudding user response:
Try:
from statistics import mean
dct = {
    "id1": [["Horror", 4.0], ["Sci-Fi", 9.5], ["Horror", 9.0]],
    "id2": [["Thriller", 2.3], ["Horror", 6.2], ["Thriller", 3.9]],
}
out = {}
for k, v in dct.items():
    for genre, rating in v:
        out.setdefault(k, {}).setdefault(genre, []).append(rating)
out = {k: [[kk, mean(vv)] for kk, vv in v.items()] for k, v in out.items()}
print(out)
Prints:
{
"id1": [["Horror", 6.5], ["Sci-Fi", 9.5]],
"id2": [["Thriller", 3.0999999999999996], ["Horror", 6.2]],
}
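To see what the chained setdefault calls build before the averaging step, here is a minimal sketch on the same sample data. The name grouped is only illustrative; it holds the intermediate mapping of id to genre to list of ratings:

```python
dct = {
    "id1": [["Horror", 4.0], ["Sci-Fi", 9.5], ["Horror", 9.0]],
    "id2": [["Thriller", 2.3], ["Horror", 6.2], ["Thriller", 3.9]],
}

# grouped is an illustrative name for the intermediate structure
grouped = {}
for k, v in dct.items():
    for genre, rating in v:
        # first setdefault creates the per-id dict, second creates the per-genre list
        grouped.setdefault(k, {}).setdefault(genre, []).append(rating)

print(grouped)
# {'id1': {'Horror': [4.0, 9.0], 'Sci-Fi': [9.5]}, 'id2': {'Thriller': [2.3, 3.9], 'Horror': [6.2]}}
```

Each genre's ratings end up in a single list, so taking the mean of that list gives one average per genre per id.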
If you want to round the floats, replace the final dict comprehension (the line that reassigns out) with:
out = {
    k: [[kk, round(mean(vv), 2)] for kk, vv in v.items()]
    for k, v in out.items()
}
print(out)
Prints:
{
"id1": [["Horror", 6.5], ["Sci-Fi", 9.5]],
"id2": [["Thriller", 3.1], ["Horror", 6.2]],
}
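As an alternative, the same grouping can be done with collections.defaultdict instead of chained setdefault calls. This is only a sketch: the function name average_by_genre and its ndigits parameter are made up for illustration, and it assumes every rating is a number.

```python
from collections import defaultdict
from statistics import mean


def average_by_genre(data, ndigits=2):
    """Return {id: [[genre, mean_rating], ...]}, rounding means to ndigits."""
    result = {}
    for key, pairs in data.items():
        # collect all ratings for each genre under this id
        ratings = defaultdict(list)
        for genre, rating in pairs:
            ratings[genre].append(rating)
        # genres keep the order in which they first appeared
        result[key] = [[g, round(mean(r), ndigits)] for g, r in ratings.items()]
    return result


existing_dictionary = {
    "id1": [["Horror", 4.0], ["Sci-Fi", 9.5], ["Horror", 9.0]],
    "id2": [["Thriller", 2.3], ["Horror", 6.2], ["Thriller", 3.9]],
}
print(average_by_genre(existing_dictionary))
# {'id1': [['Horror', 6.5], ['Sci-Fi', 9.5]], 'id2': [['Thriller', 3.1], ['Horror', 6.2]]}
```

Building a fresh defaultdict per id keeps the grouping local to that id, so no nested lookup is needed.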