Want to group keys in a dictionary and then average each group

Time:08-08

I want to average the sentiment scores for some video game reviews stored in a CSV file. I create a dictionary for each sentence's score, but I want to group the results that come from the same CSV row so that each game gets a single average.

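from textblob import TextBlob

# review_text and index come from the current CSV row (loop not shown)
blob = TextBlob(review_text)
for sentence in blob.sentences:
    ssp = {"Game": index, "Polarity": sentence.sentiment.polarity}
    print(ssp)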

This gives an output like so:

{'Game': 1, 'Polarity': -0.49687499999999996}
{'Game': 2, 'Polarity': 0.3865909090909091}
{'Game': 2, 'Polarity': 0.0}
{'Game': 2, 'Polarity': 0.2142857142857143}
{'Game': 2, 'Polarity': 0.2142857142857143}
{'Game': 2, 'Polarity': 0.04999999999999999}
{'Game': 2, 'Polarity': 0.0}
{'Game': 2, 'Polarity': -0.02500000000000005}
{'Game': 2, 'Polarity': 0.4715909090909091}
{'Game': 2, 'Polarity': 0.26666666666666666}

So how do I group by Game 1, Game 2, Game 3? I then want to average the scores for each.

Thanks

CodePudding user response:

You can collect the dictionaries in a list, then group them by game and compute the average of each group:

from statistics import mean

# your code (create lst once, before the loop over the CSV rows,
# otherwise it is reset for every review):
lst = []
blob = TextBlob(review_text)
for sentence in blob.sentences:
    ssp = {"Game": index, "Polarity": sentence.sentiment.polarity}
    lst.append(ssp)

# group it:
out = {}
for dct in lst:
    out.setdefault(dct["Game"], []).append(dct["Polarity"])

# compute average:
for k, v in out.items():
    print("Game:", k, "Average:", mean(v))

Prints:

Game: 1 Average: -0.49687499999999996
Game: 2 Average: 0.17537999037999039
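
For reference, here is a self-contained sketch of the same grouping that can be run without TextBlob or the CSV file. It hard-codes the sample rows from the question and uses collections.defaultdict instead of setdefault; the function name average_by_game is only an illustrative choice, not something from the original code.

```python
from collections import defaultdict
from statistics import mean

# Sample rows copied from the output shown in the question.
rows = [
    {"Game": 1, "Polarity": -0.49687499999999996},
    {"Game": 2, "Polarity": 0.3865909090909091},
    {"Game": 2, "Polarity": 0.0},
    {"Game": 2, "Polarity": 0.2142857142857143},
    {"Game": 2, "Polarity": 0.2142857142857143},
    {"Game": 2, "Polarity": 0.04999999999999999},
    {"Game": 2, "Polarity": 0.0},
    {"Game": 2, "Polarity": -0.02500000000000005},
    {"Game": 2, "Polarity": 0.4715909090909091},
    {"Game": 2, "Polarity": 0.26666666666666666},
]


def average_by_game(records):
    """Return {game: mean polarity} for a list of Game/Polarity dicts."""
    grouped = defaultdict(list)  # game -> list of polarity scores
    for record in records:
        grouped[record["Game"]].append(record["Polarity"])
    return {game: mean(scores) for game, scores in grouped.items()}


averages = average_by_game(rows)
for game, avg in averages.items():
    print("Game:", game, "Average:", avg)
```

Returning a plain dictionary keeps the averages available for later steps, such as writing them back to a CSV file, rather than only printing them.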

CodePudding user response:

Another possible solution is to convert the dictionaries to a pandas.DataFrame (here by concatenating them into the dataframe one at a time) and then group by Game:

import pandas as pd

x = [{'Game': 1, 'Polarity': -0.49687499999999996},
{'Game': 2, 'Polarity': 0.3865909090909091},
{'Game': 2, 'Polarity': 0.0},
{'Game': 2, 'Polarity': 0.2142857142857143},
{'Game': 2, 'Polarity': 0.2142857142857143},
{'Game': 2, 'Polarity': 0.04999999999999999},
{'Game': 2, 'Polarity': 0.0},
{'Game': 2, 'Polarity': -0.02500000000000005},
{'Game': 2, 'Polarity': 0.4715909090909091},
{'Game': 2, 'Polarity': 0.26666666666666666}]

df = pd.DataFrame(dtype = 'object')

for y in x:
    df = pd.concat([df, pd.DataFrame([y])], ignore_index=True)

df.groupby(['Game']).mean()

Output:

      Polarity
Game          
1    -0.496875
2     0.175380
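
If all the dictionaries are already in a list, the loop with pd.concat is probably not needed, since pd.DataFrame accepts a list of dictionaries directly. A minimal sketch under that assumption, reusing the sample rows from the question (the variable names rows and averages are illustrative):

```python
import pandas as pd

# Sample rows copied from the output shown in the question.
rows = [
    {"Game": 1, "Polarity": -0.49687499999999996},
    {"Game": 2, "Polarity": 0.3865909090909091},
    {"Game": 2, "Polarity": 0.0},
    {"Game": 2, "Polarity": 0.2142857142857143},
    {"Game": 2, "Polarity": 0.2142857142857143},
    {"Game": 2, "Polarity": 0.04999999999999999},
    {"Game": 2, "Polarity": 0.0},
    {"Game": 2, "Polarity": -0.02500000000000005},
    {"Game": 2, "Polarity": 0.4715909090909091},
    {"Game": 2, "Polarity": 0.26666666666666666},
]

# Build the dataframe in one step from the list of dictionaries.
df = pd.DataFrame(rows)

# Group by game and average the polarity column; the result is a Series
# indexed by Game.
averages = df.groupby("Game")["Polarity"].mean()
print(averages)
```

Selecting the Polarity column before calling mean() gives a Series rather than a one-column dataframe, so a single game's average can be read with averages.loc[2].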