I want to average the sentiment scores of some video game reviews stored in a CSV file. I create a dictionary for each sentence's score, but I want to group the results for each row (game) of the CSV together.
blob = TextBlob(review_text)
for sentence in blob.sentences:
    ssp = {"Game": index, "Polarity": sentence.sentiment.polarity}
    print(ssp)
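Because each CSV row holds one game's review, all of a game's sentence scores are produced in the same loop iteration, so one option is to average them right there instead of grouping afterwards. This is a minimal sketch under some assumptions: the file is read with `csv.DictReader`, the review column is called `review_text`, and `index` is the 1-based row number. `average_polarity_per_row` and the `sentence_polarities` callable are hypothetical names that stand in for the TextBlob code above.

```python
import csv
import io
from statistics import mean
from typing import Callable, Dict, Iterable, List


def average_polarity_per_row(
    rows: Iterable[Dict[str, str]],
    sentence_polarities: Callable[[str], List[float]],
    text_column: str = "review_text",  # assumed column name
) -> Dict[int, float]:
    """Map each 1-based row number to the mean polarity of its sentences.

    Hypothetical helper: `sentence_polarities` turns one review text into
    a list of per-sentence polarity scores (e.g. computed with TextBlob).
    """
    averages: Dict[int, float] = {}
    # enumerate(..., start=1) plays the role of `index` in the question
    for index, row in enumerate(rows, start=1):
        scores = sentence_polarities(row[text_column])
        if scores:  # statistics.mean raises StatisticsError on empty input
            averages[index] = mean(scores)
    return averages


# Demo: an in-memory CSV and a lookup table standing in for TextBlob
sample_csv = io.StringIO("review_text\nGreat game. Bad ending.\nBoring.\n")
fake_scores = {"Great game. Bad ending.": [0.8, -0.6], "Boring.": [-1.0]}
result = average_polarity_per_row(
    csv.DictReader(sample_csv), lambda text: fake_scores[text]
)
print(result)
```

With TextBlob, `sentence_polarities` could be `lambda text: [s.sentiment.polarity for s in TextBlob(text).sentences]`, and `rows` a `csv.DictReader` over the real file.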
This gives an output like so:
{'Game': 1, 'Polarity': -0.49687499999999996}
{'Game': 2, 'Polarity': 0.3865909090909091}
{'Game': 2, 'Polarity': 0.0}
{'Game': 2, 'Polarity': 0.2142857142857143}
{'Game': 2, 'Polarity': 0.2142857142857143}
{'Game': 2, 'Polarity': 0.04999999999999999}
{'Game': 2, 'Polarity': 0.0}
{'Game': 2, 'Polarity': -0.02500000000000005}
{'Game': 2, 'Polarity': 0.4715909090909091}
{'Game': 2, 'Polarity': 0.26666666666666666}
How do I group the results by game (Game 1, Game 2, Game 3, ...) and then average the scores for each?
Thanks
Answer:
You can collect the dictionaries in a list, then group them by game and compute the averages afterwards:
from statistics import mean
from textblob import TextBlob
lst = []  # create the list once, before looping over the CSV rows
# your code (inside the loop over the CSV rows):
blob = TextBlob(review_text)
for sentence in blob.sentences:
    ssp = {"Game": index, "Polarity": sentence.sentiment.polarity}
    lst.append(ssp)
# group it: game number -> list of polarities
out = {}
for dct in lst:
    out.setdefault(dct["Game"], []).append(dct["Polarity"])
# compute the average for each game:
for k, v in out.items():
    print("Game:", k, "Average:", mean(v))
Prints:
Game: 1 Average: -0.49687499999999996
Game: 2 Average: 0.17537999037999039
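The `setdefault` grouping above can also be written with `collections.defaultdict(list)`, which creates the empty list the first time a game number is seen. A minimal sketch, using the polarity values from the question as sample data (the names `records`, `by_game` and `averages` are only for illustration):

```python
from collections import defaultdict
from statistics import mean

# Sample records copied from the output shown in the question
records = [
    {"Game": 1, "Polarity": -0.49687499999999996},
    {"Game": 2, "Polarity": 0.3865909090909091},
    {"Game": 2, "Polarity": 0.0},
    {"Game": 2, "Polarity": 0.2142857142857143},
    {"Game": 2, "Polarity": 0.2142857142857143},
    {"Game": 2, "Polarity": 0.04999999999999999},
    {"Game": 2, "Polarity": 0.0},
    {"Game": 2, "Polarity": -0.02500000000000005},
    {"Game": 2, "Polarity": 0.4715909090909091},
    {"Game": 2, "Polarity": 0.26666666666666666},
]

# Group: game number -> list of sentence polarities
by_game = defaultdict(list)
for rec in records:
    by_game[rec["Game"]].append(rec["Polarity"])

# Average each group
averages = {game: mean(scores) for game, scores in by_game.items()}
for game, avg in averages.items():
    print("Game:", game, "Average:", avg)
```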
Answer:
Another possible solution is to load the dictionaries into a pandas.DataFrame and let groupby compute the mean for each game:
import pandas as pd
x = [{'Game': 1, 'Polarity': -0.49687499999999996},
     {'Game': 2, 'Polarity': 0.3865909090909091},
     {'Game': 2, 'Polarity': 0.0},
     {'Game': 2, 'Polarity': 0.2142857142857143},
     {'Game': 2, 'Polarity': 0.2142857142857143},
     {'Game': 2, 'Polarity': 0.04999999999999999},
     {'Game': 2, 'Polarity': 0.0},
     {'Game': 2, 'Polarity': -0.02500000000000005},
     {'Game': 2, 'Polarity': 0.4715909090909091},
     {'Game': 2, 'Polarity': 0.26666666666666666}]
# build the DataFrame in one step (one row per dictionary) instead of
# concatenating row by row, which copies the whole frame on every iteration
df = pd.DataFrame(x)
# group the rows by game and average the Polarity column
print(df.groupby('Game').mean())
Output:
      Polarity
Game
1    -0.496875
2     0.175380
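If only the averages are needed, the per-game lists (or the DataFrame) are not strictly necessary: keeping a running total and a count per game gives the same means in a single pass, with memory proportional to the number of games rather than the number of sentences. A minimal stdlib sketch; the function name `running_averages` is hypothetical, and it is fed `(game, polarity)` pairs taken from the question's output:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def running_averages(pairs: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average polarity per game from (game, polarity) pairs in one pass."""
    totals: Dict[int, float] = defaultdict(float)  # sum of polarities per game
    counts: Dict[int, int] = defaultdict(int)      # number of sentences per game
    for game, polarity in pairs:
        totals[game] += polarity
        counts[game] += 1
    return {game: totals[game] / counts[game] for game in totals}


pairs = [
    (1, -0.49687499999999996),
    (2, 0.3865909090909091), (2, 0.0), (2, 0.2142857142857143),
    (2, 0.2142857142857143), (2, 0.04999999999999999), (2, 0.0),
    (2, -0.02500000000000005), (2, 0.4715909090909091),
    (2, 0.26666666666666666),
]
result = running_averages(pairs)
print(result)
```

Inside the question's sentence loop, each pair would be `(index, sentence.sentiment.polarity)`, so the totals can be updated as the sentences are scored.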