i have a project that involves determining the sentiments of a text based on the adjectives. The dataframe to be used is the adjectives column which i derived like so:
def getAdjectives(text):
blob=TextBlob(text)
return [ word for (word,tag) in blob.tags if tag == "JJ"]
dataset['adjectives'] = dataset['text'].apply(getAdjectives)`
I obtained the dataframe from a json file using this code:
with open('reviews.json') as project_file:
data = json.load(project_file)
dataset=pd.json_normalize(data)
print(dataset.head())
i have done the sentiment analysis for the dataframe using this code:
dataset[['polarity', 'subjectivity']] = dataset['text'].apply(lambda text: pd.Series(TextBlob(text).sentiment))
print(dataset[['adjectives', 'polarity']])
this is the output:
adjectives polarity
0 [] 0.333333
1 [right, mad, full, full, iPad, iPad, bad, diff... 0.209881
2 [stop, great, awesome] 0.633333
3 [awesome] 0.437143
4 [max, high, high, Gorgeous] 0.398333
5 [decent, easy] 0.466667
6 [it’s, bright, wonderful, amazing, full, few... 0.265146
7 [same, same] 0.000000
8 [old, little, Easy, daily, that’s, late] 0.161979
9 [few, huge, storage.If, few] 0.084762
The code has no issue except I want it to output the polarity of each adjective with the adjective, like for example right, 0.00127, mad, -0.9888 even though they are in the same row of the dataframe.
CodePudding user response:
Try this:
dataset = dataset.explode("adjectives")
Note that []
will result in a np.NaN
row which you might want to remove beforehand/afterwards.