Sentiment analysis of a dataframe

Time:03-18

I have a project that involves determining the sentiment of a text based on its adjectives. The data to be used is the adjectives column of the dataframe, which I derived like so:

from textblob import TextBlob

def getAdjectives(text):
    # Keep only the words whose part-of-speech tag is "JJ" (adjective)
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "JJ"]

dataset['adjectives'] = dataset['text'].apply(getAdjectives)

I obtained the dataframe from a json file using this code:

import json
import pandas as pd

with open('reviews.json') as project_file:
    data = json.load(project_file)
dataset = pd.json_normalize(data)
print(dataset.head())

I have done the sentiment analysis for the dataframe using this code:

dataset[['polarity', 'subjectivity']] = dataset['text'].apply(lambda text: pd.Series(TextBlob(text).sentiment))
print(dataset[['adjectives', 'polarity']])

This is the output:

                                          adjectives  polarity
0                                                 []  0.333333
1  [right, mad, full, full, iPad, iPad, bad, diff...  0.209881
2                             [stop, great, awesome]  0.633333
3                                          [awesome]  0.437143
4                        [max, high, high, Gorgeous]  0.398333
5                                     [decent, easy]  0.466667
6  [it’s, bright, wonderful, amazing, full, few...  0.265146
7                                       [same, same]  0.000000
8         [old, little, Easy, daily, that’s, late]  0.161979
9                       [few, huge, storage.If, few]  0.084762

The code works, but I want it to output the polarity of each adjective next to the adjective, for example right, 0.00127, mad, -0.9888, even though the adjectives are in the same row of the dataframe.

CodePudding user response:

Try this:

dataset = dataset.explode("adjectives")

Note that an empty list ([]) becomes a row containing NaN after explode, so you might want to remove those rows beforehand or afterwards.
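Exploding only puts each adjective on its own row; it does not score them yet. Here is a minimal sketch of one way to get a polarity per adjective, assuming the dataset['adjectives'] column from the question. The names textblob_polarity, adjective_polarities, adjective and adjective_polarity are made up for illustration and are not part of pandas or TextBlob.

```python
import pandas as pd


def textblob_polarity(word):
    # Hypothetical helper: score a single word with TextBlob's default analyzer.
    # Imported here so adjective_polarities can also be used with another scorer.
    from textblob import TextBlob
    return TextBlob(word).sentiment.polarity


def adjective_polarities(dataset, scorer=textblob_polarity):
    # Drop rows with an empty adjective list; explode would turn them into NaN rows.
    non_empty = dataset[dataset["adjectives"].map(len) > 0]
    # One row per adjective; the index still points at the original review row.
    exploded = non_empty[["adjectives"]].explode("adjectives")
    exploded = exploded.rename(columns={"adjectives": "adjective"})
    # Score each adjective on its own, without the rest of the sentence.
    exploded["adjective_polarity"] = exploded["adjective"].apply(scorer)
    return exploded
```

Calling print(adjective_polarities(dataset)) should then list every adjective with its own score. The index repeats for adjectives that came from the same review, so the result can be joined back to the original rows if needed. These per-word scores will generally differ from the polarity of the full text, since each word is scored out of context.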
