I have the following dataframe with sentiments:
Text | Negative | Neutral | Positive |
---|---|---|---|
I lost my phone. I am sad | 0.8 | 0.15 | 0.05 |
How is your day? | 0.1 | 0.8 | 0.1 |
Let's go out for dinner today. | 0.06 | 0.55 | 0.39 |
I am super pissed at my friend for cancelling the party. | 0.73 | 0.11 | 0.16 |
I am so happy I want to dance | 0 | 0.1 | 0.9 |
I am not sure if I should laugh or just smile | 0.08 | 0.24 | 0.68 |
This is based on the sentiment analysis I have completed. Now, each text can be tagged with exactly one of these 5 labels:
Very Negative, Negative, Neutral, Positive, Very Positive.
I want to add a new column to the dataframe that evaluates the scores and tags each row according to the following rules:
1. If negative or positive is the dominant score and it is >= 0.8 (80%), mark the row as Very Negative or Very Positive.
2. If negative or positive is the dominant score and it is >= 0.5 but less than 0.8, mark the row as just Negative or Positive.
3. If neutral is >= 0.5, mark the row as Neutral. There is no such thing as Very Neutral.
For the above example, the result should look like below:
Text | Negative | Neutral | Positive | Sentiment |
---|---|---|---|---|
I lost my phone. I am sad | 0.8 | 0.15 | 0.05 | Very Negative |
How is your day? | 0.1 | 0.8 | 0.1 | Neutral |
Let's go out for dinner today. | 0.06 | 0.55 | 0.39 | Neutral |
I am super pissed at my friend for cancelling the party. | 0.73 | 0.11 | 0.16 | Negative |
I am so happy I want to dance | 0 | 0.1 | 0.9 | Very Positive |
I am not sure if I should laugh or just smile | 0.08 | 0.24 | 0.68 | Positive |
How can I perform this operation on the dataframe? I then want to plot a graph to see the distribution of each of those 5 sentiments. That part I can do, but I am struggling to get these multiple conditions working in pandas.
Any help is greatly appreciated.
CodePudding user response:
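You can use np.select(). For each row it applies the first condition that matches, so the order of the conditions matters:
import numpy as np

conditions = [df['Positive']>=0.80, df['Negative']>=0.80, ((df['Positive']>=0.50) & (df['Positive']<0.80)),
              ((df['Negative']>=0.50) & (df['Negative']<0.80)), df['Neutral']>=0.5]
values = ['Very Positive', 'Very Negative', 'Positive', 'Negative', 'Neutral']
# Use a string as the default: mixing string choices with np.nan can raise a TypeError on recent NumPy versions.
# 'Unclassified' is only a placeholder for rows where no score reaches 0.5.
df['Sentiment'] = np.select(conditions, values, default='Unclassified')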
OUTPUT
Text Negative Neutral Positive Sentiment
0 I lost my phone. I am sad 0.80 0.15 0.05 Very Negative
1 How is your day? 0.10 0.80 0.10 Neutral
2 Let's go out for dinner today. 0.06 0.55 0.39 Neutral
3 I am super pissed at my friend for cancelling ... 0.73 0.11 0.16 Negative
4 I am so happy I want to dance 0.00 0.10 0.90 Very Positive
5 I am not sure if I should laugh or just smile 0.08 0.24 0.68 Positive
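If you want to run the np.select() approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question. The "Unclassified" fallback label is my own placeholder (an assumption, not something the question asks for), and the upper bounds on the plain Positive/Negative checks are left out because np.select() already stops at the first matching condition.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Text": [
        "I lost my phone. I am sad",
        "How is your day?",
        "Let's go out for dinner today.",
        "I am super pissed at my friend for cancelling the party.",
        "I am so happy I want to dance",
        "I am not sure if I should laugh or just smile",
    ],
    "Negative": [0.8, 0.1, 0.06, 0.73, 0.0, 0.08],
    "Neutral": [0.15, 0.8, 0.55, 0.11, 0.1, 0.24],
    "Positive": [0.05, 0.1, 0.39, 0.16, 0.9, 0.68],
})

# np.select uses the first matching condition per row, so the
# "Very ..." checks have to come before the plain ones.
conditions = [
    df["Positive"] >= 0.8,
    df["Negative"] >= 0.8,
    df["Positive"] >= 0.5,
    df["Negative"] >= 0.5,
    df["Neutral"] >= 0.5,
]
values = ["Very Positive", "Very Negative", "Positive", "Negative", "Neutral"]

# "Unclassified" is a placeholder label (an assumption) for rows
# where no score reaches 0.5.
df["Sentiment"] = np.select(conditions, values, default="Unclassified")

print(df[["Negative", "Neutral", "Positive", "Sentiment"]])
```

From there, df["Sentiment"].value_counts() gives the per-label counts you would need for the distribution plot.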
CodePudding user response:
You can create a function that maps the three values to a sentiment label, then use the apply
method to run that function on each row of the table. This produces one column (a Series), which you then assign to the main table as a new column.
In rough (not yet tested) code, it should look something like this:
def your_fn(values):
    pos = values["Positive"]
    neu = values["Neutral"]
    neg = values["Negative"]
    # 1. If negative or positive is dominant and >= 0.8 (80%), mark it as very negative or very positive.
    if pos >= .8:
        return "Very Positive"
    if neg >= .8:
        return "Very Negative"
    # 2. If negative or positive is dominant and >= 0.5 but less than 0.8, then just negative or positive.
    if pos >= .5:
        return "Positive"
    if neg >= .5:
        return "Negative"
    # 3. If neutral is >= 0.5 then Neutral. There is no such thing as Very Neutral.
    if neu >= .5:
        return "Neutral"
    # Fallback for rows where no score reaches 0.5.
    return "-"

# axis=1 passes each row to your_fn as a Series.
df['Sentiment'] = df.apply(your_fn, axis=1)
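Both answers only check the thresholds and rely on the scores summing to 1 to imply which one dominates. If you would rather check the "dominant" part of the rules explicitly, a possible vectorized sketch is below. The function name tag_sentiment, the "Unclassified" fallback label, and the extra fourth sample row are my own assumptions for illustration.

```python
import numpy as np
import pandas as pd

def tag_sentiment(df):
    """Return a Series of sentiment labels for the score columns of df."""
    scores = df[["Negative", "Neutral", "Positive"]]
    # Name of the column holding the largest score in each row.
    # On an exact tie, idxmax returns the first column in the order above.
    dominant = scores.idxmax(axis=1)
    top = scores.max(axis=1)

    conditions = [
        (dominant == "Positive") & (top >= 0.8),
        (dominant == "Negative") & (top >= 0.8),
        (dominant == "Positive") & (top >= 0.5),
        (dominant == "Negative") & (top >= 0.5),
        (dominant == "Neutral") & (top >= 0.5),
    ]
    values = ["Very Positive", "Very Negative", "Positive", "Negative", "Neutral"]
    # "Unclassified" is a placeholder (an assumption) for rows with no score >= 0.5.
    labels = np.select(conditions, values, default="Unclassified")
    return pd.Series(labels, index=df.index)

# Three rows from the question plus one made-up row where nothing reaches 0.5.
sample = pd.DataFrame({
    "Negative": [0.8, 0.06, 0.08, 0.4],
    "Neutral": [0.15, 0.55, 0.24, 0.3],
    "Positive": [0.05, 0.39, 0.68, 0.3],
})
sample["Sentiment"] = tag_sentiment(sample)

# Counts per label, ready to feed into a bar plot of the distribution.
counts = sample["Sentiment"].value_counts()
print(sample)
print(counts)
```

On large dataframes this avoids the per-row Python call that apply(..., axis=1) makes, at the cost of being a little less readable than the plain function.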