Add new column in dataframe based on multiple column conditions

I have the following dataframe with sentiments:

Text	Negative	Neutral	Positive
I lost my phone. I am sad	0.8	0.15	0.05
How is your day?	0.1	0.8	0.1
Let's go out for dinner today.	0.06	0.55	0.39
I am super pissed at my friend for cancelling the party.	0.73	0.11	0.16
I am so happy I want to dance	0	0.1	0.9
I am not sure if I should laugh or just smile	0.08	0.24	0.68

This is based on the sentiment analysis I have completed. Now, each text can be tagged as one of five labels:

Very Negative, Negative, Neutral, Positive, Very Positive.

I want to add a new column to the dataframe that tags each row's sentiment according to the following rules:

1. If negative or positive is the dominant value and is >= 0.8 (80%), mark it as Very Negative or Very Positive.

2. If negative or positive is the dominant value and is >= 0.5 but less than 0.8, mark it as just Negative or Positive.

3. If neutral is >= 0.5, mark it as Neutral. There is no such thing as Very Neutral.

For the above example, the result should look like below:

Text	Negative	Neutral	Positive	Sentiment
I lost my phone. I am sad	0.8	0.15	0.05	Very Negative
How is your day?	0.1	0.8	0.1	Neutral
Let's go out for dinner today.	0.06	0.55	0.39	Neutral
I am super pissed at my friend for cancelling the party.	0.73	0.11	0.16	Negative
I am so happy I want to dance	0	0.1	0.9	Very Positive
I am not sure if I should laugh or just smile	0.08	0.24	0.68	Positive

How can I perform this operation on the dataframe? I then want to plot a graph to see the distribution of each of those 5 sentiments. I can do that part, but I am struggling to get these multiple conditions working in pandas.

Any help is greatly appreciated.

CodePudding user response：

You can use np.select()

import numpy as np
conditions = [df['Positive']>=0.80, df['Negative']>=0.80, ((df['Positive']>=0.50) & (df['Positive']<0.80)),
              ((df['Negative']>=0.50) & (df['Negative']<0.80)), df['Neutral']>=0.5]
values = ['Very Positive', 'Very Negative', 'Positive', 'Negative', 'Neutral']
# Rows that match no condition get None. Mixing string values with
# default=np.nan can raise a TypeError on newer NumPy releases.
df['Sentiment'] = np.select(conditions, values, default=None)

OUTPUT

                                               Text  Negative  Neutral  Positive      Sentiment
0                          I lost my phone. I am sad      0.80     0.15      0.05  Very Negative
1                                   How is your day?      0.10     0.80      0.10        Neutral
2                     Let's go out for dinner today.      0.06     0.55      0.39        Neutral
3  I am super pissed at my friend for cancelling ...      0.73     0.11      0.16       Negative
4                     I am so happy  I want to dance      0.00     0.10      0.90  Very Positive
5      I am not sure if I should laugh or just smile      0.08     0.24      0.68       Positive
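
A note on why the order of the conditions matters: np.select() takes, for each row, the value belonging to the first condition that is True. That means the "Very ..." checks have to come first, and once they do, the upper bounds (< 0.80) are no longer strictly needed. Below is a minimal self-contained sketch of that idea using the sample data from the question. The "Unclassified" fallback label is my own placeholder, not something from the question.

```python
import numpy as np
import pandas as pd

# Sample scores copied from the question.
df = pd.DataFrame({
    "Text": [
        "I lost my phone. I am sad",
        "How is your day?",
        "Let's go out for dinner today.",
        "I am super pissed at my friend for cancelling the party.",
        "I am so happy I want to dance",
        "I am not sure if I should laugh or just smile",
    ],
    "Negative": [0.80, 0.10, 0.06, 0.73, 0.00, 0.08],
    "Neutral": [0.15, 0.80, 0.55, 0.11, 0.10, 0.24],
    "Positive": [0.05, 0.10, 0.39, 0.16, 0.90, 0.68],
})

# The first True condition wins for each row, so the stricter
# ">= 0.80" checks are listed before the ">= 0.50" checks.
conditions = [
    df["Positive"] >= 0.80,
    df["Negative"] >= 0.80,
    df["Positive"] >= 0.50,
    df["Negative"] >= 0.50,
    df["Neutral"] >= 0.50,
]
values = ["Very Positive", "Very Negative", "Positive", "Negative", "Neutral"]

# "Unclassified" is a hypothetical placeholder for rows where no score
# reaches 0.5; a string default keeps the result a plain string column.
df["Sentiment"] = np.select(conditions, values, default="Unclassified")
print(df[["Negative", "Neutral", "Positive", "Sentiment"]])
```

Because the scores in each row sum to 1, any score above 0.5 is automatically the dominant one, which is why no explicit "most dominating" check appears here. An exact 0.5 / 0.5 tie would go to whichever condition is listed first.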

CodePudding user response：

You can create a function that maps the three values to a sentiment, then use the apply method to run that function on each row of the table. That produces one column (a Series), which you can then assign to the main table.

In rough, untested code, it should look something like this:

def your_fn(values):
    pos = values["Positive"]
    neu = values["Neutral"]
    neg = values["Negative"]

    # 1. Negative or positive is >= 0.8 (80%): very negative or very positive.
    if pos >= 0.8:
        return "Very Positive"
    if neg >= 0.8:
        return "Very Negative"

    # 2. Negative or positive is >= 0.5 but less than 0.8: just negative or positive.
    if pos >= 0.5:
        return "Positive"
    if neg >= 0.5:
        return "Negative"

    # 3. Neutral is >= 0.5: Neutral. There is no such thing as Very Neutral.
    if neu >= 0.5:
        return "Neutral"

    # No score reaches 0.5, so none of the rules apply.
    return "-"

# axis=1 passes each row to the function as a Series.
df['Sentiment'] = df.apply(your_fn, axis=1)
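
If you want the "most dominating" part of the rules to be explicit rather than implied by the thresholds, one possible variation is to first find which of the three scores is largest and then apply the thresholds to that score only. This is just a sketch on the sample data from the question. The name tag_sentiment and the "-" fallback are illustrative choices, not anything required by pandas.

```python
import pandas as pd

# Sample scores copied from the question.
df = pd.DataFrame({
    "Text": [
        "I lost my phone. I am sad",
        "How is your day?",
        "Let's go out for dinner today.",
        "I am super pissed at my friend for cancelling the party.",
        "I am so happy I want to dance",
        "I am not sure if I should laugh or just smile",
    ],
    "Negative": [0.80, 0.10, 0.06, 0.73, 0.00, 0.08],
    "Neutral": [0.15, 0.80, 0.55, 0.11, 0.10, 0.24],
    "Positive": [0.05, 0.10, 0.39, 0.16, 0.90, 0.68],
})


def tag_sentiment(row):
    """Return a sentiment label for one row (hypothetical helper)."""
    scores = {name: row[name] for name in ("Negative", "Neutral", "Positive")}
    # Name of the dominant score; on an exact tie the first name listed wins.
    label = max(scores, key=scores.get)
    score = scores[label]

    if score < 0.5:
        # No score reaches 0.5, so none of the rules apply.
        return "-"
    if label != "Neutral" and score >= 0.8:
        # Only Negative and Positive have a "Very" variant.
        return "Very " + label
    return label


# axis=1 passes each row to the function as a Series.
df["Sentiment"] = df.apply(tag_sentiment, axis=1)
print(df[["Negative", "Neutral", "Positive", "Sentiment"]])
```

Keep in mind that apply with axis=1 calls the function once per row, so on large dataframes a vectorized approach such as np.select() will usually be faster.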