I ran a sentiment analysis using VADER and now want to classify the compound scores as negative, positive, or neutral:
Positive when the compound score is > 0.05
Negative when it is < -0.05, and neutral when it is between -0.05 and 0.05
df_polarity$VADER_Sent = ifelse(df_polarity$VADER_Sent > 0.05, "pos",
  ifelse(df_polarity$VADER_Sent < -0.05, "neg",
    ifelse(between(df_polarity$VADER_Sent, -0.05, 0.05), "neu", "NA")
  )
)
When I run this code, even values such as -0.4XXX are classified as neutral rather than negative.
For some reason this doesn't work. There is something I am missing, but I can't figure out what it is.
I couldn't find any helpful tips by googling it.
I hope one of you can help me with this one!
Output from str(df_polarity):
$ VADER_Sent : chr "0.0" "-0.4939" "0.7717" "0.7096"
After further looking into my data, it seems that the "-" sign is not recognized in the context of a negative number.
Thanks to everyone who tried to help me! Really appreciated it!!!
CodePudding user response:
The problem is that the VADER_Sent column is character. When one side of < or > is a character vector, R coerces the number to a string as well, so the values are compared alphabetically instead of numerically.
Example:
> -0.4939 < -0.05
[1] TRUE
> "-0.4939" < "-0.05"
[1] FALSE
Try using as.numeric(df_polarity$VADER_Sent) in your ifelse() statements to get around this.
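A minimal sketch of that fix, assuming the column holds strings like those in the str() output from the question. The names score and VADER_Label are hypothetical, introduced here so the original column is not overwritten:

```r
# Sample data mirroring the str() output in the question (scores stored as character).
df_polarity <- data.frame(
  VADER_Sent = c("0.0", "-0.4939", "0.7717", "0.7096"),
  stringsAsFactors = FALSE
)

# Convert once to numeric so that < and > compare numbers, not strings.
score <- as.numeric(df_polarity$VADER_Sent)

# Classify: > 0.05 is positive, < -0.05 is negative, everything else is neutral.
# VADER_Label is a hypothetical column name used for this illustration.
df_polarity$VADER_Label <- ifelse(score > 0.05, "pos",
                           ifelse(score < -0.05, "neg", "neu"))

print(df_polarity)
```

If as.numeric() returns NA with a warning for some rows, the strings may contain something other than a plain ASCII minus sign or may carry stray whitespace, and would need cleaning before conversion.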
CodePudding user response:
I can't be sure without reproducible code, but you should be able to just give the 'neutral' category as the final argument of the second ifelse() call.
df_polarity$VADER_Sent = ifelse(df_polarity$VADER_Sent > 0.05, "pos",
  ifelse(df_polarity$VADER_Sent < -0.05, "neg", "neutral")
)
CodePudding user response:
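This should work (the example uses cutoffs of -0.5 and 0.5 on a numeric column named VADER_sent; replace them with -0.05 and 0.05 for the VADER thresholds):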
library(dplyr)  # provides %>%, mutate() and if_else()

df %>%
  mutate(X = if_else(VADER_sent < -0.5, "neg",
             if_else(VADER_sent <= 0.5 & VADER_sent >= -0.5, "neutral", "pos")))
VADER_sent X
1 0.51 pos
2 0.10 neutral
3 2.00 pos
4 -0.60 neg
5 0.30 neutral
6 -1.20 neg
Data:
df <- data.frame(
  VADER_sent = c(0.51, 0.1, 2, -0.6, 0.3, -1.2)
)
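For comparison, here is a hedged sketch of the same dplyr approach applied to character scores like those in the question, using the 0.05 cutoffs the question asks for. It swaps the nested if_else() calls for case_when(), which is a different technique from the answer above; df_chr, score, label and result are hypothetical names used only for illustration:

```r
library(dplyr)

# Character scores, as in the str() output from the question.
df_chr <- data.frame(
  VADER_Sent = c("0.0", "-0.4939", "0.7717", "0.7096"),
  stringsAsFactors = FALSE
)

result <- df_chr %>%
  mutate(
    # Convert to numeric first so the comparisons are numeric.
    score = as.numeric(VADER_Sent),
    label = case_when(
      is.na(score)  ~ NA_character_,  # keep unparseable scores as NA
      score > 0.05  ~ "pos",
      score < -0.05 ~ "neg",
      TRUE          ~ "neu"           # everything from -0.05 to 0.05
    )
  )

print(result)
```

Checking is.na() first keeps values that failed to convert from silently falling into the neutral class.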