I have started working on a sentiment analysis, but I am having trouble transforming the lexicon into the required format.
My data looks something like this:
word | alternativeform1 | alternativeform2 | value |
---|---|---|---|
abmachen | abgemacht | abmachst | 0.4 |
Aktualisierung | Aktualisierungen | NA | 0.2 |
I need it to look like this:
word | value |
---|---|
abmachen | 0.4 |
abgemacht | 0.4 |
abmachst | 0.4 |
Aktualisierung | 0.2 |
Aktualisierungen | 0.2 |
Can you help me find an easy way to do this? Thank you very much :)
CodePudding user response:
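You could use pivot_longer() from tidyr to stack every word-form column into a single column:
library(dplyr)
library(tidyr)
df %>%
  # stack all columns except value into one "word" column
  pivot_longer(-value, values_to = "word") %>%
  # drop the rows that came from missing alternative forms
  drop_na(word) %>%
  # keep only the two columns the lexicon needs
  select(word, value)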
This returns
# A tibble: 5 x 2
  word             value
  <chr>            <dbl>
1 abmachen           0.4
2 abgemacht          0.4
3 abmachst           0.4
4 Aktualisierung     0.2
5 Aktualisierungen   0.2
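To try this end to end, here is a minimal sketch that rebuilds the two example rows from the question and runs the same pipeline. The way `df` is constructed is an assumption about how the lexicon is stored (missing forms as real `NA` values). The `na_if()` step is an extra guard that is not part of the pipeline above; it should only matter if the missing forms were read in as the literal text "NA".

```r
library(dplyr)
library(tidyr)

# Hypothetical reconstruction of the example lexicon from the question
df <- tibble(
  word             = c("abmachen", "Aktualisierung"),
  alternativeform1 = c("abgemacht", "Aktualisierungen"),
  alternativeform2 = c("abmachst", NA),
  value            = c(0.4, 0.2)
)

long <- df %>%
  # stack word, alternativeform1 and alternativeform2 into one "word" column
  pivot_longer(-value, values_to = "word") %>%
  # assumption: turn literal "NA" strings into real missing values
  mutate(word = na_if(word, "NA")) %>%
  # remove rows without a word form
  drop_na(word) %>%
  select(word, value)

print(long)
```

The original column names end up in a helper column called `name` (the default for `names_to`), which `select()` then discards.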