So I have the IMDB movies dataset, which has a column 'genres' containing '|'-separated movie genres, e.g. "Crime|Drama|Horror".
Each row has a different genre combination, but I want to separate them out and assign 1 if the movie has a given genre and 0 if it doesn't. I have written this code to get the unique genres, which I can then turn into columns:
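library(tokenizers)  # provides tokenize_words()
li = list()
for (x in movie_clean$genres) {
  tokens = tokenize_words(x)  # list holding one character vector of tokens
  for (y in tokens)
    li = append(li, y)        # add each token to the list
}
li = li[!duplicated(li)]      # keep only the first occurrence of each genre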
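If only the unique genres are needed, a shorter base R sketch is possible. Note that tokenize_words() lowercases its tokens by default and can split a hyphenated genre such as "Sci-Fi" at the hyphen, so its output may not match the original genre labels. The sample vector below is made up for illustration; in the question it would be movie_clean$genres.

```r
# Hypothetical sample; in the question this would be movie_clean$genres
genres <- c("Crime|Drama|Horror", "Sci-Fi|Drama", "Film-Noir|Crime")

# fixed = TRUE splits on a literal "|" (as a regex, "|" would match the empty string)
li <- unique(unlist(strsplit(genres, "|", fixed = TRUE)))
print(li)
```

Unlike the loop, this returns a character vector rather than a list, and it keeps the original capitalisation and hyphens.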
How do I now assign 1 and 0 to each separate genre column based on the main genres column? I want the final output to look like:
| Adventure | Crime | Drama |
| 1 | 0 | 1 |
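One way to get that 0/1 layout in base R, sketched here on a small made-up movie_clean (the titles and genres are hypothetical, and the genres column is assumed to have no NA values), is to split each genre string, collect the unique genres, and fill an all-zero matrix with 1s wherever a movie has a genre:

```r
# Hypothetical stand-in for movie_clean (assumes genres has no NA values)
movie_clean <- data.frame(
  title  = c("Movie A", "Movie B", "Movie C"),
  genres = c("Crime|Drama|Horror", "Adventure|Drama", "Crime"),
  stringsAsFactors = FALSE
)

# One character vector of genres per movie; fixed = TRUE splits on a literal "|"
genre_list <- strsplit(movie_clean$genres, "|", fixed = TRUE)

# The unique genres (the question's li), sorted for a stable column order
all_genres <- sort(unique(unlist(genre_list)))

# Start with all zeros: one row per movie, one column per genre
dummies <- matrix(0L, nrow = length(genre_list), ncol = length(all_genres),
                  dimnames = list(NULL, all_genres))

# Put a 1 in every column whose genre appears in that movie's list
for (i in seq_along(genre_list)) {
  dummies[i, genre_list[[i]]] <- 1L
}

# Attach the 0/1 columns to the original data
movie_genres <- cbind(movie_clean, as.data.frame(dummies))
print(movie_genres)
```

The columns come out in alphabetical order because of sort(); dropping it keeps the order in which genres first appear.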
Answer:
Let's suppose you have a vector that looks like
v <- c("Crime|Drama|Horror", "Apple|Banana|Orange", "Country|Rock|Rap")
Then, using the tidyverse, you can do:
library(tidyverse)  # loads tidyr's separate() and the %>% pipe
data.frame(v) %>% separate(v, c("Col1", "Col2", "Col3"), sep = "[|]")
and get
Col1 Col2 Col3
1 Crime Drama Horror
2 Apple Banana Orange
3 Country Rock Rap
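The separate() call above splits the genres into positional columns (first, second, third genre) rather than one 0/1 column per genre, and because it names exactly three output columns, movies with more or fewer genres would trigger warnings. A tidyverse sketch that produces indicator columns instead, using tidyr's separate_rows() and pivot_wider() on hypothetical sample data where each row has a unique title:

```r
library(dplyr)  # tibble(), mutate(), %>%
library(tidyr)  # separate_rows(), pivot_wider()

# Hypothetical sample data; assumes title uniquely identifies each movie
movies <- tibble(
  title  = c("Movie A", "Movie B", "Movie C"),
  genres = c("Crime|Drama|Horror", "Adventure|Drama", "Crime")
)

genre_dummies <- movies %>%
  separate_rows(genres, sep = "\\|") %>%  # one row per (movie, genre); sep is a regex
  mutate(present = 1L) %>%                 # indicator value to spread into columns
  pivot_wider(names_from  = genres,
              values_from = present,
              values_fill = 0L)            # genres a movie lacks become 0
print(genre_dummies)
```

Here the genre columns appear in order of first appearance (Crime, Drama, Horror, Adventure); genres such as "Sci-Fi" become non-syntactic column names that need backticks, e.g. genre_dummies$`Sci-Fi`.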
Answer:
A base R option with read.table:
read.table(text = v, header = FALSE, sep = "|")
V1 V2 V3
1 Crime Drama Horror
2 Apple Banana Orange
3 Country Rock Rap
data
v <- c("Crime|Drama|Horror", "Apple|Banana|Orange", "Country|Rock|Rap")
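Like separate(), read.table() returns positional columns rather than 0/1 indicators. A base R sketch that goes the rest of the way reuses the same v, splits it with strsplit(), and cross-tabulates movie against genre with table(); each cell is a count, which is 0 or 1 as long as no genre repeats within a movie:

```r
v <- c("Crime|Drama|Horror", "Apple|Banana|Orange", "Country|Rock|Rap")

# One character vector of genres per movie
genre_list <- strsplit(v, "|", fixed = TRUE)

# Long format: one row per (movie index, genre) pair
long <- data.frame(
  movie = rep(seq_along(genre_list), lengths(genre_list)),
  genre = unlist(genre_list)
)

# Cross-tabulate movie x genre and convert the table to a wide data frame
dummies <- as.data.frame.matrix(table(long$movie, long$genre))
print(dummies)
```

Unlike read.table(), this does not care how many genres each movie has.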