R code to separate a movie genres column in a dataset

I have the IMDB movies dataset, which has a 'genres' column containing '|'-separated movie genres, e.g. "Crime|Drama|Horror".

Each row has a different combination of genres. I want to split them out into one column per genre and assign 1 if the movie has that genre and 0 if it doesn't. I have written this code to get the unique genres, which I can then use as column names.

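library(tokenizers)  # provides tokenize_words()

# Collect every genre token that appears in the genres column
li = list()
for(x in movie_clean$genres) {
  tokens = tokenize_words(x)
  for(y in tokens)
    li = append(li, y)
}
# Keep only the unique genres
li = li[!duplicated(li)]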

How do I now assign 1 and 0 to each separate genre column, based on the main genres column? I want the final output to look like this:

| Adventure | Crime | Drama |
| 1         | 0     | 1     |

CodePudding user response:

Let's suppose you have a vector that looks like

v <- c("Crime|Drama|Horror", "Apple|Banana|Orange", "Country|Rock|Rap")

Then using tidyverse, you can do:

library(tidyverse)
data.frame(v) %>% separate(v, c("Col1", "Col2", "Col3"), sep = "[|]")

and get

     Col1   Col2   Col3
1   Crime  Drama Horror
2   Apple Banana Orange
3 Country   Rock    Rap
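
Note that separate() splits by position, so on its own it does not produce the 1/0 columns asked for in the question. One possible way to get there with tidyr is to put each genre on its own row and then pivot back to wide format. This is only a sketch: the `movies` data frame, its `title` column, and the helper column `present` are made up for illustration and are not from the original dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the real movie data
movies <- data.frame(
  title  = c("A", "B", "C"),
  genres = c("Crime|Drama|Horror", "Adventure|Drama", "Crime"),
  stringsAsFactors = FALSE
)

indicators <- movies %>%
  separate_rows(genres, sep = "[|]") %>%   # one row per movie/genre pair
  mutate(present = 1L) %>%                 # mark that the genre occurs
  pivot_wider(names_from = genres,         # one column per genre
              values_from = present,
              values_fill = 0L)            # genres a movie lacks become 0
```

pivot_wider() uses the remaining columns (here `title`) to identify rows, so this assumes each movie has a unique identifier.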

CodePudding user response:

A base R option with read.table

read.table(text = v, header = FALSE, sep = "|")
       V1     V2     V3
1   Crime  Drama Horror
2   Apple Banana Orange
3 Country   Rock    Rap
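
The call above relies on every row having the same number of fields. If the number of genres differs between rows, as the question describes, read.table stops with an error unless it is told to pad the short rows. A small sketch, using a made-up ragged vector `v2` that is not part of the original post:

```r
# Hypothetical input: rows with three, two, and one genre
v2 <- c("Crime|Drama|Horror", "Adventure|Drama", "Crime")

# fill = TRUE pads short rows with blank fields;
# na.strings = "" turns those blank fields into NA
ragged <- read.table(text = v2, header = FALSE, sep = "|",
                     fill = TRUE, na.strings = "")
```

read.table decides the number of columns from the first five lines of input, so a later row with more genres than any of those can spill over into an extra row. That makes this more of a quick inspection tool than a complete solution.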

data

v <- c("Crime|Drama|Horror", "Apple|Banana|Orange", "Country|Rock|Rap")
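
To go from split genres to the 1/0 columns the question asks for, one base R possibility is to split each string with strsplit and test membership genre by genre. This is a minimal sketch, assuming a made-up `genres` vector standing in for movie_clean$genres; the names `genre_list`, `all_genres`, and `indicators` are illustrative.

```r
# Hypothetical stand-in for movie_clean$genres
genres <- c("Crime|Drama|Horror", "Adventure|Drama", "Crime")

# Split on a literal "|" (fixed = TRUE avoids regex interpretation)
genre_list <- strsplit(genres, "|", fixed = TRUE)

# The unique genres become the column names
all_genres <- sort(unique(unlist(genre_list)))

# One integer column per genre: 1 if the row contains it, else 0
indicators <- data.frame(row.names = seq_along(genres))
for (g in all_genres) {
  indicators[[g]] <- as.integer(
    vapply(genre_list, function(x) g %in% x, logical(1))
  )
}

print(indicators)
#   Adventure Crime Drama Horror
# 1         0     1     1      1
# 2         1     0     1      0
# 3         0     1     0      0
```

Something like cbind(movie_clean, indicators) could then attach these columns to the original data, assuming the rows are in the same order.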