How to create dummy variables in R based on multiple values within each cell in a column

Time:12-07

I have a dataset called cookbooks4 that currently looks like this:

SectionTitle  Author           Work            PubYear  Item
Albertine     Susan Markovitz  The salad book  1928     pear,cream cheese,lettuce, vinegar

The dataset is long, so I have included only the first row; as you can imagine, each recipe has its own ingredients. I need to create dummy variables for Item (the ingredients): I would like to collect all unique ingredients and give each one its own column. I should end up with something like the table below. Bear in mind that this is only the first row, and there are roughly 630 different ingredients, so roughly 630 dummy columns. After cream there will be columns for ingredients that do not appear in this recipe's Item list, and for those the dummy should be 0. Any help would be greatly appreciated.

SectionTitle  Author    Work          PubYear  Item           pear  cream  ....
Albertine     Susan...  The salad..   1928     pear,cream...  1     1
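The target layout is a recipe-by-ingredient incidence table: one row per recipe, one 0/1 column per distinct ingredient. A minimal base-R sketch of that idea uses table() to cross-tabulate recipes against ingredients. It assumes a hypothetical two-row stand-in called `cookbooks` (the second recipe is invented so that some cells come out as 0) and that ingredients are separated by commas with optional surrounding spaces.

```r
# Hypothetical stand-in for cookbooks4; the second row is invented for illustration
cookbooks <- data.frame(
  SectionTitle = c("Albertine", "Hypothetical"),
  Author = c("Susan Markovitz", "Unknown"),
  Work = c("The salad book", "Example book"),
  PubYear = c(1928L, 1930L),
  Item = c("pear,cream cheese,lettuce, vinegar", "lettuce,olive oil"),
  stringsAsFactors = FALSE
)

# Split each Item string on commas and trim the spaces around each ingredient
ingredients <- lapply(strsplit(cookbooks$Item, ","), trimws)

# Long form: repeat each recipe's row number once per ingredient it lists
recipe_id <- factor(rep(seq_len(nrow(cookbooks)), lengths(ingredients)),
                    levels = seq_len(nrow(cookbooks)))
ingredient <- unlist(ingredients)

# Cross-tabulate recipes against ingredients, then turn counts into 0/1 dummies
counts <- table(recipe_id, ingredient)
dummies <- as.data.frame(unclass(counts > 0) * 1L)

# Put the dummy columns next to the original columns
result <- cbind(cookbooks, dummies)
```

Because table() builds the full grid of recipes by ingredients, every ingredient missing from a recipe is already a 0, with no separate fill step.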

I tried the following, but I get an error message. I would also need to keep all of the other columns.

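library(stringr)
# Split each comma-separated Item string into ingredients, trimming surrounding spaces
final <- lapply(strsplit(cookbooks4$Item, split = ","), str_trim)
# All distinct ingredients across every recipe
Item <- unique(unlist(final))
# One 0/1 column per ingredient; %in% compares whole names, so "cream" does not match "cream cheese"
final2 <- as.data.frame(Reduce(cbind, lapply(Item, function(i) sapply(final, function(j) as.integer(i %in% j)))))
names(final2) <- Item
# Keep all the original columns next to the dummies
final2 <- cbind(cookbooks4, final2)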
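One pitfall when matching ingredients with grepl(): it treats each ingredient name as a regular expression and matches substrings, so a dummy for "cream" would also be set for "cream cheese" (and characters such as ( or + would be read as regex syntax). Comparing whole strings with %in% avoids this. The vector `ingredients` below is just an illustrative example.

```r
ingredients <- c("cream cheese", "pear")

# grepl() finds "cream" inside the longer string "cream cheese"
grepl("cream", ingredients)
#> [1]  TRUE FALSE

# %in% only matches an exact, whole ingredient name
"cream" %in% ingredients
#> [1] FALSE
"cream cheese" %in% ingredients
#> [1] TRUE
```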

Answer:

library(dplyr)
library(tidyr)

data <- data.frame(SectionTitle = "Albertine", Author = "Susan Markovitz", 
    Work = "The salad book", PubYear = 1928L, Item = "pear,cream,cheese,lettuce,vinegar")

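data %>%
  # split each Item string on commas and trim stray spaces around ingredients
  mutate(Itemlist = lapply(strsplit(Item, ","), trimws)) %>%
  # one row per recipe-ingredient pair
  unnest(Itemlist) %>%
  # one column per ingredient: count occurrences, filling 0 where an ingredient is absent
  pivot_wider(names_from = Itemlist, values_from = Itemlist, values_fn = length, values_fill = 0)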
#> # A tibble: 1 × 10
#>   SectionTitle Author     Work  PubYear Item   pear cream cheese lettuce vinegar
#>   <chr>        <chr>      <chr>   <int> <chr> <int> <int>  <int>   <int>   <int>
#> 1 Albertine    Susan Mar… The …    1928 pear…     1     1      1       1       1
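With a single row every ingredient is present, so the output contains no zeros. With several recipes, pivot_wider() leaves NA in the cells for ingredients a recipe does not contain unless values_fill = 0 is given. A sketch under that assumption, using a hypothetical two-row data frame `recipes` (the second recipe is invented for illustration) and trimming the spaces that follow some commas in the question's data, such as ", vinegar":

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the second recipe lacks most of the first recipe's ingredients
recipes <- data.frame(
  SectionTitle = c("Albertine", "Hypothetical"),
  Author = c("Susan Markovitz", "Unknown"),
  Work = c("The salad book", "Example book"),
  PubYear = c(1928L, 1930L),
  Item = c("pear,cream cheese,lettuce, vinegar", "lettuce,olive oil"),
  stringsAsFactors = FALSE
)

dummies <- recipes %>%
  # split on commas and trim, so " vinegar" becomes "vinegar"
  mutate(Itemlist = lapply(strsplit(Item, ","), trimws)) %>%
  unnest(Itemlist) %>%
  # values_fill = 0 writes 0 instead of NA for ingredients a recipe lacks
  pivot_wider(names_from = Itemlist, values_from = Itemlist,
              values_fn = length, values_fill = 0)
```

Here dummies$pear is 1, 0 and dummies$`olive oil` is 0, 1, while all original columns are kept.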