I have a dataset called cookbooks4 that currently looks like this:
SectionTitle  Author           Work            PubYear  Item
Albertine     Susan Markovitz  The salad book  1928     pear,cream cheese,lettuce, vinegar
The dataset is long, so I have included only the first row, but as you can imagine, each recipe has its own ingredients. I need to create dummy variables for Item (the ingredients): select all unique ingredients and give each one its own column. There are roughly 630 different ingredients, so I should end up with about 630 dummy columns. The columns after cream may be ingredients that are not listed in the Item column of that specific recipe, in which case the dummy would be 0. Any help would be greatly appreciated. For the first row, the result should look something like this:
SectionTitle  Author    Work         PubYear  Item           pear  cream  ....
Albertine     Susan...  The salad..  1928     pear,cream...  1     1
I tried the following, but I get an error message. I would also need to keep all the other columns.
library(dplyr)
library(stringr)
final <- strsplit(cookbooks4$Item, split = ",")
Item <- unique(str_trim(unlist(t)))
final2 <- as.data.frame(Reduce(cbind, lapply(Item, function(i) sapply(t, function(j) (any(grepl(i, j), na.rm = TRUE))))))
names(final2) <- item
CodePudding user response:
library(dplyr)
library(tidyr)
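data <- data.frame(SectionTitle = "Albertine", Author = "Susan Markovitz",
                   Work = "The salad book", PubYear = 1928L,
                   Item = "pear,cream,cheese,lettuce,vinegar")
# Split Item into one row per ingredient, then spread the ingredients into count columns
# (1 when present); values_fill = 0 gives 0 instead of NA for ingredients a recipe lacks.
data %>% mutate(Itemlist = strsplit(Item, ",")) %>% unnest(Itemlist) %>%
  pivot_wider(names_from = Itemlist, values_from = Itemlist,
              values_fn = length, values_fill = 0)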
#> # A tibble: 1 × 10
#> SectionTitle Author Work PubYear Item pear cream cheese lettuce vinegar
#> <chr> <chr> <chr> <int> <chr> <int> <int> <int> <int> <int>
#> 1 Albertine Susan Mar… The … 1928 pear… 1 1 1 1 1
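With more than one recipe, two details from the question start to matter: ingredients a recipe does not contain should show 0, and entries such as " vinegar" carry a leading space. Below is a minimal sketch of one way to handle both, not the only approach. The second recipe row and the names cookbooks_demo, Ingredient, present and dummies are invented for illustration. It also assumes the other columns together identify each recipe, since fully identical rows would collapse into one. As a side note, the attempt in the question most likely fails because t is never assigned (so R picks up the base function t()) and item differs in case from Item.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-recipe sample; the second row is made up for illustration.
cookbooks_demo <- data.frame(
  SectionTitle = c("Albertine", "Example section"),
  Author = c("Susan Markovitz", "Example author"),
  Work = c("The salad book", "Example work"),
  PubYear = c(1928L, 1930L),
  Item = c("pear,cream cheese,lettuce, vinegar", "lettuce, tomato"),
  stringsAsFactors = FALSE
)

dummies <- cookbooks_demo %>%
  mutate(Ingredient = strsplit(Item, ",")) %>%   # list column: one character vector per recipe
  unnest(Ingredient) %>%                         # one row per recipe/ingredient pair
  mutate(Ingredient = trimws(Ingredient)) %>%    # " vinegar" becomes "vinegar"
  distinct() %>%                                 # an ingredient repeated within a recipe counts once
  mutate(present = 1L) %>%
  pivot_wider(names_from = Ingredient, values_from = present,
              values_fill = 0L)                  # absent ingredients get 0 instead of NA

dummies
```

All original columns are kept because pivot_wider() treats every column not named in names_from or values_from as an identifier. Column names containing spaces, such as cream cheese, need backticks when referenced: dummies$`cream cheese`.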