my problem is an extension of this one:
Create a list of all values of a variable grouped by another variable in R
Let's say we have a data frame with restaurants and the meals they offer by type and course:
food <- data.frame(course = c("starter", "starter", "starter", "main", "main", "main", "main", "main"),
food_type = c("salad", "salad", "salad", "fish", "fish", "pasta", "pasta", "pasta"),
restaurant = c("dining_palace", "delicious_kitchen", "food_cube", "dining_palace", "food_cube", "dining_palace", "delicious_kitchen", "food_cube"),
meal1 = c("cesar_salad", "green_salad", "green_salad", "codfish", "trout", "spaghetti", "farfalle", "macaroni"),
meal2 = c("coleslaw", "tomato_salad", NA, "salmon", "codfish", "tagliatelle", "penne", "farfalle"),
meal3 = c(NA, "coleslaw", NA, "tuna", NA, NA, "spaghetti", "ravioli"), stringsAsFactors = FALSE)
food
course food_type restaurant meal1 meal2 meal3
1 starter salad dining_palace cesar_salad coleslaw <NA>
2 starter salad delicious_kitchen green_salad tomato_salad coleslaw
3 starter salad food_cube green_salad <NA> <NA>
4 main fish dining_palace codfish salmon tuna
5 main fish food_cube trout codfish <NA>
6 main pasta dining_palace spaghetti tagliatelle <NA>
7 main pasta delicious_kitchen farfalle penne spaghetti
8 main pasta food_cube macaroni farfalle ravioli
My aim is to gnereate a varaible that contains a list of all meals by course and food type independent of the offering restaurant. Using the code from the link above with c(meal1, meal2, meal3) gives exactly the desired outcome:
library(dplyr)
selection_per_type <- food %>%
group_by(course, food_type) %>%
summarise(meals=paste(sort(unique(c(meal1, meal2, meal3))),collapse=",")) %>%
ungroup()
selection_per_type
course food_type meals
<chr> <chr> <chr>
1 main fish codfish,salmon,trout,tuna
2 main pasta farfalle,macaroni,penne,ravioli,spaghetti,tagliatelle
3 starter salad cesar_salad,coleslaw,green_salad,tomato_salad
However, I'm looking for a solution with a higher number of meal variables, where a manual definition via c() is not practical. As the first n letters of all target variables are identical, I've tried some versions of "pattern" "grepl" and "regexec", but nothing has worked so far. Are there any ideas, how to get this done?
CodePudding user response:
If there are more columns, we may use pivot_longer
to convert to long format and then do a group by summarise
library(dplyr)
library(tidyr)
library(stringr)
food %>%
pivot_longer(cols = starts_with("meal"), values_to ='meal') %>%
group_by(course, food_type) %>%
summarise(means = str_c(unique(sort(na.omit(meal))),
collapse = ","), .groups = 'drop')
-output
# A tibble: 3 × 3
course food_type means
<chr> <chr> <chr>
1 main fish codfish,salmon,trout,tuna
2 main pasta farfalle,macaroni,penne,ravioli,spaghetti,tagliatelle
3 starter salad cesar_salad,coleslaw,green_salad,tomato_salad