Home > OS >  Create a variable listing grouped unique values of several other variables with name pattern
Create a variable listing grouped unique values of several other variables with name pattern

Time:07-22

my problem is an extension of this one:

Create a list of all values of a variable grouped by another variable in R

Let's say we have a data frame with restaurants and the meals they offer by type and course:

    food <- data.frame(course = c("starter", "starter", "starter", "main", "main", "main", "main", "main"),
                      food_type = c("salad", "salad", "salad", "fish", "fish", "pasta", "pasta", "pasta"),
                      restaurant = c("dining_palace", "delicious_kitchen", "food_cube", "dining_palace", "food_cube", "dining_palace", "delicious_kitchen", "food_cube"),
                      meal1 = c("cesar_salad", "green_salad", "green_salad", "codfish", "trout", "spaghetti", "farfalle", "macaroni"),
                      meal2 = c("coleslaw", "tomato_salad", NA, "salmon", "codfish", "tagliatelle", "penne", "farfalle"),
                      meal3 = c(NA, "coleslaw", NA, "tuna", NA, NA, "spaghetti", "ravioli"), stringsAsFactors = FALSE)

food

 course food_type        restaurant       meal1        meal2     meal3
1 starter     salad     dining_palace cesar_salad     coleslaw      <NA>
2 starter     salad delicious_kitchen green_salad tomato_salad  coleslaw
3 starter     salad         food_cube green_salad         <NA>      <NA>
4    main      fish     dining_palace     codfish       salmon      tuna
5    main      fish         food_cube       trout      codfish      <NA>
6    main     pasta     dining_palace   spaghetti  tagliatelle      <NA>
7    main     pasta delicious_kitchen    farfalle        penne spaghetti
8    main     pasta         food_cube    macaroni     farfalle   ravioli

My aim is to gnereate a varaible that contains a list of all meals by course and food type independent of the offering restaurant. Using the code from the link above with c(meal1, meal2, meal3) gives exactly the desired outcome:

library(dplyr)
selection_per_type <- food %>%
                      group_by(course, food_type) %>%
                      summarise(meals=paste(sort(unique(c(meal1, meal2, meal3))),collapse=",")) %>%
                      ungroup()

    selection_per_type
    
   course  food_type meals                                                
  <chr>   <chr>     <chr>                                                
1 main    fish      codfish,salmon,trout,tuna                            
2 main    pasta     farfalle,macaroni,penne,ravioli,spaghetti,tagliatelle
3 starter salad     cesar_salad,coleslaw,green_salad,tomato_salad 

However, I'm looking for a solution with a higher number of meal variables, where a manual definition via c() is not practical. As the first n letters of all target variables are identical, I've tried some versions of "pattern" "grepl" and "regexec", but nothing has worked so far. Are there any ideas, how to get this done?

CodePudding user response:

If there are more columns, we may use pivot_longer to convert to long format and then do a group by summarise

library(dplyr)
library(tidyr)
library(stringr)
food %>% 
  pivot_longer(cols = starts_with("meal"), values_to ='meal') %>% 
  group_by(course, food_type) %>%
  summarise(means = str_c(unique(sort(na.omit(meal))), 
       collapse = ","), .groups = 'drop')

-output

# A tibble: 3 × 3
  course  food_type means                                                
  <chr>   <chr>     <chr>                                                
1 main    fish      codfish,salmon,trout,tuna                            
2 main    pasta     farfalle,macaroni,penne,ravioli,spaghetti,tagliatelle
3 starter salad     cesar_salad,coleslaw,green_salad,tomato_salad    
  •  Tags:  
  • r
  • Related