Create multiple columns using dynamic naming (dplyr approach)

Suppose I have a data frame df whose string column contains job vacancy requirements. I also have a vector of programming language names, prog_langs. I am looking for an elegant dplyr way to create, within mutate, one column for each programming language in prog_langs, named via .names = "ProgLang_{prog_langs}", that tests whether each string in df contains that programming language (TRUE if it does, FALSE otherwise).

# custom helper: TRUE where txt contains cond (grepl is vectorised over txt)
is_contains = function(txt, cond) grepl(cond, txt)

# Vector of programming languages
prog_langs = c("python", "java", "sql", "html")

# Data frame of job vacancy requirement strings
df = data.frame("string" = c(
  "exposure to scripting or programming languages (e.g python, c , or powershell).",
  "scripting skills (e.g. java, javascript, beanshell, etc.)",
  "basic understanding of sql",
  "html and css knowledge is a must."
))


# attempted code (fails: across() selects existing columns, and the prog_langs values are not columns of df)
df %>% 
  mutate(across(.cols = vars(prog_langs), .fns = function(x) is_contains(txt = string, cond = x), .names = 'ProgLang_{.col}'))

Desired output:

A new df with N additional columns (where N is the length of prog_langs, i.e. the number of programming languages), each containing TRUE or FALSE.

CodePudding user response：

Using purrr::map, purrr::transpose and tidyr::unnest_wider you could do:

library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tidyr)

prog_langs <- c("python", "java", "sql", "html")
names(prog_langs) <- prog_langs

df %>%
  mutate(ProgLang = transpose(map(prog_langs, ~ grepl(.x, string)))) %>% 
  unnest_wider(ProgLang)
#> # A tibble: 4 × 5
#>   string                                                python java  sql   html 
#>   <chr>                                                 <lgl>  <lgl> <lgl> <lgl>
#> 1 exposure to scripting or programming languages (e.g … TRUE   FALSE FALSE FALSE
#> 2 scripting skills (e.g. java, javascript, beanshell, … FALSE  TRUE  FALSE FALSE
#> 3 basic understanding of sql                            FALSE  FALSE TRUE  FALSE
#> 4 html and css knowledge is a must.                     FALSE  FALSE FALSE TRUE
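
The output above names the columns after the bare language names. If the ProgLang_ prefix from the question is wanted, one option is to name the patterns before mapping and let mutate unpack an unnamed tibble into columns. This is only a sketch, assuming dplyr >= 1.0 (which unpacks unnamed data frame results inside mutate); named_langs is a name introduced here for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

# Data from the question
df <- data.frame(string = c(
  "exposure to scripting or programming languages (e.g python, c , or powershell).",
  "scripting skills (e.g. java, javascript, beanshell, etc.)",
  "basic understanding of sql",
  "html and css knowledge is a must."
))
prog_langs <- c("python", "java", "sql", "html")

# Assumption: name each pattern with the column name it should produce
named_langs <- set_names(prog_langs, paste0("ProgLang_", prog_langs))

# map() returns a named list of logical vectors, one per language;
# wrapping it in a tibble and passing it unnamed lets mutate() add
# one column per list element.
result <- df %>%
  mutate(tibble::as_tibble(map(named_langs, ~ grepl(.x, string))))
```

Unlike the crossing-based approach, this keeps the rows of df in their original order.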

CodePudding user response：

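This solution uses tidyr::crossing to obtain the Cartesian product of string and prog_langs, then looks for matches using grepl, and finally widens the data frame using tidyr::pivot_wider. Note that crossing also sorts its inputs, which is why the rows and the new columns come out in alphabetical order.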

library(purrr)
library(tidyr)
library(dplyr)
df |>
    crossing(ProgLang = prog_langs) |>
    mutate(contains = map2_lgl(ProgLang, string,  ~grepl(.x, .y))) |>
    pivot_wider(names_from = ProgLang,
                values_from = contains,
                names_prefix = "ProgLang_")


##>   # A tibble: 4 × 5
##>   string                ProgLang_html ProgLang_java ProgLang_python ProgLang_sql
##>   <chr>                 <lgl>         <lgl>         <lgl>           <lgl>       
##> 1 basic understanding … FALSE         FALSE         FALSE           TRUE        
##> 2 exposure to scriptin… FALSE         FALSE         TRUE            FALSE       
##> 3 html and css knowled… TRUE          FALSE         FALSE           FALSE       
##> 4 scripting skills (e.… FALSE         TRUE          FALSE           FALSE
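
The element-wise map2_lgl call is needed because grepl is vectorised over the text but takes a single pattern per call, whereas after crossing each row carries its own pattern. The pairing can be illustrated in base R with mapply. This is a small sketch only; the patterns and texts vectors are made up for the illustration and are not from the original post.

```r
# Illustrative inputs (assumptions, not from the original post)
patterns <- c("python", "java", "sql")
texts    <- c("python and r", "kotlin only", "basic sql")

# mapply() calls grepl(patterns[i], texts[i]) for each i, so every text
# is tested against its own pattern; unname() drops the names that
# mapply() derives from the first character argument.
paired <- unname(mapply(grepl, patterns, texts))

# Equivalent pairing with purrr, as used in the answer:
# purrr::map2_lgl(patterns, texts, ~ grepl(.x, .y))
```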

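Edit, as requested in the comments: the same approach with a second vector of patterns (certificate), where each column name is prefixed with its type.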

# Vectors of programming languages and certificates
prog_langs <- c("python", "java", "sql", "html")
certificate <- c("oscp", "cissp", "ceh")

data.frame(Type = "ProgLang", pattern = prog_langs) |>
    rbind(data.frame(Type = "Certificate", pattern = certificate)) |>
    crossing(df) |>
    mutate(Contains = map2_lgl(pattern, string, grepl)) |>
    mutate(Thing = paste(Type, pattern, sep = "_"),
           .keep = "unused") |>
    pivot_wider(names_from = Thing, values_from = Contains)
    
##>    # A tibble: 4 × 8
##>   string         Certificate_ceh Certificate_cis… Certificate_oscp ProgLang_html
##>   <chr>          <lgl>           <lgl>            <lgl>            <lgl>        
##> 1 basic underst… FALSE           FALSE            FALSE            FALSE        
##> 2 exposure to s… FALSE           FALSE            FALSE            FALSE        
##> 3 html and css … FALSE           FALSE            FALSE            TRUE         
##> 4 scripting ski… FALSE           FALSE            FALSE            FALSE        
##> # … with 3 more variables: ProgLang_java <lgl>, ProgLang_python <lgl>,
##> #   ProgLang_sql <lgl>
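
If the original row order and one block of columns per pattern type are preferred, the same flags can also be added without reshaping. The following is a sketch under stated assumptions rather than part of the answer above: add_flags is a hypothetical helper, and its arguments (patterns, prefix, column) are names chosen here for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

# Data from the question
df <- data.frame(string = c(
  "exposure to scripting or programming languages (e.g python, c , or powershell).",
  "scripting skills (e.g. java, javascript, beanshell, etc.)",
  "basic understanding of sql",
  "html and css knowledge is a must."
))
prog_langs  <- c("python", "java", "sql", "html")
certificate <- c("oscp", "cissp", "ceh")

# Hypothetical helper: adds one logical column per pattern,
# named "<prefix>_<pattern>", testing the given text column.
add_flags <- function(data, patterns, prefix, column = "string") {
  named <- set_names(patterns, paste(prefix, patterns, sep = "_"))
  flags <- map(named, ~ grepl(.x, data[[column]]))
  bind_cols(data, flags)  # a named list of equal-length vectors binds as columns
}

result <- df %>%
  add_flags(prog_langs, "ProgLang") %>%
  add_flags(certificate, "Certificate")
```

Each call appends its columns in the order of the pattern vector, so the result has string first, then the four ProgLang_ columns, then the three Certificate_ columns.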