Use mutate case_when() in a specific range of columns in dplyr-CodePudding

I have a large data frame that looks like the df1. I want to search in the range of columns between col2:col4 (col2,col3,col4)if any character contains the string S.

library(tidyverse)

df <- tibble(position=c(100,200,300),
             correction=c("62M89S", 
                          "8M1D55M88S",
                          "1S25M1S36M89S"))

df1 <- df %>% 
  separate(correction, into = str_c("col", 1:5), 
           sep = "(?<=\\D)(?=\\d)", fill = "left", remove = FALSE)

df1
#> # A tibble: 3 × 7
#>   position correction    col1  col2  col3  col4  col5 
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62M   89S  
#> 2      200 8M1D55M88S    <NA>  8M    1D    55M   88S  
#> 3      300 1S25M1S36M89S 1S    25M   1S    36M   89S

^{Created on 2022-03-12 by the reprex package (v2.0.1)}

I want my data to look like this

df1
#>   position correction    col1  col2  col3  col4  col5     inner_S
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>      
#> 1      100 62M89S        <NA>  <NA>  <NA>  62M   89S         NO 
#> 2      200 8M1D55M88S    <NA>  8M    1D    55M   88S         NO
#> 3      300 1S25M1S36M89S 1S    25M   1S    36M   89S         YES

since there is the character 1S in col3.

I tried this, but I am sure I am violating some properties

df1 %>% 
  mutate_at(vars(col2:col4),
                             ~inner_S=case_when(grepl("S",.) ~ "Yes", 
                              TRUE ~ "No"
                            ))

CodePudding user response：

dplyr's c_across is very handy for operations like this:

df1 %>% 
  rowwise() %>% 
  mutate(inner_S = ifelse(any(grepl('S', c_across(col1:col4))), 'YES', 'NO'))

  position correction    col1  col2  col3  col4  col5  inner_S
     <dbl> <chr>         <chr> <chr> <chr> <chr> <chr> <chr>  
1      100 62M89S        NA    NA    NA    62M   89S   NO     
2      200 8M1D55M88S    NA    8M    1D    55M   88S   NO     
3      300 1S25M1S36M89S 1S    25M   1S    36M   89S   YES

CodePudding user response：

Please find another possible solution using data.table

Reprex

Code

library(data.table)

setDT(df1)[, inner_S := apply(.SD, 1, function(x) fifelse(any(grepl("S", x)), "YES", "NO")), .SDcols = paste0("col", 2:4)][]

Output

#>    position    correction col1 col2 col3 col4 col5 inner_S
#> 1:      100        62M89S <NA> <NA> <NA>  62M  89S      NO
#> 2:      200    8M1D55M88S <NA>   8M   1D  55M  88S      NO
#> 3:      300 1S25M1S36M89S   1S  25M   1S  36M  89S     YES

^{Created on 2022-03-12 by the reprex package (v2.0.1)}

CodePudding user response：

Answer by jdobres is perfect. Here is a more complicated one using across:

library(tidyverse)

df1 %>% 
  mutate(across(col1:col4, ~ifelse(
    str_detect(., 'S'), TRUE, FALSE), .names = 'new_{col}')) %>% 
  unite(inner_S, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(inner_S = ifelse(str_detect(inner_S, 'TRUE'), "YES", "NO"))

  position correction    col1  col2  col3  col4  col5  inner_S
     <dbl> <chr>         <chr> <chr> <chr> <chr> <chr> <chr>  
1      100 62M89S        NA    NA    NA    62M   89S   NO     
2      200 8M1D55M88S    NA    8M    1D    55M   88S   NO     
3      300 1S25M1S36M89S 1S    25M   1S    36M   89S   YES

CodePudding user response：

An alternative to using rowwise approach which can be quite costly when working with larger data sets is to sum the TRUE values.

df1 %>%
      mutate(inner_S = ifelse(rowSums(across(col1:col4, str_detect, "S"), na.rm = T) > 1, "YES", "NO"))