Multiple patterns matching with a column by dplyr

Time:12-31

I'd like to create a column derived from an existing character column. I have a set of patterns: values that contain any of them should be accepted, and all other values should not.

Here is what I tried:

library(dplyr)

set.seed(1)

index <- sample(1:nrow(iris),10)

iris2 <- iris[index,]

required_cols <- c('ersicol','inic')

iris2 %>% 
mutate(logical_column = case_when(any(sapply(required_cols,grepl,x = Species)) ~ 'WORKED',
                                  TRUE ~ 'NOT_WORKED'))

In this case, every row of logical_column is marked as 'WORKED', but only observations whose Species contains the pattern 'ersicol' or 'inic' should be marked as 'WORKED'.

The desired output should be like:

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    logical_column
          <dbl>       <dbl>        <dbl>       <dbl> <fct>      <chr>         
 1          5.8         2.7          4.1         1   versicolor WORKED        
 2          6.4         2.8          5.6         2.1 virginica  WORKED        
 3          4.4         3.2          1.3         0.2 setosa     NOT_WORKED        
 4          4.3         3            1.1         0.1 setosa     NOT_WORKED        
 5          7           3.2          4.7         1.4 versicolor WORKED        
 6          5.4         3            4.5         1.5 versicolor WORKED        
 7          5.4         3.4          1.7         0.2 setosa     NOT_WORKED        
 8          7.6         3            6.6         2.1 virginica  WORKED        
 9          6.1         2.8          4.7         1.2 versicolor WORKED        
10          4.6         3.4          1.4         0.3 setosa     NOT_WORKED    

Thanks in advance.

CodePudding user response:

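The any is the key here. Inside mutate it is evaluated over the whole column at once, so it returns a single TRUE that is recycled to every row. To keep the OP's code, add rowwise so that the expression is evaluated one row at a time: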

library(dplyr)

iris2 %>%
    rowwise() %>%
    mutate(logical_column = case_when(
        any(sapply(required_cols, grepl, x = Species)) ~ 'WORKED',
        TRUE ~ 'NOT_WORKED')) %>%
    ungroup()

-output

# A tibble: 10 × 6
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    logical_column
          <dbl>       <dbl>        <dbl>       <dbl> <fct>      <chr>         
 1          5.8         2.7          4.1         1   versicolor WORKED        
 2          6.4         2.8          5.6         2.1 virginica  WORKED        
 3          4.4         3.2          1.3         0.2 setosa     NOT_WORKED    
 4          4.3         3            1.1         0.1 setosa     NOT_WORKED    
 5          7           3.2          4.7         1.4 versicolor WORKED        
 6          5.4         3            4.5         1.5 versicolor WORKED        
 7          5.4         3.4          1.7         0.2 setosa     NOT_WORKED    
 8          7.6         3            6.6         2.1 virginica  WORKED        
 9          6.1         2.8          4.7         1.2 versicolor WORKED        
10          4.6         3.4          1.4         0.3 setosa     NOT_WORKED  
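
To see why the original call marked every row, here is a minimal sketch on a toy vector (the names species, hits and row_hit are made up for illustration and are not part of the OP's data). It assumes only base R and shows that sapply returns a whole matrix of matches, which any then collapses to one value:

```r
# Hypothetical toy vector standing in for the Species column
species <- c("versicolor", "setosa", "virginica")
required_cols <- c("ersicol", "inic")

# sapply() returns a logical matrix: one row per element, one column per pattern
hits <- sapply(required_cols, grepl, x = species)

# any() collapses the whole matrix to a single value, which mutate() would
# recycle to every row
any(hits)

# Reducing within each row gives one result per element instead
row_hit <- rowSums(hits) > 0
row_hit
```

With rowwise, each call sees a single Species value, so any only ever summarises the matches for that one row.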

It may be more efficient to use vectorized functions: paste (str_c) the 'required_cols' into a single pattern (collapse = "|"), use str_detect to check whether any of the substrings is present, convert the logical result to a numeric index by adding 1 (+ 1), and use that index to pick from a vector of labels

library(stringr)
iris2 %>% 
   mutate(logical_column = c("NOT_WORKED", "WORKED")[
     1 + str_detect(Species, str_c(required_cols, collapse = "|"))])

-output

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species logical_column
68           5.8         2.7          4.1         1.0 versicolor         WORKED
129          6.4         2.8          5.6         2.1  virginica         WORKED
43           4.4         3.2          1.3         0.2     setosa     NOT_WORKED
14           4.3         3.0          1.1         0.1     setosa     NOT_WORKED
51           7.0         3.2          4.7         1.4 versicolor         WORKED
85           5.4         3.0          4.5         1.5 versicolor         WORKED
21           5.4         3.4          1.7         0.2     setosa     NOT_WORKED
106          7.6         3.0          6.6         2.1  virginica         WORKED
74           6.1         2.8          4.7         1.2 versicolor         WORKED
7            4.6         3.4          1.4         0.3     setosa     NOT_WORKED
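
The indexing trick may be easier to follow in isolation. Below is a rough base R sketch of the same idea, assuming a toy character vector (species, pattern, matched and labels are illustrative names only) and using paste and grepl in place of str_c and str_detect:

```r
required_cols <- c("ersicol", "inic")
species <- c("versicolor", "setosa", "virginica")  # hypothetical toy input

# Collapse the patterns into one regular expression: "ersicol|inic"
pattern <- paste(required_cols, collapse = "|")

# One logical per element: does it contain any of the patterns?
matched <- grepl(pattern, species)

# FALSE + 1 is 1 and TRUE + 1 is 2, so the logical picks a label by position
labels <- c("NOT_WORKED", "WORKED")[1 + matched]
labels
```

Because the patterns are joined into a regular expression, any pattern containing regex metacharacters would need escaping first; the two patterns used here contain none.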