I'd like to create a column that is derived from a character-typed column. I have a set of patterns that should be accepted; anything that matches none of them should not be accepted.
Here is what I tried:
library(dplyr)
set.seed(1)
index <- sample(1:nrow(iris),10)
iris2 <- iris[index,]
required_cols <- c('ersicol','inic')
iris2 %>%
mutate(logical_column = case_when(any(sapply(required_cols,grepl,x = Species)) ~ 'WORKED',
TRUE ~ 'NOT_WORKED'))
In this case, every row of logical_column is marked as 'WORKED', but only observations whose Species contains the pattern 'ersicol' or 'inic' should be marked as 'WORKED'.
The desired output should be like:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species logical_column
<dbl> <dbl> <dbl> <dbl> <fct> <chr>
1 5.8 2.7 4.1 1 versicolor WORKED
2 6.4 2.8 5.6 2.1 virginica WORKED
3 4.4 3.2 1.3 0.2 setosa NOT_WORKED
4 4.3 3 1.1 0.1 setosa NOT_WORKED
5 7 3.2 4.7 1.4 versicolor WORKED
6 5.4 3 4.5 1.5 versicolor WORKED
7 5.4 3.4 1.7 0.2 setosa NOT_WORKED
8 7.6 3 6.6 2.1 virginica WORKED
9 6.1 2.8 4.7 1.2 versicolor WORKED
10 4.6 3.4 1.4 0.3 setosa NOT_WORKED
Thanks in advance.
CodePudding user response:
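The any is the key here. Without grouping, it is evaluated once over the full column, so a single match anywhere makes the condition TRUE for every row. To keep the OP's code, use rowwise so that the condition is evaluated one row at a time: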
library(dplyr)
iris2 %>%
rowwise %>%
mutate(logical_column = case_when(any(sapply(required_cols,
grepl,x = Species))
~ 'WORKED',
TRUE ~ 'NOT_WORKED')) %>%
ungroup
-output
# A tibble: 10 × 6
Sepal.Length Sepal.Width Petal.Length Petal.Width Species logical_column
<dbl> <dbl> <dbl> <dbl> <fct> <chr>
1 5.8 2.7 4.1 1 versicolor WORKED
2 6.4 2.8 5.6 2.1 virginica WORKED
3 4.4 3.2 1.3 0.2 setosa NOT_WORKED
4 4.3 3 1.1 0.1 setosa NOT_WORKED
5 7 3.2 4.7 1.4 versicolor WORKED
6 5.4 3 4.5 1.5 versicolor WORKED
7 5.4 3.4 1.7 0.2 setosa NOT_WORKED
8 7.6 3 6.6 2.1 virginica WORKED
9 6.1 2.8 4.7 1.2 versicolor WORKED
10 4.6 3.4 1.4 0.3 setosa NOT_WORKED
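To see why the ungrouped version marks every row, it may help to look at what the condition evaluates to on its own. Below is a minimal sketch in base R; the vector `species` is a hypothetical stand-in for the Species column, not the OP's data.

```r
# Hypothetical stand-in for the Species column (assumption, not the OP's data)
species <- c("versicolor", "setosa", "virginica")
required_cols <- c("ersicol", "inic")

# sapply() returns a logical matrix: one row per element, one column per pattern
m <- sapply(required_cols, grepl, x = species)

# any() over the whole matrix collapses to a single value
whole <- any(m)
print(whole)    # TRUE

# Collapsing row by row gives one value per observation instead
per_row <- apply(m, 1, any)
print(per_row)  # TRUE FALSE TRUE
```

The single TRUE from `any(m)` is recycled across all rows by case_when, which is what rowwise avoids.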
It may be more efficient to use vectorized options: paste (str_c) the 'required_cols' into a single string (collapse = "|"), use str_detect to check whether the substring is present, convert the logical result to a numeric index (1 +), and use that index to pick from a replacement vector:
library(stringr)
iris2 %>%
mutate(logical_column = c("NOT_WORKED", "WORKED")[
1 + str_detect(Species, str_c(required_cols, collapse = "|"))])
-output
Sepal.Length Sepal.Width Petal.Length Petal.Width Species logical_column
68 5.8 2.7 4.1 1.0 versicolor WORKED
129 6.4 2.8 5.6 2.1 virginica WORKED
43 4.4 3.2 1.3 0.2 setosa NOT_WORKED
14 4.3 3.0 1.1 0.1 setosa NOT_WORKED
51 7.0 3.2 4.7 1.4 versicolor WORKED
85 5.4 3.0 4.5 1.5 versicolor WORKED
21 5.4 3.4 1.7 0.2 setosa NOT_WORKED
106 7.6 3.0 6.6 2.1 virginica WORKED
74 6.1 2.8 4.7 1.2 versicolor WORKED
7 4.6 3.4 1.4 0.3 setosa NOT_WORKED
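The same idea can be sketched in base R without dplyr or stringr, assuming the patterns contain no regular-expression metacharacters that would need escaping. The `species` factor here is a small hypothetical example, not the OP's data.

```r
# Hypothetical stand-in for the Species column (assumption, not the OP's data)
species <- factor(c("versicolor", "setosa", "virginica"))
required_cols <- c("ersicol", "inic")

# Combine the patterns into one alternation: "ersicol|inic"
pattern <- paste(required_cols, collapse = "|")

# grepl() is vectorized over its input, so no rowwise step is needed
result <- ifelse(grepl(pattern, species), "WORKED", "NOT_WORKED")
print(result)  # "WORKED" "NOT_WORKED" "WORKED"
```

With a data frame, this could be assigned directly, for example `iris2$logical_column <- ifelse(grepl(pattern, iris2$Species), "WORKED", "NOT_WORKED")`.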