I have an issue similar to this thread:
Search across multiple columns with a regular expression and extract the match
Let's take the iris dataset as an example. I would like to filter rows based on the values in several columns: say, >= 4 in the columns whose names end with ".Length". My real data are much more complex than this reprex, which is why I want to select the columns with a regular expression rather than pick them one by one by index.
Tried multiple ways, including the following:
filtered <- iris %>% dplyr::filter(across(matches('.Length')>=4))
to no avail. Please help.
CodePudding user response:
Using dplyr::if_all
you could do:
library(dplyr)

# Keep rows where every column whose name ends in ".Length" is >= 4
iris1 <- iris %>%
  filter(if_all(matches("\\.Length$"), ~ .x >= 4))
str(iris1)
#> 'data.frame': 89 obs. of 5 variables:
#> $ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 6.6 5.9 6 ...
#> $ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.9 3 2.2 ...
#> $ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 4.6 4.2 4 ...
#> $ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1.3 1.5 1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
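Because matches() takes a regular expression, a bare . in the pattern matches any character, while ends_with() compares a literal suffix. A minimal sketch, assuming dplyr >= 1.0.4 is installed (the names by_suffix and by_regex are only illustrative), showing that both selections give the same result on iris:

```r
library(dplyr)

# ends_with() matches the literal suffix, so no regex escaping is needed
by_suffix <- iris %>%
  filter(if_all(ends_with(".Length"), ~ .x >= 4))

# matches() takes a regular expression: escape the dot and anchor the end
by_regex <- iris %>%
  filter(if_all(matches("\\.Length$"), ~ .x >= 4))

identical(by_suffix, by_regex)  # TRUE
nrow(by_suffix)                 # 89
```

On real data with messier column names, the anchored regex is probably the safer of the two when the pattern is more than a plain suffix.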
CodePudding user response:
Although dplyr::if_all and if_any were introduced for exactly this situation, to be used in conjunction with filter (https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/), across also works here.
Here we could use an anonymous function with across:
across(.cols = everything(), .fns = NULL, ..., .names = NULL)
where the .fns argument can be a purrr-style lambda, e.g. ~ . >= 4:
library(dplyr)

# Assign the result so that str(iris1) refers to the filtered data
iris1 <- iris %>%
  filter(across(ends_with('.Length'), ~ . >= 4))
str(iris1)
#> 'data.frame': 89 obs. of 5 variables:
#> $ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 6.6 5.9 6 ...
#> $ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.9 3 2.2 ...
#> $ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 4.6 4.2 4 ...
#> $ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1.3 1.5 1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
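To make the difference between if_all and if_any concrete, here is a small sketch on the same data, assuming dplyr >= 1.0.4 (the names all_cols and any_col are only illustrative). As far as I know, newer dplyr releases (1.0.8 onwards, I believe) also warn that across() inside filter() is deprecated and point to these two helpers, so they are likely the more future-proof choice:

```r
library(dplyr)

# if_all(): keep rows where every selected column passes the test
all_cols <- iris %>%
  filter(if_all(ends_with(".Length"), ~ .x >= 4))

# if_any(): keep rows where at least one selected column passes the test
any_col <- iris %>%
  filter(if_any(ends_with(".Length"), ~ .x >= 4))

nrow(all_cols)  # 89, the same rows as in the str() output above
nrow(any_col)   # 150, because every Sepal.Length in iris is at least 4.3
```

With if_any every row survives on iris, since Sepal.Length alone already satisfies the condition, which is why if_all is the one that matches the filtering described in the question.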