I'm looking for a solution in dplyr for the task of selecting columns of a dataframe based on multiple conditions. Say we have this type of df:
X <- c("B", "C", "D", "E")
a1 <- c(1, 0, 3, 0)
a2 <- c(235, 270, 100, 1)
a3 <- c(3, 1000, 900, 2)
df1 <- data.frame(X, a1, a2, a3)
Let's further assume I want to select the column or columns that
- (i) are numeric
- (ii) contain only values smaller than 5
That is, in this case, what we want to select is column a1. How can this be done in dplyr? My understanding is that in order to select a column in dplyr you use select and, if that selection is governed by conditions, also where. But how to combine two such select(where...) statements? This, for example, is not the right way to do it as it throws an error:
df1 %>%
  select(where(is.numeric) & where(~ all(.) < 5))
Error: `where()` must be used with functions that return `TRUE` or `FALSE`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In all(.) : coercing argument of type 'character' to logical
CodePudding user response:
Inside where(), we need to supply a function that returns a single TRUE or FALSE for each column.
library(dplyr)
select(df1, where(\(x) all(x < 5)))
# or this, which might be more semantically correct
# (&& short-circuits, so non-numeric columns are never compared with 5)
select(df1, where(\(x) is.numeric(x) && all(x < 5)))
#>   a1
#> 1  1
#> 2  0
#> 3  3
#> 4  0
Data
df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0),
a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)), class = "data.frame", row.names = c(NA,
-4L))
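To connect this back to the attempt in the question: the error there seems to come from the placement of the parenthesis. all(.) < 5 first calls all() on the whole column and only then compares the result with 5, which gives NA for the character column. A minimal sketch, assuming the df1 from the question, that moves the comparison inside all() and keeps the two where() calls joined by &:

```r
library(dplyr)

df1 <- data.frame(
  X  = c("B", "C", "D", "E"),
  a1 = c(1, 0, 3, 0),
  a2 = c(235, 270, 100, 1),
  a3 = c(3, 1000, 900, 2)
)

# Comparison moved inside all(), so each predicate returns TRUE or FALSE.
# The second predicate is still evaluated on the character column X.
res_and <- df1 %>%
  select(where(is.numeric) & where(~ all(.x < 5)))

# Same selection with a single predicate; && short-circuits,
# so non-numeric columns are never compared with 5.
res_one <- df1 %>%
  select(where(~ is.numeric(.x) && all(.x < 5)))

names(res_and)
names(res_one)
```

Both calls should return only a1 for this data. Note that a column containing NA can make all() return NA, in which case where() would complain again; adding na.rm = TRUE to all() is one possible way around that, depending on how missing values should be treated.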
CodePudding user response:
Another possible solution, based on dplyr::mutate:
library(dplyr)
df1 %>%
  mutate(across(everything(), ~ if (is.numeric(.x) && all(.x < 5)) .x))
#> a1
#> 1 1
#> 2 0
#> 3 3
#> 4 0
Or, more concisely:
df1 %>%
  mutate(across(everything(), ~ if (all(.x < 5)) .x))
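One caveat about dropping the is.numeric() check: the shorter form only excludes X here because the strings "B" to "E" do not sort before "5". A small sketch with a hypothetical data frame df2 (not from the question) whose character column holds digit strings shows the difference. It uses select(where()) rather than mutate(across()) purely to keep the example short:

```r
library(dplyr)

# Hypothetical data: `id` is a character column whose strings sort before "5"
df2 <- data.frame(
  id = c("1", "2", "3"),
  a1 = c(1, 0, 3),
  a2 = c(235, 270, 100)
)

# Without the type check, 5 is coerced to "5" and compared as a string,
# so the character column passes the test as well.
short_form <- df2 %>%
  select(where(~ all(.x < 5)))

# With the type check, only the numeric column remains.
guarded <- df2 %>%
  select(where(~ is.numeric(.x) && all(.x < 5)))

names(short_form)
names(guarded)
```

So the shorter version is fine for the example data, but keeping the is.numeric() test is probably the safer choice when character columns may contain number-like strings.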