How to select columns depending on multiple conditions in dplyr

Time:06-04

I'm looking for a solution in dplyr for the task of selecting columns of a dataframe based on multiple conditions. Say, we have this type of df:

X <- c("B", "C", "D", "E")
a1 <- c(1, 0, 3, 0)
a2 <- c(235, 270, 100, 1)
a3 <- c(3, 1000, 900, 2)
df1 <- data.frame(X, a1, a2, a3)

Let's further assume I want to select the column or columns that are

  • (i) numeric, and
  • (ii) made up entirely of values smaller than 5

That is, in this case, the column to select is a1. How can this be done in dplyr? My understanding is that you select columns with select and, if the selection depends on a condition, wrap that condition in where. But how do I combine two such where(...) conditions in one select? This attempt, for example, is not the right way to do it, as it throws an error:

df1 %>%
  select(where(is.numeric) & where(~ all(.) < 5))
Error: `where()` must be used with functions that return `TRUE` or `FALSE`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In all(.) : coercing argument of type 'character' to logical

CodePudding user response:

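Inside where(), we need to supply a function that returns a single TRUE or FALSE for each column. In the question's code, all(.) < 5 applies all() to the column itself and only then compares the result with 5; for the character column X this yields NA, hence the error. The comparison belongs inside all(), i.e. all(. < 5).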

library(dplyr)

# keep the columns in which every value is below 5
select(df1, where(\(x) all(x < 5)))

# or this, which is more explicit: it first checks that the column is numeric
select(df1, where(\(x) is.numeric(x) && all(x < 5)))

  a1
1  1
2  0
3  3
4  0
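
For completeness, the two-where() form from the question can also be made to work once the parenthesis is moved, since tidyselect lets selections be combined with &. The following is only a sketch, assuming a dplyr version in which where() is available inside select() (1.0.0 or later); the names res_and and res_single are just illustrative.

```r
library(dplyr)

# Same data as in the question
df1 <- data.frame(
  X  = c("B", "C", "D", "E"),
  a1 = c(1, 0, 3, 0),
  a2 = c(235, 270, 100, 1),
  a3 = c(3, 1000, 900, 2)
)

# Two separate where() selections, intersected with `&`.
# Note all(.x < 5): the comparison happens inside all().
res_and <- df1 %>%
  select(where(is.numeric) & where(~ all(.x < 5)))

# A single predicate; `&&` short-circuits, so all(.x < 5)
# is never evaluated for non-numeric columns.
res_single <- df1 %>%
  select(where(~ is.numeric(.x) && all(.x < 5)))

names(res_and)
names(res_single)
```

Both calls should keep only a1 for this data. The single-predicate form is probably the safer default, because the comparison is never run on the character column at all.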

Data

df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0), 
    a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)), class = "data.frame", row.names = c(NA, 
-4L))

CodePudding user response:

Another possible solution, based on dplyr::mutate:

library(dplyr)

df1 %>% 
  mutate(across(everything(), ~ if (all(.x < 5) & is.numeric(.x)) .x))

#>   a1
#> 1  1
#> 2  0
#> 3  3
#> 4  0

Or, more concisely:

df1 %>% 
  mutate(across(everything(), ~ if (all(.x < 5)) .x))
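
One caveat about the variants that test all(x < 5) alone: when a character vector is compared with a number, R converts the number to a string and compares text. That happens to exclude X here, but a character column whose strings sort before "5" would pass the test. A small sketch with a made-up vector (x_chr is hypothetical, not part of the question's data):

```r
# Hypothetical character column holding digit strings
x_chr <- c("1", "2", "3")

# The number 5 is coerced to "5" and the comparison is done on strings,
# so every element compares as smaller
unguarded <- all(x_chr < 5)

# Checking is.numeric() first rules the column out before any comparison
guarded <- is.numeric(x_chr) && all(x_chr < 5)

unguarded  # TRUE
guarded    # FALSE
```

So if the data may contain such columns, keeping the is.numeric() check seems worth the extra characters.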