I am looking for a more versatile version of the solution presented in: return TRUE if *any* of columns chosen using tidyselect contains() is TRUE
We start with the data.frame:
dat <- data.frame(var1 = c(TRUE, FALSE, FALSE),
                  var2 = c(FALSE, TRUE, FALSE),
                  var3 = c(FALSE, FALSE, TRUE))
Now I want to test whether any column in a particular combination of columns, as provided by the user, contains a TRUE. Currently I am imagining the user would provide a vector of column names:
is_it_true <- function(df, columns) {}
Such that is_it_true(dat, columns = c("var1", "var2")) should add a new column to dat that is TRUE for each row where var1 or var2 contains a TRUE:
   var1  var2  var3 anyTRUE
1  TRUE FALSE FALSE    TRUE
2 FALSE  TRUE FALSE    TRUE
3 FALSE FALSE  TRUE   FALSE
The ~funky~ solution I currently have is:
library(dplyr)
is_it_true <- function(df, columns) {
  # Use the function arguments (df, columns) rather than globals
  df$anyTRUE <- df %>%
    select(all_of(columns)) %>%
    mutate(anyTRUE = if_any(.cols = contains('var'))) %>%
    pull(anyTRUE)
  df
}
Such that is_it_true(dat, c("var1", "var3")) would return:
   var1  var2  var3 anyTRUE
1  TRUE FALSE FALSE    TRUE
2 FALSE  TRUE FALSE   FALSE
3 FALSE FALSE  TRUE    TRUE
and is_it_true(dat, c("var1", "var2", "var3")) would return:
   var1  var2  var3 anyTRUE
1  TRUE FALSE FALSE    TRUE
2 FALSE  TRUE FALSE    TRUE
3 FALSE FALSE  TRUE    TRUE
Finally, I am hoping the solution could be made robust to NA entries, such that if one of the columns being tested is NA in a given row but another column being tested is TRUE, the solution returns TRUE and not NA.
CodePudding user response:
Using rowSums and across you could do:
dat <- data.frame(var1 = c(TRUE, FALSE, FALSE),
                  var2 = c(FALSE, TRUE, FALSE),
                  var3 = c(FALSE, FALSE, TRUE))
library(dplyr)
is_it_true <- function(df, columns, na.rm = FALSE) {
  df |>
    mutate(anyTRUE = rowSums(across(all_of(columns)), na.rm = na.rm) >= 1)
}
is_it_true(dat, c("var1", "var2"))
#>    var1  var2  var3 anyTRUE
#> 1  TRUE FALSE FALSE    TRUE
#> 2 FALSE  TRUE FALSE    TRUE
#> 3 FALSE FALSE  TRUE   FALSE
is_it_true(dat, c("var1", "var2", "var3"))
#>    var1  var2  var3 anyTRUE
#> 1  TRUE FALSE FALSE    TRUE
#> 2 FALSE  TRUE FALSE    TRUE
#> 3 FALSE FALSE  TRUE    TRUE
Using the na.rm argument you can account for NAs like so:
dat1 <- data.frame(var1 = c(TRUE, NA, FALSE),
                   var2 = c(FALSE, TRUE, FALSE),
                   var3 = c(FALSE, FALSE, TRUE))
is_it_true(dat1, c("var1", "var2"))
#>    var1  var2  var3 anyTRUE
#> 1  TRUE FALSE FALSE    TRUE
#> 2    NA  TRUE FALSE      NA
#> 3 FALSE FALSE  TRUE   FALSE
is_it_true(dat1, c("var1", "var2"), na.rm = TRUE)
#>    var1  var2  var3 anyTRUE
#> 1  TRUE FALSE FALSE    TRUE
#> 2    NA  TRUE FALSE    TRUE
#> 3 FALSE FALSE  TRUE   FALSE
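To see why row 2 comes out as NA by default, it may help to look at rowSums on its own. The following is only a base-R sketch of the mechanism, with dat1 redefined so the block is self-contained; the names cols, any_default and any_narm are illustrative and not part of the answer above:

```r
dat1 <- data.frame(var1 = c(TRUE, NA, FALSE),
                   var2 = c(FALSE, TRUE, FALSE),
                   var3 = c(FALSE, FALSE, TRUE))
cols <- c("var1", "var2")  # illustrative selection

# Logicals are summed as 0/1; with the default na.rm = FALSE,
# a single NA in a row makes that row's sum NA
any_default <- unname(rowSums(dat1[cols])) >= 1

# With na.rm = TRUE, NA entries are dropped before summing,
# so a row with NA and TRUE sums to 1
any_narm <- unname(rowSums(dat1[cols], na.rm = TRUE)) >= 1

any_default
any_narm
```

Note that with na.rm = TRUE a row whose tested columns are all NA sums to 0 and therefore yields FALSE rather than NA.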
CodePudding user response:
If you want a maximally efficient solution, use the kit package:
kit::pany(.subset(df, columns), na.rm = TRUE)
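If adding a dependency is not an option, roughly the same row-wise "any" can be sketched in base R. This is an assumption-laden alternative, not the kit implementation, and no claim is made about its speed relative to kit; is_it_true_base is a hypothetical name chosen for this example:

```r
# Hypothetical base-R helper: row-wise "any TRUE" over the chosen columns
is_it_true_base <- function(df, columns) {
  # x %in% TRUE is TRUE only for TRUE, so NA is treated as FALSE
  flags <- lapply(df[columns], function(x) x %in% TRUE)
  # Combine the per-column flags element-wise with logical OR
  df$anyTRUE <- Reduce(`|`, flags)
  df
}

dat1 <- data.frame(var1 = c(TRUE, NA, FALSE),
                   var2 = c(FALSE, TRUE, FALSE),
                   var3 = c(FALSE, FALSE, TRUE))
is_it_true_base(dat1, c("var1", "var2"))
```

Because NA is mapped to FALSE before combining, a row containing NA and TRUE gives TRUE, which matches the NA behaviour asked for in the question.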