Checking if any value of a subset of columns for each row of a data.frame is TRUE

Time:07-26

I am looking for a more versatile version of the solution presented in: "return TRUE if *any* of columns chosen using tidyselect contains() is TRUE"

We start with the data.frame:

dat <- data.frame(var1 = c(TRUE, FALSE, FALSE), 
                  var2 = c(FALSE, TRUE, FALSE), 
                  var3 = c(FALSE, FALSE, TRUE))

Now I want to test whether any user-specified combination of columns contains a TRUE. Currently I imagine the user would provide a vector of column names:

is_it_true <- function(df, columns) {}

Such that is_it_true(dat, columns = c("var1", "var2")) should add a new column to dat that is TRUE for each row where var1 or var2 contains a TRUE:

   var1  var2  var3 anyTRUE
1  TRUE FALSE FALSE    TRUE
2 FALSE  TRUE FALSE    TRUE
3 FALSE FALSE  TRUE   FALSE

The ~funky~ solution I currently have is:

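library(dplyr)

is_it_true <- function(df, columns) {
  # Work on the function arguments (df, columns), not on global objects
  df$anyTRUE <- df %>% 
      select(all_of(columns)) %>% 
      mutate(anyTRUE = if_any(.cols = everything())) %>%
      pull(anyTRUE)  # extract a vector rather than a one-column data.frame
  df
}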

Such that is_it_true(dat, c("var1", "var3")) would return:

   var1  var2  var3 anyTRUE
1  TRUE FALSE FALSE    TRUE
2 FALSE  TRUE FALSE   FALSE
3 FALSE FALSE  TRUE    TRUE

and is_it_true(dat, c("var1", "var2", "var3")) would return:

   var1  var2  var3 anyTRUE
1  TRUE FALSE FALSE    TRUE
2 FALSE  TRUE FALSE    TRUE
3 FALSE FALSE  TRUE    TRUE

Finally, I am hoping the solution could be made robust to NA entries, such that if one of the tested columns is NA in a given row but another tested column is TRUE, the result for that row is TRUE and not NA.

CodePudding user response:

Using rowSums and across you could do:

dat <- data.frame(var1 = c(TRUE, FALSE, FALSE), 
                  var2 = c(FALSE, TRUE, FALSE), 
                  var3 = c(FALSE, FALSE, TRUE))

library(dplyr)

is_it_true <- function(df, columns, na.rm = FALSE) {
  df |> 
    mutate(anyTRUE = rowSums(across(all_of(columns)), na.rm = na.rm) >= 1)
}

is_it_true(dat, c("var1", "var2"))
#>    var1  var2  var3 anyTRUE
#> 1  TRUE FALSE FALSE    TRUE
#> 2 FALSE  TRUE FALSE    TRUE
#> 3 FALSE FALSE  TRUE   FALSE
is_it_true(dat, c("var1", "var2", "var3"))
#>    var1  var2  var3 anyTRUE
#> 1  TRUE FALSE FALSE    TRUE
#> 2 FALSE  TRUE FALSE    TRUE
#> 3 FALSE FALSE  TRUE    TRUE
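
The rowSums() idea does not depend on dplyr: logical values are counted as 0/1 when summed, so a row total of at least 1 means at least one TRUE. Below is a minimal base-R sketch of the same idea. The function name is_it_true_base is hypothetical and only used for illustration; it is not part of the original answer.

```r
# Hypothetical base-R variant; the name is_it_true_base is an assumption.
is_it_true_base <- function(df, columns, na.rm = FALSE) {
  # df[columns] keeps only the requested columns (unknown names raise an error).
  # rowSums() treats TRUE as 1 and FALSE as 0, so a total >= 1 means
  # that at least one of the selected columns is TRUE in that row.
  df$anyTRUE <- unname(rowSums(df[columns], na.rm = na.rm) >= 1)
  df
}

dat <- data.frame(var1 = c(TRUE, FALSE, FALSE),
                  var2 = c(FALSE, TRUE, FALSE),
                  var3 = c(FALSE, FALSE, TRUE))

res_base <- is_it_true_base(dat, c("var1", "var3"))
res_base
```

Here anyTRUE should be TRUE for rows 1 and 3 and FALSE for row 2, matching the table the question asks for.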

Using the na.rm argument, you can take NAs into account like so:

dat1 <- data.frame(var1 = c(TRUE, NA, FALSE), 
                   var2 = c(FALSE, TRUE, FALSE), 
                   var3 = c(FALSE, FALSE, TRUE))

is_it_true(dat1, c("var1", "var2"))
#>    var1  var2  var3 anyTRUE
#> 1  TRUE FALSE FALSE    TRUE
#> 2    NA  TRUE FALSE      NA
#> 3 FALSE FALSE  TRUE   FALSE
is_it_true(dat1, c("var1", "var2"), na.rm = TRUE)
#>    var1  var2  var3 anyTRUE
#> 1  TRUE FALSE FALSE    TRUE
#> 2    NA  TRUE FALSE    TRUE
#> 3 FALSE FALSE  TRUE   FALSE
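
The NA in row 2 without na.rm = TRUE arises because a sum that includes NA is itself NA. A possible alternative, sketched here as an assumption rather than as part of the answer above, is to combine the columns with R's logical OR, which follows three-valued logic: TRUE | NA is TRUE, while FALSE | NA stays NA. The function name any_true_or is hypothetical.

```r
# Hypothetical helper; the name any_true_or is an assumption for illustration.
any_true_or <- function(df, columns) {
  # Reduce() folds the vectorised `|` across the selected columns,
  # so each row ends up as the OR of its values in those columns.
  df$anyTRUE <- Reduce(`|`, df[columns])
  df
}

dat1 <- data.frame(var1 = c(TRUE, NA, FALSE),
                   var2 = c(FALSE, TRUE, FALSE),
                   var3 = c(FALSE, FALSE, TRUE))

res_or <- any_true_or(dat1, c("var1", "var2"))
res_or
```

With this sketch, row 2 (NA in var1, TRUE in var2) gives TRUE. It differs from the na.rm = TRUE version in one respect: a row where the tested columns hold only NA and FALSE yields NA rather than FALSE, which may or may not be what you want.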

CodePudding user response:

If you want a maximally efficient solution, use the kit package:

kit::pany(.subset(df, columns), na.rm = TRUE)