Keep only columns that meet a criterion

Time:04-03

I have a large data frame whose values are either TRUE, FALSE, or NA. I want to keep only the columns that contain at least one TRUE value. How do I achieve this?

Here's a minimal example:

df <- data.frame(
  c1 = c(FALSE, FALSE, FALSE, FALSE),
  c2 = c(FALSE, TRUE, FALSE, NA),
  c3 = c(FALSE, NA, TRUE, NA),
  c4 = c(FALSE, FALSE, NA, NA)
)
> df
     c1    c2    c3    c4
1 FALSE FALSE FALSE FALSE
2 FALSE  TRUE    NA FALSE
3 FALSE FALSE  TRUE    NA
4 FALSE    NA    NA    NA

I want to remove columns c1 and c4, and keep only c2 and c3. I know that TRUE values exist in my original larger data frame (using table(df==TRUE)), but I don't know which function(s) to use to identify their columns.

CodePudding user response:

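We can use select with where and any. The is.logical check guards against non-logical columns, and na.rm = TRUE makes any return FALSE (rather than NA) for columns that contain only FALSE and NA.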

library(dplyr)
df %>%
   select(where(~ is.logical(.x) && any(.x, na.rm = TRUE)))

-output

     c2    c3
1 FALSE FALSE
2  TRUE    NA
3 FALSE  TRUE
4    NA    NA

Or in base R, use colSums on the columns and check whether the sum is greater than 0 (TRUE counts as 1 and FALSE as 0, and NA is dropped by na.rm = TRUE)

df[colSums(df, na.rm = TRUE) > 0]

-output

     c2    c3
1 FALSE FALSE
2  TRUE    NA
3 FALSE  TRUE
4    NA    NA
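
As a further base R variant, here is a minimal sketch that applies the same is.logical and any test column by column with vapply. It is not part of the answer above; the names has_true and kept are made up for illustration. Unlike colSums, this form should also cope with a data frame that mixes logical columns with other types, because non-logical columns are simply dropped by the is.logical check.

```r
# Same example data as in the question.
df <- data.frame(
  c1 = c(FALSE, FALSE, FALSE, FALSE),
  c2 = c(FALSE, TRUE, FALSE, NA),
  c3 = c(FALSE, NA, TRUE, NA),
  c4 = c(FALSE, FALSE, NA, NA)
)

# One TRUE/FALSE flag per column: TRUE when the column is logical and
# contains at least one TRUE. With na.rm = TRUE, any() returns FALSE
# (not NA) for a column that holds only FALSE and NA.
# has_true is a hypothetical helper name, not from the original answer.
has_true <- vapply(
  df,
  function(col) is.logical(col) && any(col, na.rm = TRUE),
  logical(1)
)

# drop = FALSE keeps the result a data frame even if only one column remains.
kept <- df[, has_true, drop = FALSE]

names(kept)
```

The flags in has_true also answer the "which columns" part of the question directly: names(df)[has_true] lists the columns that contain a TRUE.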