I have a large data frame whose values are either TRUE, FALSE, or NA. I want to keep only the columns that contain at least one TRUE value. How do I achieve this?
Here's a minimal example:
df <- data.frame(
c1 = c(FALSE,FALSE,FALSE,FALSE),
c2 = c(FALSE,TRUE,FALSE,NA),
c3 = c(FALSE,NA,TRUE,NA),
c4 = c(FALSE,FALSE,NA,NA)
)
> df
c1 c2 c3 c4
1 FALSE FALSE FALSE FALSE
2 FALSE TRUE NA FALSE
3 FALSE FALSE TRUE NA
4 FALSE NA NA NA
I want to remove columns c1 and c4, and keep only c2 and c3. I know that TRUE values exist in my original larger data frame (using table(df==TRUE)), but I don't know which function(s) to use to identify their columns.
CodePudding user response:
We can use select with where and any:
library(dplyr)
df %>%
select(where(~ is.logical(.x) && any(.x, na.rm = TRUE)))
-output
c2 c3
1 FALSE FALSE
2 TRUE NA
3 FALSE TRUE
4 NA NA
Or in base R, use colSums on the columns and check whether the sum is greater than 0 (TRUE -> 1 and FALSE -> 0; NA values are dropped by na.rm = TRUE):
df[colSums(df, na.rm = TRUE) > 0]
-output
c2 c3
1 FALSE FALSE
2 TRUE NA
3 FALSE TRUE
4 NA NA
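If the larger data frame might also contain non-logical columns, colSums would either fail or count non-zero numbers as matches. A minimal base R sketch that guards against this is shown here; it rebuilds the example df from the question, and the names keep and result are only illustrative:

```r
# Example data from the question
df <- data.frame(
  c1 = c(FALSE, FALSE, FALSE, FALSE),
  c2 = c(FALSE, TRUE, FALSE, NA),
  c3 = c(FALSE, NA, TRUE, NA),
  c4 = c(FALSE, FALSE, NA, NA)
)

# For each column: is it logical, and does it hold at least one TRUE?
# any(..., na.rm = TRUE) ignores NA, so an all-NA column yields FALSE.
keep <- vapply(
  df,
  function(col) is.logical(col) && any(col, na.rm = TRUE),
  logical(1)
)

# drop = FALSE keeps a data frame even if only one column survives
result <- df[, keep, drop = FALSE]

names(result)
```

names(df)[keep] gives just the names of the matching columns, which may be all that is needed to identify them.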