I have a data frame like this:

type_1 type_2 type_3
0 0 1
0 1 0
0 0 1
0 1 1
1 0 1
1 0 1
I want to keep only the columns in which more than 50% of the values are 1, which here is only type_3.
How can I do this in dplyr?
CodePudding user response:
You can do:
library(dplyr)
dat |>
  select(all_of(
    names(dat)[sapply(dat, \(x) sum(x) / length(x) > 0.5)]
  ))
This takes advantage of the fact that you are looking specifically for 1s and that the only possible values are 0 and 1, so sum(x)/length(x) is simply the proportion of 1s in each column. More generally, you can do:
VALUE_TO_MATCH <- 1

dat |>
  select(all_of(
    names(dat)[sapply(dat, \(x) sum(x == VALUE_TO_MATCH) / length(x) > 0.5)]
  ))
Data
dat <- read.table(text = "type_1 type_2 type_3
0 0 1
0 1 0
0 0 1
0 1 1
1 0 1
1 0 1", header = TRUE)
CodePudding user response:
Another dplyr option using select() with where():
df <- read.table(text = "type_1 type_2 type_3
0 0 1
0 1 0
0 0 1
0 1 1
1 0 1
1 0 1", header = TRUE)
library(dplyr)
df %>%
  select(where(~ mean(.) > 0.5))
#>   type_3
#> 1      1
#> 2      0
#> 3      1
#> 4      1
#> 5      1
#> 6      1
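If the columns could contain values other than 0 and 1, a sketch of the same where() idea (assuming the goal is still the proportion of entries equal to 1) is to compare against the target value explicitly:

# keep columns where more than half the entries equal 1
df %>%
  select(where(~ mean(.x == 1) > 0.5))

On this data that again returns only type_3.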
A base R option using colMeans():
df[which(colMeans(df) > 0.5)]
#>   type_3
#> 1      1
#> 2      0
#> 3      1
#> 4      1
#> 5      1
#> 6      1
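Along the same lines, a base R sketch for data that is not strictly 0/1 compares against the value of interest before taking column means:

# proportion of entries equal to 1 per column, then keep those above 0.5
df[colMeans(df == 1) > 0.5]

On the example data this again keeps only type_3.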