OK, this example should clarify what I am looking for:
set.seed(123456789)
df <- data.frame(
  x1 = sample(c(0,1), size = 10, replace = TRUE),
  x2 = sample(c(0,1), size = 10, replace = TRUE),
  z1 = sample(c(0,1), size = 10, replace = TRUE)
)
I want to select all rows where both x1 and x2 equal 1. That is,
df[df$x1==1 & df$x2==1,]
which returns
   x1 x2 z1
1   1  1  1
4   1  1  1
6   1  1  1
10  1  1  0
However, I want to do this in a way that scales to many x variables (e.g. x1, x2, ..., x40), so I would like to select the columns by the prefix "x" rather than writing out df$x1==1 & df$x2==1 & ... & df$x40==1. Note that I want to keep the z1 variable in the resulting data set: the rows are selected based on the x variables, but I am not looking to keep only the x columns. Is this possible?
CodePudding user response:
A possible solution, based on dplyr:
library(dplyr)
set.seed(123456789)
df <- data.frame(
  x1 = sample(c(0,1), size = 10, replace = TRUE),
  x2 = sample(c(0,1), size = 10, replace = TRUE),
  z1 = sample(c(0,1), size = 10, replace = TRUE)
)
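# if_all() keeps rows where the condition holds for every selected column
# (needs dplyr >= 1.0.4; across() inside filter() is deprecated)
df %>%
  filter(if_all(starts_with("x"), ~ .x == 1))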
#>   x1 x2 z1
#> 1  1  1  1
#> 2  1  1  1
#> 3  1  1  1
#> 4  1  1  0
CodePudding user response:
Here is a base R way with Reduce applied to the data.frame's rows.
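# positions of the columns whose names start with "x"
cols <- grep("^x", names(df))
# for each row, TRUE only if every x column equals 1 (the \(x) lambda needs R >= 4.1)
i <- apply(df[cols], 1, \(x) Reduce(`&`, x == 1L))
# keep those rows, with all columns (including z1)
df[i, ]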
#   x1 x2 z1
#1   1  1  1
#4   1  1  1
#6   1  1  1
#10  1  1  0
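The same row-wise test can probably also be written without apply, by counting per row how many x columns differ from 1. The following is only a sketch of that idea: it uses a small hand-made data frame (an assumption, so the result does not depend on the random seed), the name keep is chosen just for illustration, and it assumes the x columns contain no NA values.

```r
# Hand-made example data (hypothetical values, not the seeded data above)
df <- data.frame(
  x1 = c(1, 0, 1, 1),
  x2 = c(1, 1, 0, 1),
  z1 = c(0, 1, 1, 1)
)

# Positions of the columns whose names start with "x"
cols <- grep("^x", names(df))

# df[cols] != 1 is a logical matrix; a row qualifies when it has
# zero x columns that differ from 1
keep <- rowSums(df[cols] != 1) == 0

# Subset the rows but keep every column, including z1
res <- df[keep, ]
res
```

With this toy data only rows 1 and 4 have both x columns equal to 1, so those are the rows kept. Because rowSums works on the whole matrix at once, this tends to be faster than apply on wide or long data, though that has not been measured here.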