I have this table:
col1 <- c("1","2", "3", "4", "5")
col1 <- sample(col1, 1000, replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2))
col2 <- c("6","7", "8")
col2 <- sample(col2, 1000, replace=TRUE, prob=c(0.2, 0.4, 0.4))
col3 <- c("9","10", "11", "12")
col3 <- sample(col3, 1000, replace=TRUE, prob=c(0.1, 0.1, 0.4, 0.4))
col4 <- rexp( 1000, 0.5)
col5 <- rexp( 1000, 0.5)
id <- 1:1000
table_1 = data.frame(id, col1, col2, col3, col4, col5)
And this list:
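f <- function(set) {
  n <- length(set)
  masks <- 2^((1:n) - 1)  # 1, 2, 4, ..., 2^(n-1); note that 1:n-1 parses as (1:n)-1
  # one entry per integer 0 .. 2^n - 1; its set bits pick the members of the subset
  lapply((1:2^n) - 1, function(u) set[bitwAnd(u, masks) != 0])
}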
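# min(col1) is "1" and max(col3) is "9" (string comparison), so this is f(1:9): all 512 subsets of 1..9
sample_list = f(min(col1):max(col3))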
I want to select rows from "table_1" based on entries in "sample_list". For example:
select = as.integer(runif(1, min = 1, max = 512))
> select
[1] 381
my_select = sample_list[[select]]  # [[ ]] returns the vector itself, not a one-element list
sample_list[381]
[[1]]
[1] 3 4 5 6 7 9
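For reference, entry k of the list is the subset whose members are the set bits of k - 1, which is why entry 381 comes out as 3 4 5 6 7 9. A minimal sketch of that check, restating f() so the snippet runs on its own (the name `bits` is only illustrative):

```r
# Same construction as f() above, restated so this snippet is self-contained.
f <- function(set) {
  n <- length(set)
  masks <- 2^((1:n) - 1)                      # 1, 2, 4, ..., 2^(n-1)
  lapply((1:2^n) - 1, function(u) set[bitwAnd(u, masks) != 0])
}
sample_list <- f(1:9)

# Entry 381 encodes u = 380 = 4 + 8 + 16 + 32 + 64 + 256.
bits <- which(bitwAnd(380L, 2^(0:8)) != 0)    # positions of the set bits
print(bits)
print(identical(sample_list[[381]], bits))
```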
Is there some way to quickly select all rows in "table_1" where the values of table_1$col1, table_1$col2 and table_1$col3 are all contained in "my_select"?
For the example above, this would be equivalent to:
subset(table_1, col1 %in% c("3", "4", "5") & col2 %in% c("6", "7") & col3 %in% c("9"))
Thank you!
Answer:
I'm not sure whether you want every one of the three columns to hold a value from the selected list entry, or just at least one of them.
Here is one solution that returns the row numbers where all three columns match:
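my_select <- function(index) {
  # TRUE for rows whose col1, col2 and col3 values all appear in the chosen subset
  keep <- apply(table_1[2:4], 1, \(x) all(x %in% sample_list[[index]]))
  which(keep)  # row numbers of the matching rows
}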
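Example output (row numbers; they will differ from run to run because the data are sampled without a seed):
[1] 131 146 174 179 205 272 396 450 500 512 574 589 619 669 673 703 736 751 887 893 925 992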
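apply() walks the table row by row, which can get slow on larger tables. Since the question asks for a quick way, a vectorised variant may be worth trying: test each column against the subset with %in% and combine the per-column results with &. A minimal sketch, assuming the setup from the question (rebuilt here with a seed so it runs on its own; `select_rows` and `n_rows` are illustrative names, not from the original post):

```r
set.seed(1)  # assumption: seed added only to make this sketch reproducible
n_rows <- 1000
table_1 <- data.frame(
  id   = 1:n_rows,
  col1 = sample(c("1", "2", "3", "4", "5"), n_rows, replace = TRUE),
  col2 = sample(c("6", "7", "8"), n_rows, replace = TRUE, prob = c(0.2, 0.4, 0.4)),
  col3 = sample(c("9", "10", "11", "12"), n_rows, replace = TRUE,
                prob = c(0.1, 0.1, 0.4, 0.4)),
  col4 = rexp(n_rows, 0.5),
  col5 = rexp(n_rows, 0.5)
)

# Hypothetical helper: keep rows whose values in `cols` all belong to `values`.
select_rows <- function(df, values, cols = c("col1", "col2", "col3")) {
  values <- as.character(values)                 # the columns hold strings
  hits <- lapply(df[cols], function(v) v %in% values)
  df[Reduce(`&`, hits), , drop = FALSE]          # AND the per-column tests
}

picked <- select_rows(table_1, c(3, 4, 5, 6, 7, 9))

# Compare with the subset() call from the question.
expected <- subset(table_1, col1 %in% c("3", "4", "5") &
                            col2 %in% c("6", "7") &
                            col3 %in% c("9"))
print(identical(picked$id, expected$id))
```

With the list from the question you would call select_rows(table_1, sample_list[[381]]). To get row numbers instead of the rows themselves, as my_select() does, wrap the combined logical vector in which().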