I have two large vectors:
a <- sample(1:1000000, 300000)
b <- sample(1:1000000, 40000)
I want to find the elements in a that match b:
a[(which(a %in% b))]
But instead of returning a vector, I would like each matching element stored as a separate entry in a list, e.g.:
sapply(b, function(x) a[(which(a %in% x))])
This would do the job, but it is very slow, and the speed difference between the two is huge.
Is there a way to store the results in a list that is actually fast?
CodePudding user response:
If you are happy with NA instead of zero-length elements, you can coerce the vector to a list and use replace.
replace(as.list(a), !a %in% b, NA)
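To see what this returns, here is a small sketch on toy vectors (the values are made up for illustration and are not from the question). Note that the result has one entry per element of a, not per element of b, so it is shaped differently from the sapply output in the question.

```r
# Toy data, assumed for illustration only
a <- c(1, 4, 5)
b <- c(4, 7)

# Coerce a to a list, then overwrite entries with no match in b with NA
res_na <- replace(as.list(a), !a %in% b, NA)

# Entries 1 and 3 are NA (1 and 5 are not in b); entry 2 keeps the value 4
str(res_na)
```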
CodePudding user response:
You can try to use Map and set the length of the non-matching elements to 0, or use [<- to replace the non-matched elements.
a <- c(1,4,5)
b <- c(4,7)
Map(\(a,b) {`length<-`(a, b)}, b, (b %in% a))
#[[1]]
#[1] 4
#
#[[2]]
#numeric(0)
`[<-`(as.list(b), !b %in% a, list(b[0]))
#replace(as.list(b), !b %in% a, list(b[0])) #Alternative
#[[1]]
#[1] 4
#
#[[2]]
#numeric(0)
sapply(b, function(x) a[(which(a %in% x))]) #Method from the question
#[[1]]
#[1] 4
#
#[[2]]
#numeric(0)
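Another option that might be worth trying on the large vectors is split() on a factor whose levels are b. This is only a sketch: match_list is a hypothetical helper name, and it assumes b has no duplicates (true for sample() without replacement, as in the question). Unlike the variants above, it would also keep repeated values of a together in one entry.

```r
# Hypothetical helper (not from the posts above): one list entry per element
# of b, holding every element of a equal to it. Assumes b has no duplicates,
# because factor() rejects duplicated levels.
match_list <- function(a, b) {
  f <- factor(a, levels = b)  # values of a that are not in b become NA
  unname(split(a, f))         # split() drops NA groups but keeps empty levels
}

a <- c(1, 4, 5)
b <- c(4, 7)
res_split <- match_list(a, b)

# Compare with the method from the question on the same toy data
res_sapply <- sapply(b, function(x) a[which(a %in% x)])
identical(res_split, res_sapply)
```

How it compares in speed on the full-size vectors is not measured here; system.time() on both calls would show it.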