I want to transform the following loop into apply/lapply syntax in order to make it more efficient:
for (i in seq_len(nrow(df))) {
  is.element(df$a[i], unlist(strsplit(df$b[i], "/")))
}
I have tried this:
is.element(df$a, unlist(strsplit(df$b[i], "/")))
But it does not work because of the unlist statement.
I also tried:
mapply(is.element, df$a, unlist(strsplit(df$b, "/")))
Example of the data:
print(df$a)
[1] "A" "G" "T" "A" "CCG"
print(df$b)
[1] "G/A" "C/TTTTTA" "C/-" "A/G" "G/A/C"
CodePudding user response:
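You could also use a regular expression, anchored so that the value has to match a whole "/"-separated element rather than any substring:
mapply(\(x, y) grepl(sprintf("(^|/)%s(/|$)", x), y), df$a, df$b)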
A G T A CCG
TRUE FALSE FALSE TRUE FALSE
Or with the purrr package (str_split comes from stringr):
purrr::map2_lgl(df$a, df$b, ~ any(.x == stringr::str_split(.y, "/")[[1]]))
[1] TRUE FALSE FALSE TRUE FALSE
CodePudding user response:
Using unlist flattens the result of strsplit into a single vector. That is fine inside the loop, where only one element is split at a time, but when the whole column is split at once the flattened vector may have a different length than a. If we instead keep the list returned by strsplit, its length is the same as that of a. mapply pairs its arguments element by element, so they should all have the same length (shorter arguments are recycled, which is normally only useful for arguments of length 1).
mapply(is.element, df$a, strsplit(df$b, "/"))
A G T A CCG
TRUE FALSE FALSE TRUE FALSE
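To make the length argument concrete, here is a small sketch using the data from the question. The variable names parts, res and res2 are only for illustration; the point is that the list from strsplit keeps one entry per row, while unlist merges all pieces together.

```r
# Same data as in the question
df <- data.frame(
  a = c("A", "G", "T", "A", "CCG"),
  b = c("G/A", "C/TTTTTA", "C/-", "A/G", "G/A/C"),
  stringsAsFactors = FALSE
)

parts <- strsplit(df$b, "/")   # list with one character vector per row

length(parts)          # 5, same as length(df$a)
length(unlist(parts))  # 11, the per-row grouping is lost

# Pair each value of a with the pieces of the matching row of b
res <- mapply(is.element, df$a, parts)

# mapply names the result after df$a; drop the names if they are not wanted
res2 <- unname(res)
res2
# [1]  TRUE FALSE FALSE  TRUE FALSE
```
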
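Also, an easier vectorized option is str_detect. Note that it treats a as a pattern and matches it anywhere in b, so it checks for substrings rather than whole "/"-separated elements; on this data it gives the same answer: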
library(stringr)
str_detect(df$b, df$a)
[1] TRUE FALSE FALSE TRUE FALSE
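Pattern matching and element matching can disagree on other inputs. The sketch below uses a made-up pair that is not in the question's data, and base R's grepl with fixed = TRUE as a stand-in for a literal substring search. The helper has_element is a hypothetical name introduced here, not something from either package.

```r
# Hypothetical pair: "T" is a substring of "C/TTTTTA" but not one of its elements
a <- "T"
b <- "C/TTTTTA"

substring_hit <- grepl(a, b, fixed = TRUE)                   # substring search
element_hit   <- a %in% strsplit(b, "/", fixed = TRUE)[[1]]  # whole-element test

substring_hit  # TRUE
element_hit    # FALSE

# Hypothetical helper: row-wise whole-element test for two character vectors
has_element <- function(a, b, sep = "/") {
  parts <- strsplit(b, sep, fixed = TRUE)
  unname(mapply(function(x, p) x %in% p, a, parts))
}

has_element(c("T", "A"), c("C/TTTTTA", "G/A"))
# [1] FALSE  TRUE
```

If exact element matches are what you need, splitting first is the safer choice; the substring approach is fine when the values in a can never occur inside a longer element of b.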
data
df <- structure(list(a = c("A", "G", "T", "A", "CCG"),
                     b = c("G/A", "C/TTTTTA", "C/-", "A/G", "G/A/C")),
                class = "data.frame", row.names = c(NA, -5L))