Apply for checking if element in one column is included in list of other column row wise

I want to transform the following loop into apply/lapply syntax in order to make it more efficient:

for (i in seq_len(nrow(df))) {
     is.element(df$a[i], unlist(strsplit(df$b[i], "/")))
}

I have tried this:

is.element(df$a, unlist(strsplit(df$b, "/")))

But it does not work because of the unlist statement.

I also tried:

mapply(is.element, df$a, unlist(strsplit(df$b, "/")))

Example of the data:

print(df$a)

[1] "A" "G" "T" "A" "CCG"

print(df$b)

[1] "G/A" "C/TTTTTA" "C/-" "A/G" "G/A/C"

CodePudding user response:

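You could also use a regular expression, anchored so that the value only matches a whole element between the "/" separators (this assumes the values in df$a contain no regex metacharacters):

mapply(\(x, y) grepl(sprintf("(^|/)%s(/|$)", x), y), df$a, df$b)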
    A     G     T     A   CCG 
 TRUE FALSE FALSE  TRUE FALSE

Or with the purrr package (str_split comes from stringr):

library(purrr)
library(stringr)
map2_lgl(df$a, df$b, ~ any(.x == str_split(.y, "/")[[1]]))
[1]  TRUE FALSE FALSE  TRUE FALSE

CodePudding user response:

unlist recursively flattens the result of strsplit into a single vector. That is fine inside the loop, where only one string is split at a time, but applied to the whole column the flattened vector will usually have a different length from df$a. The list returned by strsplit, on the other hand, has one entry per row and so has the same length as df$a. mapply walks its arguments in parallel, so they should have the same length (the exception is shorter arguments, such as those of length 1, which are recycled).

mapply(is.element, df$a, strsplit(df$b, "/"))
  A     G     T     A   CCG 
 TRUE FALSE FALSE  TRUE FALSE
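
To make the length argument concrete, here is a minimal sketch using the example data from the question, rebuilt inline so that it runs on its own. The names parts, res and res2 are introduced here only for illustration; the vapply variant is an alternative, not part of the original answer.

```r
# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  a = c("A", "G", "T", "A", "CCG"),
  b = c("G/A", "C/TTTTTA", "C/-", "A/G", "G/A/C"),
  stringsAsFactors = FALSE
)

parts <- strsplit(df$b, "/")   # list with one character vector per row

length(unlist(parts))          # 11: all pieces flattened into one vector
length(parts)                  # 5: same length as df$a

# Row-wise membership test; unname() drops the names mapply takes from df$a
res <- unname(mapply(is.element, df$a, parts))

# Same test written as a loop over row indices with a type-checked result
res2 <- vapply(seq_along(parts),
               function(i) df$a[i] %in% parts[[i]],
               logical(1))
```

Both res and res2 hold one logical value per row, so either can be assigned straight back to a new column of df.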

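Also, an easier vectorized option is str_detect. Note that it treats df$a as a regular expression and matches substrings, so it agrees with is.element on this data but would also return TRUE for "A" against "C/TTTTTA":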

library(stringr)
str_detect(df$b, df$a)
[1]  TRUE FALSE FALSE  TRUE FALSE
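
The difference between substring matching and exact element matching can be checked on a small case. The sketch below uses two hypothetical rows (not taken from the question's data) and base R only. The delimiter-padding trick at the end is one possible way to get an exact match without splitting; it is an assumption of this sketch, not something the answer above proposes.

```r
# Hypothetical rows: "A" is a substring of "TTTTTA" but not an element of b[1]
a <- c("A", "G")
b <- c("C/TTTTTA", "G/A")

# Plain substring test (fixed = TRUE avoids regex interpretation)
substring_hit <- mapply(function(x, y) grepl(x, y, fixed = TRUE),
                        a, b, USE.NAMES = FALSE)

# Exact element test after splitting on "/"
element_hit <- mapply(is.element, a, strsplit(b, "/"), USE.NAMES = FALSE)

# Padding both sides with "/" turns the substring test into an element test
padded_hit <- mapply(grepl,
                     paste0("/", a, "/"), paste0("/", b, "/"),
                     MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
```

Here substring_hit is TRUE for both rows, while element_hit and padded_hit are FALSE for the first row and TRUE for the second.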

data

df <- structure(list(a = c("A", "G", "T", "A", "CCG"),
                     b = c("G/A", "C/TTTTTA", "C/-", "A/G", "G/A/C")),
                class = "data.frame", row.names = c(NA, -5L))