Apply for checking if element in one column is included in list of other column row wise

Time:05-21

I want to transform the following loop into apply/lapply syntax in order to make it more efficient:

for (i in seq_len(nrow(df))) {
     is.element(df$a[i], unlist(strsplit(df$b[i], "/")))
}

I have tried this:

is.element(df$a, unlist(strsplit(df$b[i], "/")))

But it does not work because of the unlist statement.

Also tried:

mapply(is.element, df$a, unlist(strsplit(df$b, "/")))

Example of the data:

print(df$a)

[1] "A" "G" "T" "A" "CCG"

print(df$b)

[1] "G/A" "C/TTTTTA" "C/-" "A/G" "G/A/C"

CodePudding user response:

You could also use a regular expression:

mapply(\(x, y) grepl(sprintf("/?%s/?", x), y), df$a, df$b)
    A     G     T     A   CCG 
 TRUE FALSE FALSE  TRUE FALSE
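
One caveat, as far as I can tell: the pattern above is not anchored, so it matches x anywhere in y, not only as a whole element between slashes. It happens to give the right answer on the example data. A minimal sketch of a stricter variant, assuming the values in a contain no regex metacharacters (has_element is a made-up helper name, not something from the question):

```r
# Hypothetical helper: TRUE only when x is a complete "/"-separated element of y.
# Assumes x contains no regex metacharacters.
has_element <- function(x, y) grepl(sprintf("(^|/)%s(/|$)", x), y)

a <- c("A", "G", "T", "A", "CCG")
b <- c("G/A", "C/TTTTTA", "C/-", "A/G", "G/A/C")
res <- unname(mapply(has_element, a, b))

# "T" is a substring of "C/TTTTTA" but not one of its elements
loose  <- grepl(sprintf("/?%s/?", "T"), "C/TTTTTA")  # unanchored pattern
strict <- has_element("T", "C/TTTTTA")               # anchored pattern
```

Here loose and strict disagree on the made-up pair ("T", "C/TTTTTA"), which is the case where the difference matters.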

Or with the purrr package:

library(purrr)
library(stringr)
map2_lgl(df$a, df$b, ~ any(.x == str_split(.y, "/")[[1]]))
[1]  TRUE FALSE FALSE  TRUE FALSE

CodePudding user response:

unlist recursively flattens the result of strsplit into a single vector. That is fine inside the loop, where only one element of b is split at a time, but when it is applied to the whole column the flattened vector can have a different length from a. The list returned by strsplit, by contrast, has one element per row, so it has the same length as a. mapply pairs up its arguments element by element, so they should all have the same length (the exception is arguments of length 1, which are recycled). So pass the list directly:

mapply(is.element, df$a, strsplit(df$b, "/"))
    A     G     T     A   CCG 
 TRUE FALSE FALSE  TRUE FALSE 
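
To make the length mismatch concrete, here is a small check on the example data (a sketch only; parts and flat are just illustrative names):

```r
a <- c("A", "G", "T", "A", "CCG")
b <- c("G/A", "C/TTTTTA", "C/-", "A/G", "G/A/C")

parts <- strsplit(b, "/")   # list with one character vector per row
flat  <- unlist(parts)      # single flattened character vector

length(a)       # 5
length(parts)   # 5, lines up with a row by row
length(flat)    # 11, no longer lines up with a
lengths(parts)  # 2 2 2 2 3
```

Because parts keeps one entry per row, mapply can pair a[i] with parts[[i]], which is what the original loop does.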

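Also, a simpler vectorized option is str_detect. Note that it treats each value of a as a pattern and checks whether it occurs anywhere in the corresponding value of b, so it is a substring match rather than an exact element match: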

library(stringr)
str_detect(df$b, df$a)
[1]  TRUE FALSE FALSE  TRUE FALSE

data

df <- structure(list(a = c("A", "G", "T", "A", "CCG"), b = c("G/A", 
"C/TTTTTA", "C/-", "A/G", "G/A/C")), class = "data.frame", 
row.names = c(NA, 
-5L))