I have a vector A, which contains a list of genera, which I want to use to subset a second vector, B. I have successfully used grepl to extract anything from B that has a partial match to the genera in A. Below is a reproducible example of what I have done.
But now I would like to get a list of which genera in A matched something in B, and which genera did not. I.e. the "matched" list would contain Cortinarius and Russula, and the "unmatched" list would contain Laccaria and Inocybe. Any ideas on how to do this? In reality my vectors are very long, and the genus names in B are not all in the same position amongst the other info.
# create some dummy vectors
A <- c("Cortinarius","Laccaria","Inocybe","Russula")
B <- c("fafsdf_Cortinarius_sdfsdf","sdfsdf_Russula_sdfsdf_fdf","Tomentella_sdfsdf","sdfas_Sebacina","sdfsf_Clavulina_sdfdsf")
# extract the elements of B that have a partial match to anything in A.
new.B <- B[grepl(paste(A,collapse="|"), B)]
# But now how do I tell which elements of A were present in B, and which ones were not?
CodePudding user response:
We could use lapply or sapply to loop over the patterns and get a named output:
out <- setNames(lapply(A, function(x) grep(x, B, value = TRUE)), A)
Then it is easy to see which patterns returned matches and which returned empty elements:
> out[lengths(out) > 0]
$Cortinarius
[1] "fafsdf_Cortinarius_sdfsdf"
$Russula
[1] "sdfsdf_Russula_sdfsdf_fdf"
> out[lengths(out) == 0]
$Laccaria
character(0)
$Inocybe
character(0)
and get the names of each subset:
> names(out[lengths(out) > 0])
[1] "Cortinarius" "Russula"
> names(out[lengths(out) == 0])
[1] "Laccaria" "Inocybe"
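If only the matched and unmatched names are needed, and not the matching elements of B themselves, a logical vector is enough. The following is a minimal sketch of that idea, not part of the answer above. It assumes the genus names should be treated as literal strings rather than regular expressions, which is why fixed = TRUE is added; hit, matched and unmatched are names chosen here for illustration.

```r
A <- c("Cortinarius", "Laccaria", "Inocybe", "Russula")
B <- c("fafsdf_Cortinarius_sdfsdf", "sdfsdf_Russula_sdfsdf_fdf",
       "Tomentella_sdfsdf", "sdfas_Sebacina", "sdfsf_Clavulina_sdfdsf")

# For each genus in A: does it occur anywhere in at least one element of B?
# fixed = TRUE (an assumption) matches the name literally, not as a regex.
hit <- vapply(A, function(x) any(grepl(x, B, fixed = TRUE)), logical(1))

matched   <- A[hit]   # genera found in B
unmatched <- A[!hit]  # genera not found in B
```

vapply is used instead of sapply so that the result is guaranteed to be a logical vector with one value per genus, even if A is empty.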
CodePudding user response:
You can use sapply with grepl to check each value of A against every value of B.
sapply(A, grepl, B)
# Cortinarius Laccaria Inocybe Russula
#[1,] TRUE FALSE FALSE FALSE
#[2,] FALSE FALSE FALSE TRUE
#[3,] FALSE FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE FALSE
You can take the column-wise sum of these values to get the number of matches for each genus.
result <- colSums(sapply(A, grepl, B))
result
#Cortinarius Laccaria Inocybe Russula
# 1 0 0 1
#values with at least one match
names(Filter(function(x) x > 0, result))
#[1] "Cortinarius" "Russula"
#values with no match
names(Filter(function(x) x == 0, result))
#[1] "Laccaria" "Inocybe"
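For very long vectors, running one grepl per genus over all of B may become slow. One possible alternative, sketched below, rests on an assumption that the question does not confirm: that the genus name in B is always a complete field delimited by underscores, as in the dummy data. If that holds, B can be split into its fields once and A compared against them with %in%. The names tokens, matched and unmatched are chosen here for illustration.

```r
A <- c("Cortinarius", "Laccaria", "Inocybe", "Russula")
B <- c("fafsdf_Cortinarius_sdfsdf", "sdfsdf_Russula_sdfsdf_fdf",
       "Tomentella_sdfsdf", "sdfas_Sebacina", "sdfsf_Clavulina_sdfdsf")

# Split every element of B on "_" and pool the distinct fields.
# Assumption: genus names appear as whole underscore-delimited fields.
tokens <- unique(unlist(strsplit(B, "_", fixed = TRUE)))

matched   <- A[A %in% tokens]     # genera that appear as a field in B
unmatched <- A[!(A %in% tokens)]  # genera that never appear as a field
```

Unlike the grepl approaches, this matches whole fields only, so a genus name would not match a longer field that merely contains it. Whether that is desirable depends on the real data.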