How to match multiple string arguments efficiently in R

Time:08-16

tab1 <- data.frame(id = c(1, 42, 2, 88, 432, 9584), name = c("apple", "banana",
                                                             "apple", "mango",
                                                             "mango", "apple"))
> tab1
    id   name
1    1  apple
2   42 banana
3    2  apple
4   88  mango
5  432  mango
6 9584  apple

I have a data.frame named tab1 that serves as a dictionary: it maps each name (pattern) to the ids associated with it, and a name can have several ids.

For example, suppose I want to find the ids associated with the pattern "apple":

> tab1[which(tab1$name %in% "apple"), ]$id
[1]    1    2 9584

or with the pattern "mango",

> tab1[which(tab1$name %in% "mango"), ]$id
[1]  88 432
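Instead of subsetting tab1 once per pattern, a minimal sketch is to group the ids by name in a single pass with base R's `split()`, which returns a named list; each pattern then becomes a list lookup. The variable name `ids_by_name` is illustrative, not from the question.

```r
tab1 <- data.frame(id = c(1, 42, 2, 88, 432, 9584),
                   name = c("apple", "banana", "apple", "mango", "mango", "apple"))

# One pass over tab1: a named list mapping each distinct name to its ids,
# keeping the ids in their original row order
ids_by_name <- split(tab1$id, tab1$name)

ids_by_name[["apple"]]  # ids for "apple"
ids_by_name[["mango"]]  # ids for "mango"
ids_by_name[["peach"]]  # a name that is not in tab1 gives NULL
```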

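I would like to store these ids in a new data.frame with one row per pattern, where the matching ids are collapsed into a single string separated by | and a pattern with no match (such as "peach") gets NA, like this: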

  pattern       id
1   apple 1|2|9584
2   mango   88|432
3   peach       NA

Suppose I have a very long vector of patterns (say, over 1 million) that I want to match against the names in tab1. What's a fast way to do this in R without relying on for loops?
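A minimal loop-free sketch, assuming base R only and a character vector of patterns: build the |-joined id string for each distinct name once, then look up every pattern with a single vectorized `match()` call. The names `lookup`, `res`, `big` and `big_ids` are illustrative, and `big` is simulated data, not from the question.

```r
tab1 <- data.frame(id = c(1, 42, 2, 88, 432, 9584),
                   name = c("apple", "banana", "apple", "mango", "mango", "apple"))
pattern <- c("apple", "mango", "peach")

# Build the dictionary once: one "|"-joined string per distinct name,
# returned as a named character vector
lookup <- vapply(split(tab1$id, tab1$name), paste, character(1), collapse = "|")

# match() locates every pattern in names(lookup) in one call;
# patterns absent from tab1 get NA
res <- data.frame(pattern = pattern,
                  id = unname(lookup[match(pattern, names(lookup))]),
                  stringsAsFactors = FALSE)
print(res)

# The same lookup applied to a simulated vector of one million patterns
set.seed(1)
big <- sample(c("apple", "mango", "peach", "kiwi"), 1e6, replace = TRUE)
big_ids <- unname(lookup[match(big, names(lookup))])
```

Because the result is indexed by the pattern vector itself, rows keep the input order, duplicated patterns are repeated, and unmatched patterns come out as NA. The cost is one `split()` over tab1 plus one `match()` over the patterns, rather than one subset per pattern.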

Answer:

Try this:

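tab1 <- data.frame(id = c(1, 42, 2, 88, 432, 9584), name = c("apple", "banana",
                                                             "apple", "mango",
                                                             "mango", "apple"))

pattern <- c("apple", "mango", "peach", "orange")

# Collapse the ids of each matched name into one "|"-separated string
matched <- aggregate(. ~ name, data = tab1[tab1$name %in% pattern, ],
                     FUN = function(x) paste0(x, collapse = "|"))
# Patterns absent from tab1 get NA; rep() keeps this valid even when
# every pattern matches (zero unmatched rows)
unmatched <- pattern[!pattern %in% tab1$name]
rbind(matched, data.frame(name = unmatched, id = rep(NA, length(unmatched))))
    name       id
1  apple 1|2|9584
2  mango   88|432
3  peach     <NA>
4 orange     <NA>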
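The aggregate() approach returns matched names sorted alphabetically, appends unmatched patterns at the end, and labels the first column name. If the result should follow the input order and use the pattern header from the question, a small sketch is to reorder with match() afterwards. The pattern vector here is deliberately out of alphabetical order to show the effect, and the names `matched`, `unmatched` and `res` are illustrative. Note that aggregate() stops with an error if none of the patterns occur in tab1.

```r
tab1 <- data.frame(id = c(1, 42, 2, 88, 432, 9584),
                   name = c("apple", "banana", "apple", "mango", "mango", "apple"))
pattern <- c("mango", "peach", "apple")

# Answer-style result: matched names come out of aggregate() sorted,
# unmatched patterns are appended with NA ids
matched <- aggregate(id ~ name, data = tab1[tab1$name %in% pattern, ],
                     FUN = function(x) paste0(x, collapse = "|"))
unmatched <- pattern[!pattern %in% matched$name]
res <- rbind(matched, data.frame(name = unmatched, id = rep(NA, length(unmatched))))

# Restore the input order of `pattern` and use the question's column names
res <- res[match(pattern, res$name), ]
names(res) <- c("pattern", "id")
rownames(res) <- NULL
print(res)
```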