How can I filter a vector based on regex list in R?

Time:08-22

I have a character string vector that I would like to filter based on keywords from a second vector.

Below is a small reprex:

list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple","banana")

I presume I will need stringr/stringi, but in essence I would like to do something along the lines of list1 %in% fruit and have it return T,F,T.

Any suggestions?

CodePudding user response:

We can do this with grepl without using external packages.

grepl can handle multiple patterns separated by |, therefore we can first concatenate the strings in fruit together with | as the separator.
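To make the concatenation step concrete, here is a small sketch of the combined pattern that gets passed to grepl. The variable name pattern is only illustrative; fruit is the vector from the question.

```r
fruit <- c("apple", "banana")

# Join the keywords with "|" so the regex matches any one of them
pattern <- paste(fruit, collapse = "|")
pattern
# [1] "apple|banana"
```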

Remember to set ignore.case = TRUE if you don't care about case (note that "Bananas" in list1 is capitalized while "banana" in fruit is not).

grepl(paste(fruit, collapse = "|"), list1, ignore.case = TRUE)
[1]  TRUE FALSE  TRUE

Or to subset list1:

list1[grepl(paste(fruit, collapse = "|"), list1, ignore.case = TRUE)]
[1] "I like apples"           "Bananas are my favorite"
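One caveat with the collapsed pattern: if a keyword contained regex metacharacters (such as "." or "+"), they would be interpreted as regex syntax. A possible variant, sketched here under the assumption that the keywords should be matched literally, checks each keyword with fixed = TRUE and combines the results. The names hits and matched are illustrative and not part of the original answer.

```r
list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple", "banana")

# grepl() ignores ignore.case when fixed = TRUE (with a warning),
# so both sides are lower-cased to get a case-insensitive literal match.
hits <- lapply(fruit, function(k) {
  grepl(tolower(k), tolower(list1), fixed = TRUE)
})

# Element-wise OR across keywords: TRUE if any keyword occurs in the string
matched <- Reduce(`|`, hits)
matched
# [1]  TRUE FALSE  TRUE
```

For plain keywords like the ones in the question, this gives the same result as the single collapsed pattern.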

CodePudding user response:

A solution with str_detect:

library(tidyverse)
data.frame(list1) %>%
  mutate(Flag = str_detect(list1, paste0("(?i)", paste0(fruit, collapse = "|"))))
                    list1  Flag
1           I like apples  TRUE
2             I eat bread FALSE
3 Bananas are my favorite  TRUE

If you want to filter (i.e. subset) your data:

data.frame(list1) %>%
  filter(str_detect(list1, paste0("(?i)", paste0(fruit, collapse = "|"))))
                    list1
1           I like apples
2 Bananas are my favorite

Note that (?i) is used to make the match case-insensitive.
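As an alternative to the inline (?i) flag, stringr also offers the regex() helper with an ignore_case argument. The sketch below applies it directly to the vector rather than to a data frame; it assumes stringr is installed, and the names pattern, flags and kept are illustrative.

```r
library(stringr)

list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple", "banana")

# regex(..., ignore_case = TRUE) plays the same role as prefixing "(?i)"
pattern <- regex(paste(fruit, collapse = "|"), ignore_case = TRUE)

flags <- str_detect(list1, pattern)  # logical vector, one entry per string
flags
# [1]  TRUE FALSE  TRUE

kept <- str_subset(list1, pattern)   # keeps only the matching strings
kept
# [1] "I like apples"           "Bananas are my favorite"
```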
