I have a character string vector that I would like to filter based on keywords from a second vector.
Below is a small reprex:
list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple","banana")
I presume I will need stringr/stringi, but in essence I would like to do something along the lines of list1 %in% fruit and have it return T, F, T.
Any suggestions?
CodePudding user response:
We can do this with grepl, without using external packages.
grepl can handle multiple patterns separated by |, so we can first concatenate the strings in fruit with | as the separator.
Remember to set ignore.case = TRUE if you don't care about case (note that "Bananas" in your example is capitalised, while the keyword "banana" is not).
grepl(paste(fruit, collapse = "|"), list1, ignore.case = TRUE)
[1] TRUE FALSE TRUE
Or to subset list1:
list1[grepl(paste(fruit, collapse = "|"), list1, ignore.case = TRUE)]
[1] "I like apples" "Bananas are my favorite"
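One caveat with pasting the keywords into a single pattern is that any regex metacharacters in them (for example "." or "+") would be interpreted as regex syntax. A minimal sketch of an alternative in base R, assuming you want literal, case-insensitive matching: test each keyword separately with fixed = TRUE and combine the results. The names hits and has_any are only illustrative.

```r
list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple", "banana")

# For each keyword, test for a literal (non-regex) match.
# Both sides are lower-cased because fixed = TRUE cannot be combined
# with ignore.case = TRUE.
hits <- lapply(tolower(fruit), function(k) grepl(k, tolower(list1), fixed = TRUE))

# Element-wise OR across keywords: TRUE if any keyword matched the string.
has_any <- Reduce(`|`, hits)

has_any          # TRUE FALSE TRUE
list1[has_any]   # "I like apples" "Bananas are my favorite"
```

This sketch assumes fruit is non-empty; with an empty keyword vector Reduce returns NULL rather than a logical vector.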
CodePudding user response:
A solution with str_detect:
library(tidyverse)
data.frame(list1) %>%
mutate(Flag = str_detect(list1, paste0("(?i)", paste0(fruit, collapse = "|"))))
                    list1  Flag
1           I like apples  TRUE
2             I eat bread FALSE
3 Bananas are my favorite  TRUE
If you want to filter (i.e. subset) your data:
data.frame(list1) %>%
filter(str_detect(list1, paste0("(?i)", paste0(fruit, collapse = "|"))))
                    list1
1           I like apples
2 Bananas are my favorite
Note that (?i) is used to make the match case-insensitive.
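As an alternative to embedding the flag in the pattern string, stringr's regex() helper takes an ignore_case argument. A minimal sketch on the plain vectors from the question, assuming stringr is installed; the names pattern and flag are only illustrative.

```r
library(stringr)

list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple", "banana")

# regex() wraps the pattern and carries the case-insensitivity option,
# so "(?i)" does not need to be pasted into the pattern itself.
pattern <- regex(paste(fruit, collapse = "|"), ignore_case = TRUE)

# Logical vector marking which strings contain any of the keywords.
flag <- str_detect(list1, pattern)

list1[flag]
```

The same pattern object can be passed to str_detect inside mutate() or filter() in the pipelines above.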