I have a vector of 5-letter words and I want to write a function that extracts the words containing ALL of the letters in a pattern.
For example, if my vector is ("aback", "abase", "abate", "agate", "allay") and I'm looking for words that contain BOTH "a" and "b", I want the function to return ("aback", "abase", "abate"). I don't care about the position of these letters or how many times they occur, only that each word contains both of them.
I've tried to do this with a function that combines several grepl calls with &. The problem is that grepl doesn't accept a vector as the pattern. My plan was for this function to compute grepl("a", word_vec) & grepl("b", word_vec). I also need it to scale, for example to search for all words containing "a" AND "b" AND "c".
grepl_cat <- function(str, words_vec) {
  pat <- str_split(str, "")
  first_let = TRUE
  for (i in 1:length(pat)) {
    if (first_let){
      result <- sapply(pat[i], grepl, x = word_vec)
      first_let <- FALSE
    }
    print(pat[i])
    result <- result & sapply(pat[i], grepl, x = word_vec)
  }
  return(result)
}
word_vec[grepl_cat("abc", word_vec)]
The function I've written above definitely isn't doing what it's intended to do.
I'm wondering if there is an easier way to do this with regex patterns, or a way to pass each letter of str to grepl one at a time rather than as a vector.
CodePudding user response:
A possible solution in base R:
s <- c("aback", "abase", "abate", "agate", "allay")
subset(s, grepl("(?=.*a)(?=.*b)", s, perl = TRUE))
#> [1] "aback" "abase" "abate"
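A regex-free variant of the same base R idea, closer to the grepl(...) & grepl(...) plan in the question, is to run one fixed-string grepl per letter and combine the results with Reduce. This is only a sketch: has_all_letters is a hypothetical helper name, and it assumes the pattern is a single string of plain characters.

```r
# Hypothetical helper: TRUE for each word that contains every letter in `letters_str`.
has_all_letters <- function(letters_str, words) {
  # Split the pattern string into single characters, e.g. "ab" -> c("a", "b")
  chars <- strsplit(letters_str, "")[[1]]
  # One logical vector per letter; fixed = TRUE treats each letter literally
  hits <- lapply(chars, grepl, x = words, fixed = TRUE)
  # AND the logical vectors together element-wise
  Reduce(`&`, hits)
}

s <- c("aback", "abase", "abate", "agate", "allay")
s[has_all_letters("ab", s)]
#> [1] "aback" "abase" "abate"
s[has_all_letters("abc", s)]
#> [1] "aback"
```

Because each letter is checked separately, the order and number of occurrences in the word do not matter.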
Another possible solution, based on tidyverse:
library(tidyverse)
s <- c("aback", "abase", "abate", "agate", "allay")
s %>%
  data.frame(s = .) %>%
  filter(str_detect(s, "(?=.*a)(?=.*b)")) %>%
  pull(s)
#> [1] "aback" "abase" "abate"
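To scale this to any number of letters, the pattern can be built programmatically with one lookahead per letter. The following is a minimal base R sketch; contains_all is a hypothetical helper name, and it assumes the letters are plain characters that need no regex escaping.

```r
# Hypothetical helper: keep the words that contain every letter in `letters_str`.
contains_all <- function(letters_str, words) {
  # Split the pattern string into single characters, e.g. "ab" -> c("a", "b")
  chars <- strsplit(letters_str, "")[[1]]
  # One lookahead per letter, e.g. "ab" becomes "(?=.*a)(?=.*b)"
  pattern <- paste0("(?=.*", chars, ")", collapse = "")
  # Lookaheads require the PCRE engine, hence perl = TRUE
  words[grepl(pattern, words, perl = TRUE)]
}

s <- c("aback", "abase", "abate", "agate", "allay")
contains_all("ab", s)
#> [1] "aback" "abase" "abate"
contains_all("abc", s)
#> [1] "aback"
```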
CodePudding user response:
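For a, b and c, if the letters must appear in that order, a regex solution would be:
^.*a.*b.*c.*$
You may add more letters as needed. Note that this pattern does not match words in which the letters occur in a different order, such as "cabin".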
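Alternative regex approach, which matches the letters in any order (in base R, pass perl = TRUE to grepl, because lookaheads need the PCRE engine):
^(?=.*a)(?=.*b)(?=.*c).*$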
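As a rough check of how the two patterns behave in R, the sketch below runs both against a small made-up vector (words, in_order and any_order are illustrative names, not from the question). The first pattern only matches when a, b and c appear in that order; the lookahead pattern matches them in any order.

```r
# Illustrative words: all three contain a, b and c, but in different orders
words <- c("cabin", "aback", "black")

# Ordered pattern: needs an "a", then a later "b", then a later "c"
in_order <- grepl("^.*a.*b.*c.*$", words)
in_order
#> [1] FALSE  TRUE FALSE

# Lookahead pattern: each letter may appear anywhere; requires perl = TRUE
any_order <- grepl("^(?=.*a)(?=.*b)(?=.*c).*$", words, perl = TRUE)
any_order
#> [1] TRUE TRUE TRUE
```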