I have the following vector of characters and a minimum threshold:
ref_vector <- c("R", "H", "K")
min_thres <- 5
What I want to do is, given a string, check whether it contains a run of consecutive characters, all taken from ref_vector, whose length is greater than or equal to min_thres.
Here are some example strings with the expected answers (the asterisks underneath mark stretches of characters from ref_vector):
x1 <- "GMRRRRRRRS" # Answ: True
#        *****
#         *****
#          *****
x2 <- "GKRKRRHRRS" # Answ: True
#       *****
#        *****
#         *****
#          *****
x3 <- "GKRKARHQRS" # Answ: False
#       *** ** *
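One way to read these expected answers: compute, for each string, the length of the longest run of consecutive characters that belong to ref_vector, and compare it with min_thres. A minimal base-R sketch using rle(); max_run() is a hypothetical helper name, and each element of ref_vector is assumed to be a single character:

```r
# Sketch: longest run of consecutive characters from ref_vector in one string.
# max_run() is a hypothetical helper; each ref_vector element is assumed to be
# a single character.
max_run <- function(s, ref_vector) {
  chars <- strsplit(s, "")[[1]]        # split the string into single characters
  rl <- rle(chars %in% ref_vector)     # runs of TRUE (in ref_vector) and FALSE
  in_ref <- rl$lengths[rl$values]      # keep only the lengths of the TRUE runs
  if (length(in_ref) == 0) 0L else max(in_ref)
}

ref_vector <- c("R", "H", "K")
min_thres <- 5
x <- c("GMRRRRRRRS", "GKRKRRHRRS", "GKRKARHQRS")

# Longest qualifying run per string, then the threshold test
longest <- vapply(x, max_run, integer(1), ref_vector = ref_vector,
                  USE.NAMES = FALSE)
longest
#> [1] 7 8 3
longest >= min_thres
#> [1]  TRUE  TRUE FALSE
```

Because this compares characters with %in% instead of building a regex, it needs no escaping of special characters in ref_vector, and it also reports how long the longest run actually is.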
The length and content of both the input string and ref_vector can vary.
How can I achieve that with R?
Answer:
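We can use grepl() with a regex built from ref_vector: a character class of the allowed characters followed by a {n} quantifier, so the pattern matches wherever min_thres of them appear in a row (a longer run also contains such a stretch):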
x <- c("GMRRRRRRRS", "GKRKRRHRRS", "GKRKARHQRS")
ref_vector <- c("R", "H", "K")
min_thres <- 5
# Character class from ref_vector, repeated min_thres times: "[RHK]{5}"
regex <- paste0("[", paste(ref_vector, collapse = ""), "]{", min_thres, "}")
grepl(regex, x)
#> [1]  TRUE  TRUE FALSE
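One caveat: pasting ref_vector straight into [...] assumes none of its characters has a special meaning inside a bracket expression; characters such as ], ^ or - can break the pattern or change what it matches. A minimal sketch of a reusable wrapper, assuming ref_vector holds single ASCII characters: it backslash-escapes the non-alphanumeric ones and uses perl = TRUE, since in PCRE a backslash before a non-alphanumeric character makes it literal. has_run() is a hypothetical name, not an existing function:

```r
# Sketch: reusable version of the grepl() approach.
# has_run() is hypothetical; ref_vector is assumed to hold single ASCII characters.
has_run <- function(x, ref_vector, min_thres) {
  stopifnot(length(ref_vector) > 0, all(nchar(ref_vector) == 1), min_thres >= 1)
  # Escape non-alphanumeric characters so PCRE treats them literally in [...]
  esc <- ifelse(grepl("[[:alnum:]]", ref_vector),
                ref_vector, paste0("\\", ref_vector))
  pattern <- paste0("[", paste(esc, collapse = ""), "]{", min_thres, "}")
  grepl(pattern, x, perl = TRUE)
}

has_run(c("GMRRRRRRRS", "GKRKRRHRRS", "GKRKARHQRS"), c("R", "H", "K"), 5)
#> [1]  TRUE  TRUE FALSE
has_run(c("a]]-^b", "a]-b"), c("]", "-", "^"), 3)
#> [1]  TRUE FALSE
```

For the original ref_vector the generated pattern is the same "[RHK]{5}" as above; the escaping only matters when ref_vector contains symbols.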