How to check if a string contains consecutive characters from a vector using R

Time:11-11

I have the following vector of characters and minimum threshold.

ref_vector <- c("R", "H", "K")
min_thres <- 5

Given a string, I want to check whether it contains a run of consecutive characters, all taken from ref_vector, whose length is at least min_thres.

Here are some example strings and the expected answers:

x1 <- "GMRRRRRRRS"    # Answ: True
#        *****
#         *****
#          *****


x2 <- "GKRKRRHRRS"    # Answ: True
#       *****
#        *****
#         *****
#          *****


x3 <- "GKRKARHQRS"    # Answ: False
#       *** ** *

The length and content of the input string and of ref_vector can vary. How can I achieve this in R?

CodePudding user response:

We can use grepl() here with an appropriate regex pattern:

x <- c("GMRRRRRRRS", "GKRKRRHRRS", "GKRKARHQRS")
ref_vector <- c("R", "H", "K")
min_thres <- 5
# Character class built from ref_vector, repeated min_thres times: "[RHK]{5}"
regex <- paste0("[", paste(ref_vector, collapse = ""), "]{", min_thres, "}")
grepl(regex, x)

[1]  TRUE  TRUE FALSE
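
If ref_vector might contain characters that are special inside a regex character class (for example "]", "^" or "-"), a regex-free variant may be safer. The following is only a sketch of that idea using base R's strsplit() and rle(); the helper name has_ref_run is hypothetical and not part of the answer above.

```r
# Hypothetical helper (an assumption, not from the original answer):
# returns TRUE for each string that has at least `min_thres` consecutive
# characters that all belong to `ref_vector`.
has_ref_run <- function(x, ref_vector, min_thres) {
  vapply(strsplit(x, "", fixed = TRUE), function(chars) {
    # Run-length encode the "is this character in ref_vector?" flags
    runs <- rle(chars %in% ref_vector)
    # Keep only runs of matching characters that are long enough
    any(runs$values & runs$lengths >= min_thres)
  }, logical(1))
}

x <- c("GMRRRRRRRS", "GKRKRRHRRS", "GKRKARHQRS")
has_ref_run(x, ref_vector = c("R", "H", "K"), min_thres = 5)
```

For the three example strings this should agree with the grepl() result, and because no pattern is built, nothing in ref_vector needs escaping.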