Regular expression to search for whole phrases and single words at the same time in str_detect()

Time:10-19

I'm confused about regular expressions in str_detect(). Here is the setup:

library(stringr)

vector_to_check <- c("overall survival (os) in melanoma participants (parts b plus d)", "median overall survival (os)", "one- and two-year overall survival rate will be determined.", "overall survival rate (os rate) at month 6 in all participants")

str_detect(vector_to_check, "rate") 
# [1] FALSE FALSE  TRUE  TRUE

str_detect(vector_to_check , "overall survival (os) in melanoma participants (parts b plus d)")
# [1] FALSE FALSE FALSE FALSE

Basically, I want to input two types of pattern in str_detect(string, pattern):

  1. Single words like "rate", "median", etc
  2. Whole phrases, like "overall survival (os) in melanoma participants (parts b plus d)"

Is there a regular expression (pattern) that allows for this?

Thank you

CodePudding user response:

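Wrap the pattern in fixed(). The phrase contains regex metacharacters (the parentheses), so str_detect() treats them as a capture group rather than literal characters. Otherwise each one would have to be escaped with \\.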

library(stringr)
str_detect(vector_to_check, fixed("overall survival (os) in melanoma participants (parts b plus d)"))
# [1]  TRUE FALSE FALSE FALSE
str_detect(vector_to_check, fixed("rate"))
# [1] FALSE FALSE  TRUE  TRUE

To check for either pattern at once, test each one separately and combine the results element-wise with |:

library(purrr)
map(c("rate", "overall survival (os) in melanoma participants (parts b plus d)"), 
  ~ str_detect(vector_to_check, fixed(.x))) %>%
    reduce(`|`)

Output:

# [1]  TRUE FALSE  TRUE  TRUE
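
For comparison, the same "match any of several literal strings" idea can be sketched in base R, without stringr or purrr. This is only an illustration under the question's setup: the names patterns, per_pattern and hits are introduced here and are not part of the original answer.

```r
# Same data as in the question.
vector_to_check <- c(
  "overall survival (os) in melanoma participants (parts b plus d)",
  "median overall survival (os)",
  "one- and two-year overall survival rate will be determined.",
  "overall survival rate (os rate) at month 6 in all participants"
)

# Hypothetical name: the literal strings to look for.
patterns <- c(
  "rate",
  "overall survival (os) in melanoma participants (parts b plus d)"
)

# grepl(..., fixed = TRUE) treats each pattern as plain text, so the
# parentheses need no escaping. lapply() yields one logical vector per
# pattern, and Reduce() combines them element-wise with OR.
per_pattern <- lapply(patterns, grepl, x = vector_to_check, fixed = TRUE)
hits <- Reduce(`|`, per_pattern)

print(hits)
# [1]  TRUE FALSE  TRUE  TRUE
```

Because every pattern is matched literally, this variant cannot use regex features such as alternation or anchors inside a pattern. In exchange, it needs no escaping at all.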

CodePudding user response:

You can use the collapse argument of paste0() to join any number of patterns with the regex "or" operator (|), like this:

library(stringr)

vector_to_check <- c("overall survival (os) in melanoma participants (parts b plus d)", "median overall survival (os)", "one- and two-year overall survival rate will be determined.", "overall survival rate (os rate) at month 6 in all participants")

patterns <- c("rate", 
              "overall survival \\(os\\) in melanoma participants \\(parts b plus d\\)")

str_detect(vector_to_check, paste0(patterns, collapse = "|"))
# [1]  TRUE FALSE  TRUE  TRUE

I also added \\ in front of each parenthesis, because parentheses are metacharacters in a regular expression and must be escaped to match literally.
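
If the phrases come from data and escaping them by hand is impractical, the escaping step can be automated before the patterns are collapsed. The sketch below uses base R only. escape_regex() is a hypothetical helper written for this example, not a stringr function, and it covers only the common metacharacters.

```r
# Same data as in the question.
vector_to_check <- c(
  "overall survival (os) in melanoma participants (parts b plus d)",
  "median overall survival (os)",
  "one- and two-year overall survival rate will be determined.",
  "overall survival rate (os rate) at month 6 in all participants"
)

# Hypothetical helper: put a backslash in front of each common regex
# metacharacter so that the string matches literally.
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# Unescaped input: one single word and one whole phrase.
patterns <- c(
  "rate",
  "overall survival (os) in melanoma participants (parts b plus d)"
)

# Escape each pattern, then join them with the regex "or" operator.
combined <- paste(escape_regex(patterns), collapse = "|")

result <- grepl(combined, vector_to_check, perl = TRUE)
print(result)
# [1]  TRUE FALSE  TRUE  TRUE
```

The string in combined is an ordinary regular expression, so it should also work as the pattern argument of str_detect(vector_to_check, combined).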
