I have a character vector of symptoms; each element may contain several symptoms separated by commas, for example:
x <- c('throat dry, muscles a bit painful', 'throat is a bit painful', 'throat pain, chest tightness', 'throatpain')
I'd like to use grepl or another regex function to return TRUE if "throat pain" or any slight variation of it is matched. For the example vector above, the result should be FALSE TRUE TRUE TRUE (in the first element, "painful" belongs to a different symptom, not the throat).
Thanks.
Answer:
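This works for your example: look for "throat", then any character except a comma, zero or more times, then "pain". Because the match cannot cross a comma, "throat" and "pain" must be in the same symptom.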
library(stringr)
str_detect(x, "throat[^,]{0,}pain")
[1] FALSE TRUE TRUE TRUE
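To see why excluding the comma matters, and since the question asked about grepl, here is a minimal base R sketch (no stringr needed) comparing a naive `.*` pattern with the comma-excluding one. The variable names `naive` and `within_symptom` are just illustrative.

```r
x <- c('throat dry, muscles a bit painful', 'throat is a bit painful',
       'throat pain, chest tightness', 'throatpain')

# Naive pattern: ".*" can run past the comma into the next symptom,
# so "throat dry, muscles a bit painful" is (wrongly) matched.
naive <- grepl("throat.*pain", x)
naive
# [1] TRUE TRUE TRUE TRUE

# Same idea as the str_detect() answer, in base R: "[^,]*" stops at a comma,
# so "throat" and "pain" must belong to the same symptom.
within_symptom <- grepl("throat[^,]*pain", x)
within_symptom
# [1] FALSE  TRUE  TRUE  TRUE
```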
Answer:
Does this work? Allow only letters and whitespace between "throat" and "pain", so the match cannot cross a comma into the next symptom.
grepl('throat[A-Za-z\\s]*pain', x, perl = TRUE)
[1] FALSE TRUE TRUE TRUE
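A different technique (not taken from either answer) is to split each string on commas first and then test each symptom on its own, so the pattern never has to worry about separators. This is a sketch assuming commas are the only separator; the helper name `has_throat_pain` is hypothetical.

```r
x <- c('throat dry, muscles a bit painful', 'throat is a bit painful',
       'throat pain, chest tightness', 'throatpain')

# Hypothetical helper: TRUE if any comma-separated symptom contains
# "throat" followed (within that same symptom) by "pain".
has_throat_pain <- function(v) {
  symptoms <- strsplit(v, ",", fixed = TRUE)  # list: one character vector per element
  vapply(symptoms,
         function(s) any(grepl("throat.*pain", s)),
         logical(1))
}

has_throat_pain(x)
# [1] FALSE  TRUE  TRUE  TRUE
```

Splitting first keeps the per-symptom pattern simple (".*" is safe inside a single symptom), at the cost of an extra strsplit() pass over the data.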