Regex matching "dem" but not "democrat" in R

Time:11-02

I'm trying to match a certain substring ("dem"), but I do not want to match one specific string that contains that substring ("democrat"). Can this be done using grep?

For example, I have the following:

> my_text <- c("demRace", "democrat", "demGender")
> grepl(pattern = "dem", x = my_text)
[1] TRUE TRUE TRUE

And my desired output is:

> grepl(pattern = some_pattern, x = my_text)
[1] TRUE FALSE TRUE

CodePudding user response:

One approach uses a negative lookahead:

my_text <- c("demRace", "democrat", "demGender")
grepl(pattern = "dem(?!ocrat$).*", my_text, perl=TRUE)

[1]  TRUE FALSE  TRUE
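
To make the role of the `$` inside the lookahead clearer, here is a small sketch comparing the anchored and unanchored forms. The extra element "democratic" is not part of the original question; it is added here only as an assumption to show where the two patterns differ.

```r
# "democratic" is a hypothetical extra value, added only for illustration
my_text2 <- c("demRace", "democrat", "demGender", "democratic")

# Anchored lookahead: reject "dem" only when "ocrat" follows and ends the string
anchored <- grepl("dem(?!ocrat$)", my_text2, perl = TRUE)

# Unanchored lookahead: reject "dem" whenever "ocrat" follows, whatever comes after
unanchored <- grepl("dem(?!ocrat)", my_text2, perl = TRUE)

print(anchored)    # TRUE FALSE TRUE TRUE
print(unanchored)  # TRUE FALSE TRUE FALSE
```

With the anchor, only the exact ending "democrat" is excluded; without it, longer words that start with "democrat" are excluded as well. The trailing `.*` in the pattern above does not change the result of grepl, so it is left out here.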

CodePudding user response:

A possible solution:

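my_text <- c("demRace", "democrat", "demGender")
# Drop the elements containing "democrat", then test what is left.
# Note: the result is shorter than the original vector (length 2, not 3).
my_text <- my_text[!grepl(pattern = "democrat", x = my_text)]
grepl(pattern = "dem", x = my_text)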

EDITED (to fix the issues pointed out below by Onyambu)

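my_text <- c("demRace", "democrat", "demGender")
tmp <- my_text  # work on a copy so my_text is left unchanged
# Overwrite every element containing "democrat" with a placeholder that has no "dem"
tmp[grepl(pattern = "democrat", x = tmp)] <- "x"
grepl(pattern = "dem", x = tmp)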
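
The same idea can probably be written without a temporary copy by combining two logical vectors. This is only a sketch of an equivalent formulation, not something from the answer above; `fixed = TRUE` is an assumption that both patterns are meant as literal strings rather than regular expressions.

```r
my_text <- c("demRace", "democrat", "demGender")

# TRUE where "dem" occurs AND "democrat" does not occur anywhere in the element
result <- grepl("dem", my_text, fixed = TRUE) &
  !grepl("democrat", my_text, fixed = TRUE)

print(result)  # TRUE FALSE TRUE
```

Like the masking version, this returns a vector of the same length as the input, and it rejects any element that contains "democrat" even if "dem" also appears elsewhere in it.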