Capturing and extracting a RegEX group in R

Time:10-07

I have a data frame that looks like this:

text_string <- structure(list(text_string = c("A Nanny-Back Up Care and Staffing Company-San Diego, OC, LA, San Francisco, Portland, Las Vegas, Phoenix, Seattle, Denver and NY. @jefffoes", 
"Creative Producer of @crwnmag  LA-NY-TX [email protected] Founded @marcusharper", 
"daily elements for life and style  texas transplant in california LA [email protected] read my blog   shop my instagram", 
"LIVE, LAUGH, LOVE")), class = "data.frame", row.names = c(NA, 
-4L))

I am trying to capture each instance of "LA" in the string and store it in a new field. With the regex I used, the first three strings should each return a match of "LA", while the last one should return no match.

I thought this code would do the trick, but it does not:

text_string_new <- text_string %>% mutate(new_field = str_replace(string = text_string,
                                                           pattern = "(LA)(\\b|,)",
                                                           replacement = "\\1"))

All it does is return an unchanged copy of the text_string field.
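The behaviour can be reproduced on a single string. This is a minimal sketch assuming stringr is installed; `s` is a hypothetical shortened copy of the second sample string, not part of the original data.

```r
library(stringr)

# Hypothetical shortened copy of the second sample string
s <- "Creative Producer of @crwnmag  LA-NY-TX"

# The pattern matches "LA" (the \b alternative succeeds before "-"),
# and "\\1" writes group 1, which is also "LA", back in its place,
# so the returned string is identical to the input
str_replace(s, pattern = "(LA)(\\b|,)", replacement = "\\1")
#> [1] "Creative Producer of @crwnmag  LA-NY-TX"
```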

Answer:

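Using str_extract() rather than str_replace() does the trick. str_replace() returns the whole string with the match substituted, so replacing the match "LA" with its own capture group "\\1" leaves every string unchanged. str_extract() instead returns only the first piece of text matched by the pattern in each string, or NA when there is no match.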

library(dplyr)   # mutate() and %>%
library(stringr) # str_extract()
text_string %>% mutate(new_field = str_extract(string = text_string,
                                               pattern = "(LA)(\\b|,)"))
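A short sketch comparing str_extract() with two related stringr functions on the sample strings: str_match() returns the capture groups as matrix columns (column 1 is the whole match, column 2 is group 1), and str_extract_all() returns every match in each string rather than only the first. The vector `x` is a hypothetical shortened stand-in for the text_string column, with the email addresses left out.

```r
library(stringr)

# Hypothetical shortened versions of the four sample strings
x <- c("San Diego, OC, LA, San Francisco, Portland",
       "Creative Producer of @crwnmag  LA-NY-TX",
       "texas transplant in california LA read my blog",
       "LIVE, LAUGH, LOVE")

pattern <- "(LA)(\\b|,)"

# First match only; NA where nothing matches ("LAUGH" fails the \b test)
str_extract(x, pattern)
#> [1] "LA" "LA" "LA" NA

# Column 2 of str_match() holds capture group 1
str_match(x, pattern)[, 2]
#> [1] "LA" "LA" "LA" NA

# All matches per string, as a list of character vectors
all_matches <- str_extract_all(x, pattern)
```

Here all_matches is a list of four elements; the last one is character(0) because "LIVE, LAUGH, LOVE" has no match. One caveat, assuming only whole-word "LA" is wanted: the pattern has no boundary before "LA", so it would also match the end of a word such as "GALA"; "\\bLA\\b" would exclude that.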