Home > Back-end >  Regular Expression to find a string containing specific substring between two delimeters in R
Regular Expression to find a string containing specific substring between two delimeters in R

Time:03-29

I need to extract a string with multiple variations in between two comma delimiters.

The known similarity of these string is that it contains "LED" in the line.

Possible variations include "W-LED", "OLED", "Edge LED (Local Dimming)", "Direct LED" but are not only limited to those.

I want to extract all the substrings in between the delimiter with the comma removed. The strings are in a column inside a data frame. Two example:

ori_col <- c(
  "Display: 27 in, VA, Viewing angles (H/V): 170  / 160, W-LED, 1920 x 1080 pixels",
  "Display: 21.5 in, VA, Edge LED (Local Dimming), 1920 x 1080 pixels"
)
df <- as.data.frame(ori_col)

What I want to extract

"W-LED"
"Edge LED (Local Dimming)"

So I plan to mutate a new column to extract the values from the original column using regex.

df %>% mutate(new_column = str_extract(ori_col, "regex"))

I figure it must use something like lookaheads and lookbehinds but have no idea how to write the in between regex.

df %>% mutate(new_column = str_extract(ori_col, "(?<=\\,)(what should I write here)(?=\\,)"))

This question is derived from my previously overcomplicated question Regex explanation from regex101

  • To get an understanding of the components that are landed in each group remove the subset [,3] from the str_match results:

    mutate(df, extracted_str = str_match(string = ori_col, 
                                         pattern = "(.*\\,)(.*LED.*)(\\,.*)"))
    
  • You can run trimws / str_trim to get remove white spaces from the results

    mutate(df, extracted_str = str_trim(str_match(string = ori_col, 
                                   pattern = "(.*\\,)(.*LED.*)(\\,.*)")[,3]))
    
    • Related