I need to extract a string with multiple variations in between two comma delimiters.
The known similarity of these string is that it contains "LED" in the line.
Possible variations include "W-LED", "OLED", "Edge LED (Local Dimming)", "Direct LED" but are not only limited to those.
I want to extract all the substrings in between the delimiter with the comma removed. The strings are in a column inside a data frame. Two example:
ori_col <- c(
"Display: 27 in, VA, Viewing angles (H/V): 170 / 160, W-LED, 1920 x 1080 pixels",
"Display: 21.5 in, VA, Edge LED (Local Dimming), 1920 x 1080 pixels"
)
df <- as.data.frame(ori_col)
What I want to extract
"W-LED"
"Edge LED (Local Dimming)"
So I plan to mutate a new column to extract the values from the original column using regex.
df %>% mutate(new_column = str_extract(ori_col, "regex"))
I figure it must use something like lookaheads and lookbehinds but have no idea how to write the in between regex.
df %>% mutate(new_column = str_extract(ori_col, "(?<=\\,)(what should I write here)(?=\\,)"))
This question is derived from my previously overcomplicated question
To get an understanding of the components that are landed in each group remove the subset [,3]
from the str_match
results:
mutate(df, extracted_str = str_match(string = ori_col,
pattern = "(.*\\,)(.*LED.*)(\\,.*)"))
You can run trimws
/ str_trim
to get remove white spaces from the results
mutate(df, extracted_str = str_trim(str_match(string = ori_col,
pattern = "(.*\\,)(.*LED.*)(\\,.*)")[,3]))