I'm trying to write a function to extract words that come before or after a group of phrases. To extract the words that come after a single phrase, for example item, in a string variable called x, I had luck with the code below:
str_extract(x, pattern="(?<=item).*?(?=,)")
How do I pass a list of phrases to a regex? For example, I want to create a vector of phrases called keywords and extract the words that come after any of these phrases. How do I tell the regex that keywords is a list of alternatives, not literal text?
keywords <- c("item",
"date",
"size",
"length")
CodePudding user response:
Your pattern must look like either of the following:
paste0("(?<=", paste(keywords, collapse="|"),").*?(?=,)")
paste0("(?<=", paste(keywords, collapse="|"),")[^,]*")
The first pattern will look like (?<=item|date|size|length).*?(?=,). This matches a location that is immediately preceded by item, date, size or length, then consumes zero or more characters other than line break characters, as few as possible, up to the leftmost occurrence of a comma without consuming it (as (?=,) is a positive lookahead).
The second regex will look like (?<=item|date|size|length)[^,]*, and matches similarly to the pattern above. Note the difference though: [^,]* matches zero or more characters other than a comma, so 1) it will match even if there is no comma later, and 2) it will match any characters, including line break characters.
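To make the difference concrete, here is a minimal sketch that builds both patterns from keywords and runs them with stringr. The sample strings in x and the variable names alternation, lazy_pattern and negated_pattern are made up for illustration. It also assumes the keywords contain no regex metacharacters; otherwise they would need escaping first.

```r
library(stringr)

keywords <- c("item", "date", "size", "length")

# Join the keywords into one alternation: "item|date|size|length"
alternation <- paste(keywords, collapse = "|")

# Pattern 1: lazy match that requires a comma after the extracted text
lazy_pattern <- paste0("(?<=", alternation, ").*?(?=,)")

# Pattern 2: negated character class, no comma required
negated_pattern <- paste0("(?<=", alternation, ")[^,]*")

# Hypothetical sample data: the second string has no comma
x <- c("item apple, date 2021", "size large")

lazy_result    <- str_extract(x, lazy_pattern)
negated_result <- str_extract(x, negated_pattern)

print(lazy_result)     # " apple" NA
print(negated_result)  # " apple" " large"

# The matches keep the space that follows the keyword; trim it if unwanted
print(str_trim(negated_result))  # "apple" "large"
```

The first pattern returns NA for "size large" because there is no comma for the lookahead to find, while the second still returns the rest of the string.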