I have a string like this:
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
I would like to extract the words that come before the double colons. To do this, I use gregexpr to match the alphanumeric characters immediately preceding each double colon:
m <- gregexpr("[[:alnum:]]*::", text)
Then I call regmatches to pull out the matched text, unlist the result into a vector, and finally strip out the double colons with gsub:
gsub("::", "", unlist(regmatches(text, m)))
#[1] "text" "some" "here"
This is the desired result, but it relies on four function calls. Is there a more efficient way of achieving the same result?
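To make clear why each of the four calls is needed, here is a sketch that stores and prints the intermediate value after every step of the pipeline above. The variable names matches, flat and words are only illustrative.

```r
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."

# Step 1: gregexpr() returns a list of match start positions (with a
# "match.length" attribute), one list element per input string
m <- gregexpr("[[:alnum:]]*::", text)

# Step 2: regmatches() turns those positions into substrings, still
# wrapped in a list with one element per input string
matches <- regmatches(text, m)
print(matches)

# Step 3: unlist() flattens that list into a plain character vector
flat <- unlist(matches)
print(flat)   # "text::" "some::" "here::"

# Step 4: gsub() removes the "::" that the pattern consumed
words <- gsub("::", "", flat)
print(words)  # "text" "some" "here"
```

The gsub() step only exists because the pattern consumes the colons, so a pattern that checks for "::" without including it in the match would make that step unnecessary.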
Answer:
You can use
m <- gregexpr("[[:alnum:]]+(?=::)", text, perl=TRUE)
See the regex demo. Here, [[:alnum:]]+(?=::) matches one or more letters or digits and then checks that they are immediately followed by two colons without consuming the colons, since (?=...) is a non-consuming lookahead construct.
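One way to see what "non-consuming" means in practice is to compare match lengths. This sketch runs a consuming pattern ([[:alnum:]]+::) and the lookahead pattern ([[:alnum:]]+(?=::)) over the same string: both should start at the same positions, but the lookahead matches should be two characters shorter because the colons are checked, not captured.

```r
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."

# Consuming pattern: the "::" becomes part of every match
consuming <- gregexpr("[[:alnum:]]+::", text, perl = TRUE)
# Lookahead pattern: "::" is only asserted, not included in the match
lookahead <- gregexpr("[[:alnum:]]+(?=::)", text, perl = TRUE)

# as.vector() drops the attributes, leaving just the start positions
start_consuming <- as.vector(consuming[[1]])
start_lookahead <- as.vector(lookahead[[1]])

# The "match.length" attribute records how many characters each match spans
len_consuming <- attr(consuming[[1]], "match.length")
len_lookahead <- attr(lookahead[[1]], "match.length")

print(start_lookahead)  # 14 for "text", then the other two words
print(len_consuming)    # 6 6 6  ("text::", "some::", "here::")
print(len_lookahead)    # 4 4 4  ("text", "some", "here")
```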
Note that the perl=TRUE argument is required here, since the default TRE regex engine does not support lookarounds. perl=TRUE enables the PCRE regex engine, which supports both lookbehinds and lookaheads.
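A quick way to confirm this is to compile the same pattern with both engines. In my understanding, the TRE engine rejects the (?= construct and gregexpr() stops with an "invalid regular expression" error (the exact wording may differ by R version and locale), so this sketch catches the error and returns its message instead of halting.

```r
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
pattern <- "[[:alnum:]]+(?=::)"

# Default engine (TRE): the lookahead fails to compile. suppressWarnings()
# hides the accompanying compilation warning; the error itself is caught
# by tryCatch() and its message returned as a string.
tre_result <- tryCatch(
  suppressWarnings(gregexpr(pattern, text)),
  error = function(e) conditionMessage(e)
)
print(tre_result)

# PCRE engine (perl = TRUE): the same pattern compiles and matches
pcre_result <- unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
print(pcre_result)  # "text" "some" "here"
```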
See an R demo:
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
m <- gregexpr("[[:alnum:]]+(?=::)", text, perl=TRUE)
unlist(regmatches(text, m))
## => [1] "text" "some" "here"
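Because gregexpr() and regmatches() are both vectorised, the same two calls also work on a whole character vector. The sketch below wraps them in a small helper; the name words_before_colons is hypothetical and not part of base R. It deliberately skips unlist(), so the result stays a list with one element per input string and you can still tell which words came from which string.

```r
# Hypothetical helper, not a base R function
words_before_colons <- function(x) {
  # One list element per string in x; strings without "::" give character(0)
  regmatches(x, gregexpr("[[:alnum:]]+(?=::)", x, perl = TRUE))
}

texts <- c(
  "This is some text::stuff. Look, there's some::more. And here::is some more.",
  "no double colons here",
  "a::b c::d"
)
res <- words_before_colons(texts)
print(res)
```

For a single string, wrapping the call in unlist() (or taking [[1]]) gives the plain vector from the demo above.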
Answer:
You can use a lookahead with str_extract_all to do it all in one go:
library(stringr)
str_extract_all(text, "\\w+(?=::)")[[1]]
[1] "text" "some" "here"
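One difference between the two answers worth knowing: \w is not the same class as [[:alnum:]], because \w also counts the underscore as a word character. That holds for PCRE and, as far as I know, for the ICU engine stringr uses as well. The sketch below uses base R with perl = TRUE so it runs without stringr. The input "snake_case::value" is an invented example.

```r
x <- "snake_case::value"

# \w includes "_", so the whole identifier before "::" is matched
with_word <- unlist(regmatches(x, gregexpr("\\w+(?=::)", x, perl = TRUE)))

# [[:alnum:]] excludes "_", so only the part after the underscore matches
with_alnum <- unlist(regmatches(x, gregexpr("[[:alnum:]]+(?=::)", x, perl = TRUE)))

print(with_word)   # "snake_case"
print(with_alnum)  # "case"
```

For the sample text in the question, both patterns return the same three words. Pick the class that matches what you consider a "word" in your data.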