Extract text immediately before double colons

I have a string like this:

text <- "This is some text::stuff. Look, there's some::more. And here::is some more."

I would like to extract the words that come immediately before the double colons. To do this, I use gregexpr to match alphanumeric characters immediately followed by double colons:

m <- gregexpr("[[:alnum:]]*::", text)

Then I call regmatches to extract the matched text, unlist the result into a vector, and finally strip the double colons with gsub:

gsub("::", "", unlist(regmatches(text, m)))
#[1] "text" "some" "here"

This gives the desired result, but it relies on four function calls (gregexpr, regmatches, unlist and gsub). Is there a more efficient way to achieve the same result?

Answer:

You can use

m <- gregexpr("[[:alnum:]]+(?=::)", text, perl=TRUE)

Here, [[:alnum:]]+(?=::) matches one or more letters or digits and then checks that they are immediately followed by two colons without consuming the colons, since (?=...) is a positive lookahead: a zero-width assertion that tests what follows the current position without adding it to the match.
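To illustrate the two points in that explanation (the colons are not consumed, and + requires at least one character), here is a minimal comparison in base R. It reuses the question's text; the second input x is a made-up string chosen only to show how * and + differ.

```r
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."

# Consuming pattern: the colons become part of each match
consumed <- regmatches(text, gregexpr("[[:alnum:]]+::", text))[[1]]
consumed
## => [1] "text::" "some::" "here::"

# Lookahead pattern: the colons are checked but not returned
lookahead <- regmatches(text, gregexpr("[[:alnum:]]+(?=::)", text, perl = TRUE))[[1]]
lookahead
## => [1] "text" "some" "here"

# Made-up input: "::" with no alphanumerics directly before it
x <- "a ::b"

# With *, zero alphanumerics are allowed, so a bare "::" still matches
star <- regmatches(x, gregexpr("[[:alnum:]]*::", x))[[1]]
star
## => [1] "::"

# With +, at least one letter or digit must precede "::", so nothing matches
plus <- regmatches(x, gregexpr("[[:alnum:]]+(?=::)", x, perl = TRUE))[[1]]
plus
## => character(0)
```

With the question's * pattern, the bare "::" would turn into an empty string after the gsub step, which the + version avoids.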

Note that the perl=TRUE argument is required here, since the default TRE regex engine does not support lookarounds. perl=TRUE switches to the PCRE regex engine, which supports both lookaheads and lookbehinds.
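A quick way to check the engine requirement is to run the same pattern with and without perl=TRUE. This is a small sketch: the TRE failure is caught with tryCatch() so the script keeps running, and the exact wording of the error message may differ between R versions, so only the fact that an error occurs is checked.

```r
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
pattern <- "[[:alnum:]]+(?=::)"

# Default engine (TRE): the (?=...) syntax is rejected when the pattern is compiled.
# suppressWarnings() hides the accompanying compilation warning; the error is returned as an object.
tre_result <- tryCatch(
  suppressWarnings(gregexpr(pattern, text)),
  error = function(e) e
)
inherits(tre_result, "error")
## => [1] TRUE

# PCRE engine: the same pattern compiles and matches
pcre_result <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
pcre_result
## => [1] "text" "some" "here"
```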

See an R demo:

text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
m <- gregexpr("[[:alnum:]]+(?=::)", text, perl=TRUE)
unlist(regmatches(text, m))
## => [1] "text" "some" "here"

Answer:

You can use a lookahead with str_extract_all from stringr to do it all in one call:

library(stringr)
str_extract_all(text, "\\w+(?=::)")[[1]]
## => [1] "text" "some" "here"
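One difference between the two answers is worth knowing: [[:alnum:]] matches letters and digits only, while \w also matches the underscore (both in PCRE and in the ICU engine that stringr uses). The sketch below shows this in base R with perl=TRUE so it runs without stringr installed; the input x is a hypothetical string, not from the question.

```r
# Hypothetical input: an identifier containing an underscore before "::"
x <- "my_var::value"

# [[:alnum:]] excludes "_", so only the part after the underscore is returned
alnum_hit <- regmatches(x, gregexpr("[[:alnum:]]+(?=::)", x, perl = TRUE))[[1]]
alnum_hit
## => [1] "var"

# \w also matches "_", so the whole identifier is returned
word_hit <- regmatches(x, gregexpr("\\w+(?=::)", x, perl = TRUE))[[1]]
word_hit
## => [1] "my_var"
```

For the question's input both patterns give the same result; which one to prefer depends on whether underscores should count as part of the word.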