An example vector:
string <- "Junk1_Junk2_Junk3__ID1_Junk4_Junk5.pdf"
I am trying to extract ID1 by counting underscores (_) from the right, i.e. take the substring between the second and third underscore from the right.
Expected output: ID1
My attempt was to use the double underscore (__), but this is not going to work, because not all of my strings contain it.
Attempt: (_){2}([^_]+)
Side note, I am trying to get comfortable with regex; please recommend a resource to build and test.
Any assistance is appreciated.
CodePudding user response:
You can use
library(stringr)
stringr::str_extract(string, "[^_]+(?=(?:_[^_]*){2}$)")
Or, same approach with base R:
## Base R:
sub(".*?([^_]+)(?:_[^_]*){2}$", "\\1", string)
See the regex demo and the R demo online.
Details:
[^_]+ - one or more chars other than _
(?=(?:_[^_]*){2}$) - a positive lookahead that requires two sequences of _ followed by zero or more chars other than _, up to the end of string.
.*?([^_]+)(?:_[^_]*){2}$ matches:
.*? - any zero or more chars, as few as possible
([^_]+) - Capturing group 1 (\1 in the replacement pattern refers to this captured string): one or more chars other than _
(?:_[^_]*){2} - two sequences of _ and then zero or more repetitions of any char other than _
$ - end of string.
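To check that both patterns behave the same with and without the double underscore, here is a minimal sketch in base R only. The second and third test strings are made up for illustration, and perl = TRUE is my own choice here so that the lookahead works without stringr.

```r
# Hypothetical test vector; only the first element comes from the question.
strings <- c(
  "Junk1_Junk2_Junk3__ID1_Junk4_Junk5.pdf",
  "A_B_ID2_C_D.pdf",
  "ID3_x_y.txt"
)

# Lookahead version: perl = TRUE is needed because lookaheads are a PCRE feature.
pattern <- "[^_]+(?=(?:_[^_]*){2}$)"
ids_lookahead <- regmatches(strings, regexpr(pattern, strings, perl = TRUE))

# Capture-group version, anchored at both ends so the whole string is replaced.
ids_sub <- sub("^.*?([^_]+)(?:_[^_]*){2}$", "\\1", strings, perl = TRUE)

print(ids_lookahead)  # "ID1" "ID2" "ID3"
print(ids_sub)        # "ID1" "ID2" "ID3"
```

Note that regmatches() silently drops elements with no match, so the sub() form is probably the safer one if some strings may have fewer than two underscores.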
CodePudding user response:
sub(".*_([^_]+)(_[^_]+){2}", "\\1", string)
[1] "ID1"
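If counting underscores from the right is the real goal, a non-regex alternative is to split on _ and take the third piece from the end. This is a different technique from the sub() call above, only a rough sketch; third_from_right is a hypothetical helper name, and returning NA for strings with fewer than three pieces is my assumption about what you would want.

```r
# Hypothetical helper: split on literal "_" and pick the third piece from the right.
third_from_right <- function(x) {
  parts <- strsplit(x, "_", fixed = TRUE)
  vapply(parts, function(p) {
    n <- length(p)
    # Assumption: return NA when there are fewer than three pieces.
    if (n < 3) NA_character_ else p[n - 2]
  }, character(1))
}

string <- "Junk1_Junk2_Junk3__ID1_Junk4_Junk5.pdf"
result <- third_from_right(string)
print(result)  # "ID1"
```

The double underscore just produces an empty piece in the middle, which does not affect counting from the right. One caveat: strsplit() drops a trailing empty piece, so a string that ends in _ would be counted differently.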