I am looking to return a specific capture group from the previous line via regex.
Suppose I have the following data, and the goal is to extract the value 90 based on the tag in the line that follows it.
QTY 66:90:PCE
SCC 2
DTM 45:20200416:15
QTY 66:60:PCE
SCC 3
DTM 35:20210614:2
To target the value 90, I'd have to look for the SCC 2 tag; to target the value 60, it would be the SCC 3 tag.
I got this far in an attempt to return the value 90: (?<=^QTY\ 66:)(\d+)(.*\n.*SCC\ 2.*)
but it seems convoluted, and I fail to extract only Group 1. Here is the link to regex101. I am using R for the actual application. Thanks for the help!
Answer:
You can use
(?<=:)\d+(?=[^\d\r\n]*[\r\n]+.*SCC\ 2)
See the regex demo. Details:
(?<=:) - a : must occur immediately to the left of the current location
\d+ - one or more digits
(?=[^\d\r\n]*[\r\n]+.*SCC\ 2) - immediately to the right, there must be
    [^\d\r\n]* - any zero or more chars other than digits, CR and LF
    [\r\n]+ - one or more CR or LF chars
    .*SCC\ 2 - any text on a line up to the rightmost occurrence of SCC 2.
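As a rough illustration of the breakdown above, the same lookaround pattern can be built for any SCC value, so that 90 and 60 are both reachable. This is only a minimal sketch in base R using perl = TRUE; the helper name qty_before_scc is hypothetical, and the trailing \b (which keeps SCC 2 from also matching something like SCC 20) is an addition that is not part of the pattern above.

```r
# Hypothetical helper (not from the original answer): returns the digits
# that follow a ":" on the line directly above the "SCC <scc>" line.
qty_before_scc <- function(x, scc) {
  # Same lookbehind/lookahead structure as described above; the \\b is an
  # assumption added here so that "SCC 2" does not also match "SCC 20".
  pattern <- sprintf("(?<=:)\\d+(?=[^\\d\\r\\n]*[\\r\\n]+.*SCC %s\\b)", scc)
  m <- regexpr(pattern, x, perl = TRUE)
  regmatches(x, m)  # character(0) when nothing matches
}

vec <- "QTY 66:90:PCE\nSCC 2\nDTM 45:20200416:15\nQTY 66:60:PCE\nSCC 3\nDTM 35:20210614:2"
qty_before_scc(vec, 2)  # "90"
qty_before_scc(vec, 3)  # "60"
```

Because . does not match line breaks in PCRE by default, the lookahead can only reach the line immediately after the number, which is what ties each quantity to its own SCC tag.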
In R, you can use
library(stringr)
str_extract(vec, "(?<=:)\\d+(?=[^\\d\r\n]*[\r\n]+.*SCC\\ 2)")
And a couple of base R approaches with sub:
sub(".*?\\ \\d+:(\\d+)[^\r\n]*[\r\n]+[^\r\n]*SCC\\ 2.*", "\\1", vec)
sub("(?s).*?\\ \\d+:(\\d+)(?-s).*\\R.*SCC\\ 2(?s).*", "\\1", vec, perl=TRUE)
See regex 1 demo and regex 2 demo.
See the R demo online:
vec <- "QTY 66:90:PCE\nSCC 2\nDTM 45:20200416:15\nQTY 66:60:PCE\nSCC 3\nDTM 35:20210614:2"
sub(".*?\\ \\d+:(\\d+)[^\r\n]*[\r\n]+[^\r\n]*SCC\\ 2.*", "\\1", vec)
sub("(?s).*?\\ \\d+:(\\d+)(?-s).*\\R.*SCC\\ 2(?s).*", "\\1", vec, perl=TRUE)
library(stringr)
str_extract(vec, "(?<=:)\\d+(?=[^\\d\r\n]*[\r\n]+.*SCC\\ 2)")
All yield [1] "90".
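If a single regex still feels convoluted, a line-based approach may be easier to read. Note that this is a different technique from the patterns above, not a variant of them: a minimal sketch in base R, assuming that each SCC tag sits alone on its own line and that the quantity is the second colon-separated field of the line directly above it. The variable names are illustrative only.

```r
vec <- "QTY 66:90:PCE\nSCC 2\nDTM 45:20200416:15\nQTY 66:60:PCE\nSCC 3\nDTM 35:20210614:2"

# Split the text into individual lines (handles both LF and CRLF endings).
lines <- strsplit(vec, "\r?\n")[[1]]

# Locate the line(s) holding exactly the tag of interest.
scc_idx <- which(lines == "SCC 2")

# Take the line directly above each hit (assumed to be the QTY line).
prev <- lines[scc_idx - 1]

# Keep the second colon-separated field, e.g. "90" from "QTY 66:90:PCE".
qty <- vapply(strsplit(prev, ":", fixed = TRUE), function(p) p[2], character(1))
qty
```

Swapping "SCC 2" for "SCC 3" selects the other quantity, at the cost of a few more lines than the single-pattern solutions.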