Get Text Starting From Last Occurrence of Certain Substring Leading to Match

Time:06-14

Given a long string that generally follows this syntax:

/C=US/foo=bar/var=1/CN=JONES.FRED.R.0123456789:xxj31ZMTZzkVA 
/C=US/foo=pop/var=2/CN=BLAKE.DAPHNE.P.1234567890:xxj31ZMTZzkVA
/C=US/foo=bit/var=8/CN=BINKLEY.VELMA.W.2345678901:xxj31ZMTZzkVA
/C=US/foo=hat/var=17/CN=ROGERS.SHAGGY.N.3456789012:xxj31ZMTZzkVA
/C=US/foo=jam/var=39/CN=DOO.SCOOBY.D.4567890123:xxj31ZMTZzkVA

I want to capture the text that starts at the nearest preceding occurrence of "/C=US/", runs through the last name, dot, and first name that follow "CN=", and ends just before the colon (:). The last name, dot, and first name are not hard-coded; they are passed in from a variable.

For example, given "DOO.SCOOBY", I want to extract this text:

/C=US/foo=jam/var=39/CN=DOO.SCOOBY.D.4567890123

Here is the Regex I am using:

(?<=\/C=US\/)(.*?)(?=DOO.SCOOBY) (.*?) :

The problem is that it extracts ALL of the text preceding the match of "DOO.SCOOBY" up to the colon, except for the very first "/C=US/", so I get nearly the entire string back. It's also important to note that there are no line breaks or spaces in this string; it is all bunched together. How can I get text that only goes back as far as the previous "/C=US/"? I've searched plenty on regexes and this scenario specifically, but can't seem to find anything. It looks like I need to implement the positive lookbehind correctly.

Answer:

You can use

\/C=US\/(?:(?!\/C=US\/).)*?DOO\.SCOOBY[^:]*

Details:

  • \/C=US\/ - a /C=US/ string
  • (?:(?!\/C=US\/).)*? - any single char other than a line break char, matched zero or more times but as few times as possible, where no matched char may begin a /C=US/ substring (so the match cannot run across another /C=US/)
  • DOO\.SCOOBY - a DOO.SCOOBY string
  • [^:]* - zero or more chars other than :.
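
Since the question says the name comes from a variable, here is a minimal sketch of building the same pattern at runtime. It is written in Python only for illustration (the question does not show its host language), and the names `extract_record` and `data` are hypothetical. The one thing it adds to the pattern above is escaping the variable, so that the dot in "DOO.SCOOBY" is matched literally rather than as "any char".

```python
import re
from typing import Optional

# Sample input from the question, joined with no line breaks or spaces.
data = (
    "/C=US/foo=bar/var=1/CN=JONES.FRED.R.0123456789:xxj31ZMTZzkVA"
    "/C=US/foo=pop/var=2/CN=BLAKE.DAPHNE.P.1234567890:xxj31ZMTZzkVA"
    "/C=US/foo=bit/var=8/CN=BINKLEY.VELMA.W.2345678901:xxj31ZMTZzkVA"
    "/C=US/foo=hat/var=17/CN=ROGERS.SHAGGY.N.3456789012:xxj31ZMTZzkVA"
    "/C=US/foo=jam/var=39/CN=DOO.SCOOBY.D.4567890123:xxj31ZMTZzkVA"
)

def extract_record(text: str, name: str) -> Optional[str]:
    """Return the text from the nearest preceding /C=US/ through `name`,
    up to (not including) the next colon, or None if nothing matches."""
    marker = re.escape("/C=US/")
    pattern = (
        marker
        + "(?:(?!" + marker + ").)*?"  # chars that do not begin another /C=US/
        + re.escape(name)              # the variable, with "." made literal
        + "[^:]*"                      # everything up to the colon
    )
    match = re.search(pattern, text)
    return match.group(0) if match else None

print(extract_record(data, "DOO.SCOOBY"))
# /C=US/foo=jam/var=39/CN=DOO.SCOOBY.D.4567890123
```

Most regex flavors have an equivalent escaping helper (for example, Regex.Escape in .NET), so the same approach should carry over, though that part is untested here.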