Excluding the positive lookahead from the capture group

Time:10-22

I have the following text:

<root>
  <path>/my/data</path>
  <paths>/global/data</paths>
</root>

and I'm trying to get a regex capture group that contains only /my/data and /global/data. I tried this:

^\s*(?=<path>|<paths>)(.*)$

but I don't understand why the (.*) groups are:

<path>/my/data</path>
<paths>/global/data</paths>

Is there any way to exclude the positive lookahead from the capture group?

CodePudding user response:

The .* consumes the <path> and <paths> that your lookahead checks for. A lookahead is zero-width: (?=<path>|<paths>) only checks whether <path> or <paths> appears immediately to the right of the current position, and it does not move the regex index. When that check succeeds, (.*) starts matching from the same position, so it consumes the tag itself (that is, it adds the matched text to the overall match value and advances the regex index to the end of the subpattern match). This happens because .* matches zero or more chars other than line break chars, as many as possible.
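To see this behaviour in practice, here is a minimal sketch in Python. It assumes the pattern is applied with multiline matching (re.MULTILINE), which the question does not state, and the variable names are only illustrative.

```python
import re

# Sample text from the question.
text = """<root>
  <path>/my/data</path>
  <paths>/global/data</paths>
</root>"""

# Assumption: ^ and $ are meant to match at line boundaries, hence re.MULTILINE.
lookahead_pattern = re.compile(r"^\s*(?=<path>|<paths>)(.*)$", re.MULTILINE)

# The lookahead only tests what follows; it consumes nothing,
# so group 1 starts at the opening tag and includes it.
captured = lookahead_pattern.findall(text)
print(captured)
```

Each captured string still begins with the opening tag, because the lookahead left the regex index in front of it.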

Make the lookahead pattern consuming by turning it into a non-capturing group:

^\s*(?:<path>|<paths>)(.*)$
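A short Python sketch of the consuming version follows, again assuming multiline matching. Note that (.*) still runs to the end of the line, so the closing tag ends up in the group. The second pattern is a hypothetical variant, not part of the answer above, that also matches the closing tag outside the group so that only the path is captured.

```python
import re

# Sample text from the question.
text = """<root>
  <path>/my/data</path>
  <paths>/global/data</paths>
</root>"""

# The non-capturing group consumes the opening tag, so group 1 starts after it.
consuming = re.compile(r"^\s*(?:<path>|<paths>)(.*)$", re.MULTILINE)
with_closing_tag = consuming.findall(text)
print(with_closing_tag)

# Hypothetical variant: match the closing tag outside the group as well,
# and make the group lazy so it stops right before that closing tag.
path_only = re.compile(r"^\s*<paths?>(.*?)</paths?>$", re.MULTILINE)
only_paths = path_only.findall(text)
print(only_paths)
```

For anything beyond a quick extraction, an XML parser is usually a safer choice than a regex, since it copes with attributes, nesting and line breaks inside elements.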
