I have the following text:
<root>
<path>/my/data</path>
<paths>/global/data</paths>
</root>
and I'm trying to get a regex capture group that contains only /my/data and /global/data. I tried this:
^\s*(?=<path>|<paths>)(.*)$
but I don't understand why the (.*) group captures the whole lines, tags included:
<path>/my/data</path>
<paths>/global/data</paths>
Is there any way to exclude the positive lookahead from the capture group?
Answer:
The .* consumes the <path> and <paths> that your lookahead checks for. A lookahead is a zero-width assertion: it inspects the text to the right of the current position but does not move past it. In your regex, (?=<path>|<paths>) only confirms that <path> or <paths> appears immediately to the right of the current location, and the regex index stays in front of the tag. The (.*) that follows therefore starts at the tag itself and readily consumes it (that is, adds the matched text to the overall match value and advances the regex index to the end of the current subpattern match), because .* matches zero or more characters other than line break characters, as many as possible.
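The zero-width behaviour can be checked directly. The following is a minimal sketch using Python's re module (the question does not name a regex flavor, so Python is an assumption here, as are the variable names); it shows that the lookahead alone produces an empty match and that the group following it starts at the same index.

```python
import re

line = "<path>/my/data</path>"

# A lookahead only asserts: the match it produces on its own is empty.
probe = re.match(r"(?=<path>|<paths>)", line)
lookahead_span = probe.span()

# With (.*) appended, the group begins where the lookahead was tested,
# so the opening tag ends up inside the group.
full = re.match(r"(?=<path>|<paths>)(.*)", line)
group_start = full.start(1)
group_text = full.group(1)

print(lookahead_span, group_start, group_text)
```

Because the lookahead leaves the index at 0, there is nothing to "exclude" from the group afterwards: the tag has to be matched by a consuming pattern placed before the group.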
To keep the opening tag out of the group, make the lookahead pattern consuming by turning it into a non-capturing group:
^\s*(?:<path>|<paths>)(.*)$
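As a rough comparison, the sketch below runs the original pattern and the consuming pattern against the sample text in Python (an assumed flavor; re.MULTILINE is needed so that ^ and $ match at line boundaries). Note that with (.*) the group still runs to the end of the line, so the closing tag remains in it. The third pattern, which also consumes the closing tag through a backreference, is not part of the answer above; it is an assumed extension that is one way to get only /my/data and /global/data.

```python
import re

text = """<root>
<path>/my/data</path>
<paths>/global/data</paths>
</root>"""

# Original pattern: the lookahead is zero-width, so (.*) starts at the tag.
lookahead = re.compile(r"^\s*(?=<path>|<paths>)(.*)$", re.MULTILINE)

# Consuming pattern: the non-capturing group matches the opening tag first.
consuming = re.compile(r"^\s*(?:<path>|<paths>)(.*)$", re.MULTILINE)

# Assumed extension (not from the answer): capture the tag name in group 1,
# match the content lazily in group 2, and consume the closing tag via \1.
tight = re.compile(r"^\s*<(paths?)>(.*?)</\1>$", re.MULTILINE)

with_lookahead = lookahead.findall(text)
with_consuming = consuming.findall(text)
paths_only = [m.group(2) for m in tight.finditer(text)]

print(with_lookahead)
print(with_consuming)
print(paths_only)
```

For anything beyond a quick extraction from well-behaved input like this, an XML parser is usually a safer choice than a regex.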