I'm trying to return several matches (records) from a string, but a greedy regex always starts at the first tag and swallows everything after it. Can I achieve the result described below using a regex?
INPUT:
<id:7/>Any text over here<id:8>Another text here (variable length and possibly including new line chars)<id:10>Yet another variable length string.
DESIRED OUTPUT
Regex matches, three in this case, that I'd like to get as separate matches from the VBScript.RegExp Execute method:
<id:7/>Any text over here
<id:8/>Another text here (variable length and possibly including new line chars)
<id:10/>Yet another variable
length string including a new line.
So far I've tried the following regex, but it always returns the full string as one match. I tried several others too, but they all returned the same result, so I don't think they are worth including:
<id:\d+/>(.|\n).*
Of course I can get the instances of each <id:999/> pattern using only
<id:\d+/>
Which returns something like
<id:7/>
<id:8/>
<id:10/>
But then I don't get the variable length text related to each id tag.
NOTE: The id tags are neither HTML nor XML; they are just the way records are delimited in this particular case.
CodePudding user response:
Try with this regex:
<id:\d+\/>(?:.|\n)*?(?=<|$)
Only the Global flag needs to be set to True.
Explanation:
<id:\d+\/>
Match the beginning of each record.
(?:.|\n)*?
Non-capturing group matching any character or a newline, repeated zero or more times, non-greedily.
(?=<|$)
Lookahead: assert that the next character is < or that the end of the string has been reached, without including it in the match.
You can try it here:
https://regex101.com/r/xUW90r/1
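To make the greedy versus non-greedy difference concrete, here is a minimal sketch using Python's `re` module rather than VBScript.RegExp. It assumes the pattern behaves the same way in both engines for this input, and the sample text is a hypothetical variant of the question's input in which every tag is written as `<id:N/>`.

```python
import re

# Hypothetical sample modelled on the question; every tag uses the <id:N/> form.
text = (
    "<id:7/>Any text over here"
    "<id:8/>Another text here\nwith a line break"
    "<id:10/>Yet another variable length string."
)

# Lazy body: stops as soon as the lookahead sees "<" or the end of the string.
lazy = re.compile(r"<id:\d+/>(?:.|\n)*?(?=<|$)")

# Greedy body for contrast: consumes everything up to the end of the string.
greedy = re.compile(r"<id:\d+/>(?:.|\n)*")

# finditer plays the role of Execute with Global = True: it returns every match.
records = [m.group(0) for m in lazy.finditer(text)]
greedy_records = [m.group(0) for m in greedy.finditer(text)]

for record in records:
    print(repr(record))
```

The lazy pattern yields three records, while the greedy one yields a single match covering the whole string. Because the lookahead stops at any `<`, a record whose text contains a literal `<` would be cut short at that character.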
CodePudding user response:
You might use
<id:\d+\/?>[\s\S]*?(?=<id:\d+\/?>|$)
Explanation
<id:\d+\/?>
Match <id: then 1 or more digits, an optional / and >
[\s\S]*?
Match any char including newlines, as few as possible
(?=
Positive lookahead
<id:\d+\/?>
Match <id: then 1 or more digits, an optional / and >
|
Or
$
End of string
)
Close lookahead
See a regex demo without multiline.
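As a rough illustration of how this pattern splits the records, here is a sketch in Python's `re` module, not VBScript.RegExp. The two capture groups (id number and record text) are additions for illustration and are not part of the pattern above; the sample text is hypothetical and mixes `<id:N/>` and `<id:N>` tags, a literal `<`, and line breaks.

```python
import re

# Hypothetical sample: mixed tag forms, a stray "<" in the text, and newlines.
text = (
    "<id:7/>Any text over here"
    "<id:8>Another text here (with a < sign\nand a line break)"
    "<id:10>Yet another variable\nlength string."
)

# Same structure as the pattern above, plus two capture groups (an assumption
# made here for convenience): group 1 is the id, group 2 is the record text.
pattern = re.compile(r"<id:(\d+)/?>([\s\S]*?)(?=<id:\d+/?>|$)")

# Each match is one record; the lookahead leaves the next tag unconsumed,
# so the following match can start right at it.
records = [(int(m.group(1)), m.group(2)) for m in pattern.finditer(text)]

for record_id, body in records:
    print(record_id, repr(body))
```

Since the lookahead requires a full id tag rather than any `<`, the stray `<` inside the second record does not end it early.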