Vbscript regex with variable length in the end

Time:09-06

I'm trying to return several matches (records?) from a string, but the greedy regex always starts at the first tag and swallows everything after it. Can I achieve the result described below using a regex?

INPUT:

<id:7/>Any text over here<id:8>Another text here (variable length and possibly including new line chars)<id:10>Yet another variable length string.

DESIRED OUTPUT

Regex matches, in this case 3 of them, that I'd like to see as separate matches from VBScript.Regexp.Execute method:

<id:7/>Any text over here
<id:8/>Another text here (variable length and possibly including new line chars)
<id:10/>Yet another variable 
length string including a new line.

So far I've tried the following regex, but it always returns the full string as one match. I also tried a bunch of variations that aren't worth including, since they all returned the same result:

<id:\d+/>(.|\n).*

Of course I can get the instances of each <id:999/> pattern using only

<id:\d+/>

Which returns something like

<id:7/>
<id:8/>
<id:10/>

But then I don't get the variable length text related to each id tag.

NOTE: The id tags are neither HTML nor XML; this is just the way records are delimited in this particular case.

CodePudding user response:

Try with this regex:

<id:\d+\/>(?:.|\n)*?(?=<|$)

Only the Global flag set to True is required.

Explanation:
<id:\d+\/> Matches the beginning of each record: <id: followed by one or more digits and />.
(?:.|\n)*? Non-capturing group matching any character or a newline, repeated zero or more times, non-greedily.
(?=<|$) Lookahead: the match ends just before the next < or at the end of the string, and that terminator is not included in the result.

You can try it here:
https://regex101.com/r/xUW90r/1
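
For reference, here is a minimal sketch of how this pattern might be driven from VBScript, assuming it is run with Windows Script Host (cscript). The sample string and variable names are illustrative assumptions. Every tag in the sample uses the <id:N/> form with a slash, as in the desired output, because this pattern requires the slash. The pattern is written with \d+ so that multi-digit ids such as 10 match.

```vbscript
Option Explicit

Dim re, matches, m, sample

' Sample input (an assumption): every tag uses the <id:N/> form
sample = "<id:7/>Any text over here" & _
         "<id:8/>Another text here" & vbLf & "spanning two lines" & _
         "<id:10/>Yet another variable length string."

Set re = New RegExp
re.Pattern = "<id:\d+\/>(?:.|\n)*?(?=<|$)"
re.Global = True       ' return every match, not just the first
re.Multiline = False   ' default; keeps $ anchored to the end of the string

Set matches = re.Execute(sample)
WScript.Echo "Matches found: " & matches.Count

For Each m In matches
    WScript.Echo "[" & m.Value & "]"
Next
```

Because the lookahead stops at any <, a record whose text contains a literal < would be cut short at that character.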

CodePudding user response:

You might use

<id:\d+\/?>[\s\S]*?(?=<id:\d+\/?>|$)

Explanation

  • <id:\d+\/?> Match <id: then one or more digits, an optional / and >
  • [\s\S]*? Match any char including newlines, as few as possible
  • (?= Positive lookahead
    • <id:\d+\/?> Match <id: then one or more digits, an optional / and >
    • | Or
    • $ End of string
  • ) Close lookahead

See a regex demo without multiline.
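
If the id and the text are needed separately, one possible extension is to add capture groups to this pattern and read them through SubMatches. This is only a sketch and not part of the original answer: the capture groups, the sample string, and the variable names are assumptions, and it expects to be run with cscript.

```vbscript
Option Explicit

Dim re, matches, m, sample

' Sample input (an assumption): tags with and without the slash
sample = "<id:7/>Any text over here" & _
         "<id:8>Another text here" & vbLf & "spanning two lines" & _
         "<id:10>Yet another variable length string."

Set re = New RegExp
' Group 1 captures the digits, group 2 captures the record text
re.Pattern = "<id:(\d+)\/?>([\s\S]*?)(?=<id:\d+\/?>|$)"
re.Global = True
re.Multiline = False   ' $ must mean end of string, not end of line

Set matches = re.Execute(sample)
For Each m In matches
    ' SubMatches is zero-based: 0 is the id, 1 is the text
    WScript.Echo "id=" & m.SubMatches(0) & " text=" & m.SubMatches(1)
Next
```

Looking ahead for the full tag rather than a bare < means a stray < inside a record's text does not end the match early.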
