Regular expression with a lookahead to capture text between two starting points with no explicit end

Time:08-09

I have a regular expression that works at https://regex101.com/r/VQkNze/1, but I cannot get it to work in Tcl. Regular expressions tax my little brain, so I'm likely doing something stupid. I had been trying in Tcl and found this regex website while searching through other SO questions. I tried my expression there in order to ask my question here, and was surprised that it generated the desired result. So I assume the cause is either a difference in Tcl or a strange coincidence.

Would you please tell me what I'm doing wrong or overlooking? Thank you.

I tried the solution in this SO answer but couldn't get it to work in Tcl either.

I should have added that in Tcl I also tried:

regexp -all -inline {<span  id="V[[:digit:]]+">\
([[:digit:]])+?&#160;<\/span>(?=.+?(<span |<\/div>))}

which separated the spans as desired but, of course, does not capture the text because it is in the lookahead. However I try to move the (.+?) for the text out of the lookahead, the spans are no longer separated as they are in the regex website example.

CodePudding user response:

In Tcl regex, the laziness/greediness of the whole expression is set by its first greedy/lazy quantifier. You need to use

<span  id="V[[:digit:]]+?">([[:digit:]]+?)&#160;</span>(.+?)(?=<span |</div>)

to make it behave consistently with most other regex flavors. Here the first quantifier, the lazy +? in V[[:digit:]]+?, sets all quantifiers to lazy matching mode.
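As a rough illustration of the point above, the following sketch runs the lazy pattern against a small made-up input. The sample HTML, the variable names, and the single space between `span` and `id` are assumptions for the demo, not taken from the question's real data.

```tcl
# Hypothetical sample input: two verse spans inside one div.
set html {<div><span id="V1">1&#160;</span>In the beginning <span id="V2">2&#160;</span>And the earth </div>}

# Lazy pattern: the first quantifier (+? after V[[:digit:]]) is lazy,
# so Tcl treats the whole expression as preferring the shortest match.
set lazyRe {<span id="V[[:digit:]]+?">([[:digit:]]+?)&#160;</span>(.+?)(?=<span |</div>)}

# -all -inline returns a flat list: full match, group 1, group 2, repeated per match.
set lazy [regexp -all -inline $lazyRe $html]

foreach {whole num text} $lazy {
    puts "verse $num: '$text'"
}

# For contrast: the same pattern with a greedy first quantifier.
# Because the first quantifier is greedy, the later (.+?) no longer stops
# at the next <span, so the spans are not separated.
set greedyRe {<span id="V[[:digit:]]+">([[:digit:]]+)&#160;</span>(.+?)(?=<span |</div>)}
set greedy [regexp -all -inline $greedyRe $html]
puts "matches with greedy first quantifier: [expr {[llength $greedy] / 3}]"
```

Each match contributes three list elements, so `foreach {whole num text}` walks the result one span at a time. The lookahead is kept at the end so that the `<span ` or `</div>` that terminates one verse is not consumed and can start the next match.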
