Regex with multiple lookarounds behaving differently when used with fixed width quantifier or a

Time:01-07

Regex 1

([0-9])\1{2}(?<=\1)(?=\1)

Regex 2

([0-9])\1+(?<=\1)(?=\1)

My input is "43335".

I had an idea why I needed this, but as I was playing around with it, I noticed something strange.

I have a pretty good understanding of why Regex 2 matches, but I absolutely cannot understand why Regex 1 fails.

The difference in behaviour begins when the regex enters the positive lookahead (?=\1)

  1. In the case of Regex 2, it sees that 5 is not equal to 3, so the position is stepped back (1 step) and the first lookaround (?<=\1) is entered again.
  2. In the case of Regex 1, it sees that 5 is not equal to 3, and the position is stepped all the way back to the beginning of the string for some reason.

Could you please help me understand why this happens?

CodePudding user response:

Regex #1:

  1. Look at the first character. Is it a digit? Yes, it's a 4. Put the 4 into capture group 1. Now look at the next character. Is it a 4? No, it's a 3. No match. Move on to character #2.
  2. Look at the 2nd character. Is it a digit? Yes, it's a 3. Put the 3 into capture group 1. Are the next 2 characters also 3s? Yes, keep going. The lookbehind (?<=\1) passes, because the previous character is a 3. Now start the lookahead. Is the next character a 3 also? Ooh, no, it's a 5. The {2} quantifier is fixed, so there is nothing to backtrack into, and this one fails too.
  3. Look at the 3rd character. Is it a digit? Yes, it's a 3. Put the 3 into capture group 1. Are the next two digits also 3s? No, the first one is, but the second one is a 5. No match.
  4. ... similarly, no match for the rest of them.
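The steps above can be reproduced with a short sketch. The question does not say which regex flavour is in use, so Python's `re` module is an assumption here (it accepts a fixed-width backreference inside a lookbehind from version 3.5 on). The names `regex1` and `attempts` are just for illustration.

```python
import re

# Regex 1 from the question: the {2} quantifier is fixed, so it cannot give back characters.
regex1 = re.compile(r"([0-9])\1{2}(?<=\1)(?=\1)")
text = "43335"

# Try an anchored match at every start position, mirroring the numbered steps.
attempts = [regex1.match(text, start) is not None for start in range(len(text))]

print(attempts)             # [False, False, False, False, False]
print(regex1.search(text))  # None
```

Every start position fails, which is why the engine appears to "walk away" from the 3s: it is simply moving on to the next starting character.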

Regex #2

  1. [Same as regex1 step 1] Look at the first character. Is it a digit? Yes, it's a 4. Put the 4 into capture group 1. Now look at the next character. Is it a 4? No, it's a 3. No match. Move on to character #2.
  2. Put the 2nd character (3) into capture group 1. The next 2 characters are the same, and the quantifier on \1 is greedy, so try to consume them both. The lookbehind (?<=\1) passes, then the lookahead starts. Is the next character a 3? No, it's a 5, no match. But now we can backtrack into the quantified \1 and consume only one 3 instead of 2. Run the lookbehind again. Is the previous character a 3? Yes. Then run the lookahead. Is the next character also a 3? Yes, and we're done, SUCCESS.

What gets matched are the first two 3s.
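Here is a rough way to see that backtracking, again assuming Python's `re` and assuming the second pattern uses a `+` quantifier on `\1` (which is what the walkthrough describes). The helper `try_count` is hypothetical: it pins the repetition count to a fixed number so each choice the engine makes can be tested on its own.

```python
import re

text = "43335"

# Regex 2, assumed to use a greedy + on the backreference.
regex2 = re.compile(r"([0-9])\1+(?<=\1)(?=\1)")
m = regex2.search(text)
print(m.group(0), m.span(), m.group(1))  # 33 (1, 3) 3

def try_count(n):
    """Hypothetical helper: force exactly n repetitions of \\1, starting at index 1."""
    fixed = re.compile(r"([0-9])\1{%d}(?<=\1)(?=\1)" % n)
    return fixed.match(text, 1) is not None

# The greedy choice (2 repetitions) fails, the backtracked choice (1 repetition) succeeds.
tries = {n: try_count(n) for n in (2, 1)}
print(tries)  # {2: False, 1: True}
```

The first entry is exactly Regex 1 anchored at the second character, which is why the two patterns diverge.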

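#1 does not work because you require 4 of the same digit in a row: 3 are consumed by the match and 1 more is checked by the lookahead.

#2 works because you give it the option of settling for 3 consecutive digits: 2 consumed by the match and 1 checked by the lookahead.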
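To check the "4 in a row" versus "3 in a row" claim, one could run both patterns over inputs with runs of different lengths. This is only a sketch under the same assumptions as before (Python's `re`, and `+` as the quantifier in the second pattern). The inputs other than "43335" and the helper `matched` are made up for illustration.

```python
import re

regex1 = re.compile(r"([0-9])\1{2}(?<=\1)(?=\1)")  # needs a run of 4 identical digits
regex2 = re.compile(r"([0-9])\1+(?<=\1)(?=\1)")    # needs a run of at least 3

def matched(pattern, text):
    """Return the matched substring, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None

# Runs of two, three, and four 3s.
results = {t: (matched(regex1, t), matched(regex2, t))
           for t in ("4335", "43335", "433335")}

for t, pair in results.items():
    print(t, pair)
# 4335 (None, None)
# 43335 (None, '33')
# 433335 ('333', '333')
```

In each case the last digit of the run is left unconsumed, because the lookahead only peeks at it.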
