Regex 1
([0-9])\1{2}(?<=\1)(?=\1)
Regex 2
([0-9])\1 (?<=\1)(?=\1)
My input is "43335".
I had an idea why I needed this, but as I was playing around with it, I noticed something strange.
I have a pretty good understanding of why Regex 2 matches, but I absolutely cannot understand why Regex 1 fails.
The difference in behaviour begins when the regex enters the positive lookahead (?=\1)
- In the case of Regex 2, it sees that 5 is not equal to 3, so the position is stepped back (1 step) and the first lookaround, (?<=\1), is entered again.
- In the case of Regex 1, it sees that 5 is not equal to 3, and the position is stepped all the way back to the beginning of the string for some reason.
Could you please help me understand why this happens?
CodePudding user response:
Regex #1:
- Look at the first character. Is it a digit? Yes, it's a 4. Put the 4 into capture group 1. Now look at the next character. Is it a 4? No, it's a 3. No match. Move on to character #2.
- Look at the 2nd character. Is it a digit? Yes, it's a 3. Put the 3 into capture group 1. Are the next 2 characters also 3s? Yes, keep going. The lookbehind passes, because the character just consumed is a 3. Now start the lookahead. Is the next character a 3 as well? Ooh, no, it's a 5. The {2} is a fixed count, so there is nothing to backtrack into, and this one fails too.
- Look at the 3rd character. Is it a digit? Yes, it's a 3. Put the 3 into capture group 1. Are the next two digits also 3s? No, the first one is, but the second one is a 5. No match.
- ... similar no match for the rest of them.
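The per-position walkthrough above can be checked with a small sketch. Python is an assumption here, since the question does not name a regex engine; Python's `re` module accepts this pattern (group references inside a lookbehind are supported since 3.5). The helper name `attempts` is made up for illustration. It anchors one attempt at each start position with `Pattern.match(string, pos)`:

```python
import re

# Regex 1 exactly as posted in the question.
REGEX_1 = re.compile(r"([0-9])\1{2}(?<=\1)(?=\1)")
TEXT = "43335"


def attempts(pattern, text):
    """Try an anchored match at every start position.

    Returns a list with the matched text, or None, for each position.
    """
    results = []
    for start in range(len(text)):
        m = pattern.match(text, start)
        results.append(m.group(0) if m else None)
    return results


print(attempts(REGEX_1, TEXT))  # [None, None, None, None, None]
```

Every start position fails, which is what the steps above describe: the engine is not jumping back to the beginning of the string, it is simply moving on to the next start position after each failed attempt.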
Regex #2:
- [Same as regex1 step 1] Look at the first character. Is it a digit? Yes, it's a 4. Put the 4 into capture group 1. Now look at the next character. Is it a 4? No, it's a 3. No match. Move on to character #2.
- Put the 2nd character (3) into capture group 1. The next 2 characters are the same, so try to consume them both, then run the lookarounds. The lookbehind passes, but the lookahead asks whether the next character is a 3. No, it's a 5, so this path fails. But now we can backtrack to the \1 and consume only one 3 instead of 2. Move forward again to the lookbehind: is the previous character a 3? Yes. Then do the lookahead: is the next character also a 3? Yes, and we're done, SUCCESS.
What gets matched are the first two 3s.
#1 does not work because you require 4 of the same digits in a row (with the lookahead).
#2 works because you give it the option of 3 consecutive digits as a match.
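To see the difference side by side, here is a minimal Python sketch. Regex 1 is used exactly as posted. The quantifier of Regex 2 is not legible in the question as shown, so the `{1,2}` below is purely an assumed stand-in for "a quantifier that can give back a repetition"; it is not necessarily what the asker wrote.

```python
import re

TEXT = "43335"

# Regex 1 as posted: {2} is a fixed count, so the engine cannot give
# back a repetition when the lookahead fails.
fixed = re.search(r"([0-9])\1{2}(?<=\1)(?=\1)", TEXT)

# ASSUMPTION: stand-in for Regex 2 with a variable quantifier {1,2}.
# The greedy attempt takes two 3s, fails at the lookahead (next char is 5),
# then backtracks to one 3, where both lookarounds succeed.
flexible = re.search(r"([0-9])\1{1,2}(?<=\1)(?=\1)", TEXT)

print(fixed)             # None
print(flexible.group())  # 33
print(flexible.span())   # (1, 3)
```

Under that assumption the match is the first two 3s, with the third 3 only inspected by the lookahead and not consumed.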