Regex: what's the true meaning of [^a]?

Time:03-26

String    Regex        Result    Row
ccaaabb   [^a]+        cc        1
aaabb     [^a]+        bb        2
AbbbAcc   [^A]         b         3
AbbbAcc   [^A]?        (empty)   4
AbbbAcc   [^A]*        (empty)   5
AbbbAcc   ([^A]+){1}   bbb       6
AbbbAcc   ([^A]+){2}   b         7

Row 1: Scanning from left to right, cc is matched and output, but bb is not. Why?

Row 2: The engine seems to skip every a, then reaches bb, which is matched and output. In row 1, the a's are not skipped and the b's are not reached. Why the difference?

Rows 3-5: Why the difference?

Rows 6-7: Why the difference?

Please explain, character by character and space by space, how the engine works through each case.

The results were obtained by testing the strings and regexes on https://onlinetexttools.com/extract-regex-matches-from-text

CodePudding user response:

  1. A regular expression matches a contiguous substring. So it starts matching at the first c and stops when it gets to a, and the match is just cc. bb is not included because it's separated from cc.

  2. It skips over the a characters because they don't match, and matches bb after it.

  3. It matches the first character that isn't A, so it matches the first b.

  4. Since you've made the pattern optional with ?, it will match an empty string. It matches the empty string at the beginning of the string, and returns an empty match.

  5. * means zero or more repetitions of the pattern. At the very beginning of the string the next character is A, so there are zero repetitions there, and it returns that empty string.

6-7. You seem to be printing what the capture group matched, not the entire regexp match. When you quantify a capture group, it captures the last repetition.

In 6, the capture group only has to match once, so it can capture everything that [^A]+ matches, which is bbb.

But in 7, the capture group has to match twice. If the first repetition matched bbb, there would be nothing left for the second repetition to match, so it backtracks. Now the first repetition matches bb and the second matches b, and the latter is the value of the capture group.
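The points above can be checked with a short Python sketch. It assumes the patterns in rows 1, 2, 6 and 7 carry a + quantifier (the quoted results only make sense with it), and it uses Python's re module rather than the online tool from the question, so the tool's exact output format may differ. The helper names first_match and whole_and_group are made up for this illustration.

```python
import re

def first_match(pattern, text):
    """Return (matched_text, start_index) of the leftmost match, or None."""
    m = re.search(pattern, text)
    return (m.group(0), m.start()) if m else None

def whole_and_group(pattern, text):
    """Return (entire match, contents of capture group 1), or None."""
    m = re.search(pattern, text)
    return (m.group(0), m.group(1)) if m else None

# Rows 1-5: the leftmost match and where it starts.
print(first_match(r"[^a]+", "ccaaabb"))   # ('cc', 0)
print(first_match(r"[^a]+", "aaabb"))     # ('bb', 3)
print(first_match(r"[^A]", "AbbbAcc"))    # ('b', 1)
print(first_match(r"[^A]?", "AbbbAcc"))   # ('', 0)  empty match before the A
print(first_match(r"[^A]*", "AbbbAcc"))   # ('', 0)  zero repetitions at index 0

# Row 1 again: cc and bb are two separate matches, never one.
print(re.findall(r"[^a]+", "ccaaabb"))    # ['cc', 'bb']

# Rows 6-7: the entire match versus what the quantified group holds.
print(whole_and_group(r"([^A]+){1}", "AbbbAcc"))  # ('bbb', 'bbb')
print(whole_and_group(r"([^A]+){2}", "AbbbAcc"))  # ('bbb', 'b')
```

In row 7 the entire match is still bbb; only the capture group is reduced to the last repetition, b.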

CodePudding user response:

  1. ccbb is not a substring of ccaaabb, so it can't possibly be matched.

  2. It fails to match starting at position 0, so it tries starting at subsequent positions until it finally finds a match at position 3.

  3. It fails to match starting at position 0, so it tries starting at subsequent positions until it finally finds a match at position 1.

  4. It finds a matching substring at position 0. It doesn't need to look further.

  5. It finds a matching substring at position 0. It doesn't need to look further.

  6. It fails to match starting at position 0, so it tries starting at subsequent positions until it finally finds a match at position 1.

  7. It fails to match starting at position 0, so it tries starting at subsequent positions until it finally finds a match at position 1. The overall match is actually bbb, not b as you claim; b is only what the capture group holds.

    In the first attempt, [^A]+ first matches bbb. But then [^A]+ can't match a second time, so it tries matching less.

    In the second attempt, [^A]+ first matches bb, then b. The last repetition is what's captured.
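The "try position 0, then position 1, and so on" behaviour described above can be imitated by hand. The following is a rough sketch, not how any engine is actually implemented: the function scan is a hypothetical helper that anchors an attempt at each position in turn using Pattern.match with a pos argument, and it assumes the + quantifier in the patterns for rows 2 and 7.

```python
import re

def scan(pattern, text):
    """Attempt a match at positions 0, 1, 2, ... and record each attempt.

    Returns a list of (position, matched_text_or_None). Scanning stops at
    the first position where an attempt succeeds, like a leftmost search.
    """
    regex = re.compile(pattern)
    attempts = []
    for pos in range(len(text) + 1):
        m = regex.match(text, pos)  # anchored attempt starting at pos
        attempts.append((pos, m.group(0) if m else None))
        if m:
            break
    return attempts

# Row 2: three failed attempts on the a's, then success at position 3.
print(scan(r"[^a]+", "aaabb"))
# [(0, None), (1, None), (2, None), (3, 'bb')]

# Row 4: an optional pattern succeeds immediately with an empty match.
print(scan(r"[^A]?", "AbbbAcc"))
# [(0, '')]

# Row 7: fails at position 0, then matches bbb (bb followed by b) at 1.
print(scan(r"([^A]+){2}", "AbbbAcc"))
# [(0, None), (1, 'bbb')]
```

The backtracking inside a single attempt (bbb shrinking to bb so the second repetition can take b) happens within regex.match and is not visible in this trace; only the advance of the start position is.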
