Find repeating substring, and what is the difference between `/(\w)\1+/g` and `/(\w\1+)/g`?

I'm trying to use a regex to find the longest repeating substring in a string. For example, when the input is "aabbccc", the expected output should be "ccc" (repeated 3 times). I finally got the answer by using str.match(/(\w)\1+/g): with const str = 'aabbccc', running str.match(/(\w)\1+/g) returns ['aa', 'bb', 'ccc'].

But when I used str.match(/(\w\1+)/g), the output was ['a', 'a', 'b', 'b', 'c', 'c', 'c'].

Can someone explain why?

I thought: \w matches any word character; \1+ repeats it one or more times; so \w\1+ matches any word character repeated one or more times, and (\w\1+) captures such a run of repeated characters as a group.
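For completeness, the working pattern can be turned into a function that returns only the longest run. This is a minimal sketch: the name `longestRun` is hypothetical, it assumes only runs of word characters (`\w`) matter, it returns "" when no character repeats, and on a tie it keeps the first run found.

```javascript
// Hypothetical helper: longest run of one repeated word character.
function longestRun(str) {
  // match() with the g flag returns every full match, or null if there is none
  const runs = str.match(/(\w)\1+/g) || [];
  // Keep the longest run; ">" means the earliest one wins on a tie
  return runs.reduce((best, run) => (run.length > best.length ? run : best), "");
}

console.log(longestRun("aabbccc")); // "ccc"
console.log(longestRun("abc"));     // ""
```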

Answer:

There is a description of this on eslint.org, under the rule no-useless-backreference:

In JavaScript regular expressions, it’s syntactically valid to define a backreference to a group that belongs to another alternative part of the pattern, a backreference to a group that appears after the backreference, a backreference to a group that contains that backreference, or a backreference to a group that is inside a negative lookaround. However, by the specification, in any of these cases the backreference always ends up matching only zero-length (the empty string), regardless of the context in which the backreference and the group appear.

Backreferences that always successfully match zero-length and cannot match anything else are useless. They are basically ignored and can be removed without changing the behavior of the regular expression.
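Three of the cases listed above can be checked directly. In each pattern below, `\1` can only match the empty string, so the result is the same as if `\1` were deleted (the variable names are illustrative only):

```javascript
const forward = /\1(a)/;       // \1 appears before group 1
const otherBranch = /(a)|\1b/; // \1 is in a different alternative than group 1
const selfRef = /(a\1)/;       // \1 is inside group 1 itself

console.log(forward.exec("a")[0]);     // "a"
console.log(otherBranch.exec("b")[0]); // "b"
console.log(selfRef.exec("aa")[0]);    // "a" (not "aa")
```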

There is also a note in the ECMAScript specification (https://262.ecma-international.org/):

NOTE 2

An escape sequence of the form \ followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses (22.2.2.1). It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.
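The rule in NOTE 2 is easiest to see with an optional group. In this small illustration (the pattern is chosen for demonstration, not taken from the question), group 1 exists but sometimes captures nothing, and `\1` then succeeds without consuming any input:

```javascript
// Group 1 is optional; when it does not take part in the match it stays
// undefined, and \1 then succeeds without consuming any characters.
const re = /(x)?y\1/;

const skipped = re.exec("y");
console.log(skipped[0], skipped[1]); // "y" undefined

// When group 1 did capture "x", \1 has to match "x" again
const captured = re.exec("xyx");
console.log(captured[0], captured[1]); // "xyx" "x"
```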

So the pattern (\w\1+) captures a single word character. Because the backreference is ignored, the same result can be obtained by writing just \w.
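A quick check of that equivalence on the string from the question:

```javascript
const str = "aabbccc";
// \1 sits inside group 1, so it can only match the empty string
const withBackref = str.match(/(\w\1+)/g);
// The same pattern with the useless backreference (and the group) removed
const withoutBackref = str.match(/\w/g);

console.log(withBackref);    // [ 'a', 'a', 'b', 'b', 'c', 'c', 'c' ]
console.log(withoutBackref); // [ 'a', 'a', 'b', 'b', 'c', 'c', 'c' ]
```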

Answer:

(\w) matches a word character and stores it in the first capturing group, and \1+ matches one or more further occurrences of the text that group captured. This finds runs of repeated characters.
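To see what the group holds for each match, `String.prototype.matchAll` (ES2020, requires the g flag) returns the full match together with its capture groups. The object fields below are just labels chosen for this illustration:

```javascript
const str = "aabbccc";
const runs = [...str.matchAll(/(\w)\1+/g)].map((m) => ({
  run: m[0],      // the full match: the character plus its repetitions
  char: m[1],     // group 1: the single character captured by (\w)
  index: m.index, // where the run starts in str
}));

console.log(runs);
// [ { run: 'aa', char: 'a', index: 0 },
//   { run: 'bb', char: 'b', index: 2 },
//   { run: 'ccc', char: 'c', index: 4 } ]
```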

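In (\w\1+), the \1 is inside the first capturing group, so it refers to the group it is contained in. That group has not finished capturing when \1 is evaluated, so \1 matches the empty string and each match is just the single character matched by \w.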

Answer:

1. /(\w)\1+/g has two parts:

(\w), which matches any word character and captures it as group 1.

\1+, where \1 is a backreference to the first capture group (\w) and the + quantifier means "one or more", so it matches one or more repetitions of the captured character. The whole expression matches every run of repeated characters in the string, and match() returns them as an array of strings.
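The role of the + quantifier shows up when it is dropped. Without it, \1 matches exactly one more copy of the captured character, so the run "ccc" is cut short:

```javascript
const str = "aabbccc";
// Without +, \1 matches exactly one more copy, so "ccc" becomes "cc"
const single = str.match(/(\w)\1/g);
// With +, \1+ consumes every further copy of the captured character
const repeated = str.match(/(\w)\1+/g);

console.log(single);   // [ 'aa', 'bb', 'cc' ]
console.log(repeated); // [ 'aa', 'bb', 'ccc' ]
```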

2. /(\w\1+)/g has only one set of parentheses:

(\w\1+) is a single group that contains both \w and \1+. Because \1 refers to the group it sits inside, it has not captured anything yet when \1 is evaluated, so \1+ matches the empty string. Each match is therefore a single word character, and the repeated characters come back as individual elements of the array.