I'm trying to use Regex to find the longest repeating substring in a string.
For example, when the input is "aabbccc", the expected output should be "ccc" (repeated 3 times).
I finally got the answer by using str.match(/(\w)\1+/g).
So when const str = 'aabbccc' and I run str.match(/(\w)\1+/g), the output is ['aa', 'bb', 'ccc'].
But at first I was using str.match(/(\w\1+)/g), and the output was ['a', 'a', 'b', 'b', 'c', 'c', 'c'].
Can someone explain why?
I thought:
\w : any word character,
\1+ : repeated one or more times,
\w\1+ : any word character repeated one or more times,
(\w\1+) : capture any repeated word characters as a group.
CodePudding user response:
There is a description of this in the ESLint documentation at eslint.org, under the rule no-useless-backreference:
In JavaScript regular expressions, it’s syntactically valid to define a backreference to a group that belongs to another alternative part of the pattern, a backreference to a group that appears after the backreference, a backreference to a group that contains that backreference, or a backreference to a group that is inside a negative lookaround. However, by the specification, in any of these cases the backreference always ends up matching only zero-length (the empty string), regardless of the context in which the backreference and the group appear.
Backreferences that always successfully match zero-length and cannot match anything else are useless. They are basically ignored and can be removed without changing the behavior of the regular expression.
There is also a note in the ECMAScript specification at https://262.ecma-international.org/:
NOTE 2
An escape sequence of the form \ followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses (22.2.2.1). It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.
So the pattern (\w\1+) captures a single word character. As the backreference is ignored, the same result can be obtained by writing just \w.
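To make that concrete, here is a small sketch that can be pasted into a browser console or Node. The variable names (selfRef, plain, forwardRef) are only illustrative. It compares the self-referencing pattern with plain \w, and also shows a backreference placed before its group, which is another of the cases the ESLint rule describes.

```javascript
const str = 'aabbccc';

// Backreference inside its own group: group 1 is still undefined while it
// is being matched, so \1 can only match the empty string.
const selfRef = str.match(/(\w\1+)/g);

// The same pattern with the useless backreference removed.
const plain = str.match(/\w/g);

console.log(selfRef); // [ 'a', 'a', 'b', 'b', 'c', 'c', 'c' ]
console.log(plain);   // [ 'a', 'a', 'b', 'b', 'c', 'c', 'c' ]

// A backreference that appears before its group behaves the same way:
// \1 matches the empty string, then (a) matches 'a'.
const forwardRef = /\1(a)/.exec('a');
console.log(forwardRef[0]); // 'a'
```

Both arrays come out identical, which is what "the backreference is ignored" means in practice.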
CodePudding user response:
(\w) matches a word character and stores it in the first capturing group, and \1+ matches one or more further occurrences of whatever that group captured. This finds repeated characters.
In (\w\1+), the \1 is inside the first capturing group, so it attempts to match one or more occurrences of the group it is contained in. That group has not finished capturing when \1 is evaluated, so the backreference can only match the empty string.
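Since the original goal was the longest repeated run, one possible way to get there from the working pattern is sketched below. The helper name longestRun is made up for this example, and it assumes that a run needs at least two identical characters and that the first run wins on a tie.

```javascript
// Hypothetical helper: returns the longest run of a repeated word character,
// or an empty string when no character is repeated.
function longestRun(str) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const runs = str.match(/(\w)\1+/g) || [];
  // Keep the first run that is strictly longer than the best one so far.
  return runs.reduce((best, run) => (run.length > best.length ? run : best), '');
}

console.log(longestRun('aabbccc')); // 'ccc'
console.log(longestRun('abc'));     // ''
```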
CodePudding user response:
1. /(\w)\1+/g has two parts:
(\w), which matches any word character and captures it as a group.
\1, which is a backreference to the first capture group (\w) and matches the same character that was captured.
The + quantifier means "one or more", so \1+ matches one or more repetitions of the captured character. This expression matches all repeating substrings in a string and returns them as an array of strings.
2. /(\w\1+)/g has only one set of parentheses, and the backreference \1 sits inside the group it refers to. \w matches a single word character, and \1+ is meant to match one or more of the same character, but the group has not captured anything yet at that point, so the backreference matches only the empty string. As a result, the repeated characters are returned as individual characters in the array.
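One way to see the difference between the two patterns is to look at what group 1 holds for each match. The sketch below uses String.prototype.matchAll for that; the names outside and inside are only illustrative.

```javascript
const str = 'aabbccc';

// Collect the full match and the contents of capture group 1 for each hit.
const describe = (re) =>
  [...str.matchAll(re)].map((m) => ({ match: m[0], group1: m[1] }));

// Backreference outside the group: group 1 is one character,
// and the full match is the whole run.
const outside = describe(/(\w)\1+/g);
console.log(outside.map((r) => r.match));  // [ 'aa', 'bb', 'ccc' ]
console.log(outside.map((r) => r.group1)); // [ 'a', 'b', 'c' ]

// Backreference inside the group: every match is a single character.
const inside = describe(/(\w\1+)/g);
console.log(inside.length);                               // 7
console.log(inside.every((r) => r.match === r.group1));   // true
```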