I'm referring to the ECMAScript regular expression syntax defined in https://tc39.es/ecma262/#sec-regexp-regular-expression-objects.
I checked how the following pattern matches in a regular expression via several online sources.
Pattern: /[[]]/
They all gave the following breakdown:
[ -> Match any character in the character set
[ -> Matches a `[` character
]
] -> Matches a `]` character
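That breakdown can be checked directly in a JavaScript console. The short sketch below (the variable name `re` is only illustrative) shows that the pattern matches the two-character sequence `[]` and not a lone bracket:

```javascript
// Without the u flag, /[[]]/ is read as the character class [[] followed by a literal "]".
const re = /[[]]/;

console.log(re.test("[]")); // true: "[" comes from the class, then the literal "]"
console.log(re.test("["));  // false: the trailing "]" is missing
console.log(re.test("]"));  // false: the class only contains "["
console.log("a[]b".match(re)[0]); // "[]"
```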
I get how the character set is matched, but I don't understand why the last closing bracket (`]`) is matched. Isn't it a syntax error in the regular expression, since a PatternCharacter can only be a SourceCharacter that is not a SyntaxCharacter according to the syntax defined in the ECMAScript specification (https://tc39.es/ecma262/#prod-PatternCharacter)? The closing bracket (`]`) is a SyntaxCharacter.
PatternCharacter ::
SourceCharacter but not SyntaxCharacter
CodePudding user response:
Annex B of the same specification (Additional ECMAScript Features for Web Browsers), in section B.1.2 Regular Expressions Patterns, says:
The syntax of 22.2.1 is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.
And there we find that a Term can be an ExtendedAtom, which in turn can be an ExtendedPatternCharacter, which is defined as:
SourceCharacter but not one of
^ $ \ . * ? ( ) [ |
So here `]` is allowed.
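One way to see that it is the Annex B grammar that accepts the lone `]` is to compile the same pattern source with the `u` flag, which turns on Unicode mode, where the ExtendedAtom alternative does not apply. This is a minimal sketch, assuming an engine that implements Annex B (a browser or Node.js); the helper name `compiles` is made up for illustration:

```javascript
// Hypothetical helper: reports whether a pattern source compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything other than a syntax error is unexpected here
  }
}

// No flags: the trailing "]" is parsed as an ExtendedPatternCharacter.
const legacy = compiles("[[]]", "");
// Unicode mode: the trailing "]" is a SyntaxCharacter and cannot be a PatternCharacter.
const unicode = compiles("[[]]", "u");

console.log(legacy, unicode); // true false
```

So the reading in the question is correct for the strict grammar; the pattern only compiles because the extended grammar is used when no `u` flag is present.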
Annex B of the specification is introduced with:
The ECMAScript language syntax and semantics defined in this annex are required when the ECMAScript host is a web browser. The content of this annex is normative but optional if the ECMAScript host is not a web browser.
Interestingly, this "extra" behaviour is also provided by Node.js, even though the specification does not require it for hosts that are not web browsers.