My web application lets users specify custom URI path components, which must comply with the following restrictions:
- All characters must be lowercase.
- It must be at least 2 characters long.
- The first character must match [a-z].
- The last character must match [0-9a-z].
- All other characters must match [0-9a-z_\-].
- The - and _ characters must not appear in a consecutive run of 2 or more.
  - i.e., the string must not contain --, __, _-, or -_.
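For reference, the same rules can be restated without any regex, which gives a ground truth to test candidate patterns against. This is a minimal Python sketch; the function name is_valid_component and the constant names are hypothetical, and the lowercase rule is enforced implicitly by the allowed character sets.

```python
import string

FIRST = set(string.ascii_lowercase)   # [a-z]
LAST = FIRST | set(string.digits)     # [0-9a-z]
MIDDLE = LAST | {"_", "-"}            # [0-9a-z_-]
SEPARATORS = {"_", "-"}


def is_valid_component(s: str) -> bool:
    """Hypothetical reference check of the rules above, written without regex."""
    if len(s) < 2:                                # at least 2 characters
        return False
    if s[0] not in FIRST or s[-1] not in LAST:    # first [a-z], last [0-9a-z]
        return False
    if any(c not in MIDDLE for c in s[1:-1]):     # all other characters
        return False
    # No two adjacent separators: rejects --, __, _-, and -_.
    return not any(a in SEPARATORS and b in SEPARATORS for a, b in zip(s, s[1:]))


for s in ["ab", "foo-bar", "foo--bar", "a_-b", "Ab", "1ab", "ab-", "a"]:
    print(s, is_valid_component(s))
```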
I've implemented the first five rules in a regular expression easily enough:
^[a-z][0-9_a-z\-]*[0-9a-z]$
...however, I don't know how to implement the last rule in a single regex.
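A quick check makes the gap concrete. This sketch assumes Python's re module rather than regex101.com, and FIVE_RULES is just a hypothetical name for the pattern above: it accepts foo--bar and a_-b because nothing in it looks at neighbouring characters.

```python
import re

# The question's pattern, which covers only the first five rules.
FIVE_RULES = re.compile(r"^[a-z][0-9_a-z\-]*[0-9a-z]$")

for candidate in ["ab", "foo-bar", "foo--bar", "a_-b", "1ab", "ab-"]:
    # foo--bar and a_-b are accepted: the middle class allows any run of - and _.
    print(candidate, bool(FIVE_RULES.match(candidate)))
```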
I thought I'd start by just changing the regex so that it won't match -- (as in a--b), and I was thinking it could be a negative lookahead, as it asserts that the string does not contain -- (right?):
Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions just like the start and end of line, and start and end of word anchors. [...] The difference is that lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not
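The zero-length behaviour described in the quote can be seen directly. In this Python sketch (the test strings are arbitrary), the lookahead inspects the character after the current position but does not include it in the match.

```python
import re

# Positive lookahead: "a" must be followed by "b", but "b" is not consumed.
m = re.search(r"a(?=b)", "xab")
print(m.group(), m.span())  # a (1, 2)

# Negative lookahead: "a" must NOT be followed by "b".
print(re.search(r"a(?!b)", "ab"))          # None
print(re.search(r"a(?!b)", "ac").group())  # a
```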
But adding (?!\-\-) to the regular expression (on regex101.com) in various spots, or adding a lookbehind (?<!\-\-) instead, does not stop strings like a--b from matching.
In other words, all of these patterns match foo--bar when they shouldn't:
(?!\-\-)^[a-z][0-9_a-z\-]*[0-9a-z]$
^(?!\-\-)[a-z][0-9_a-z\-]*[0-9a-z]$
^[a-z](?!\-\-)[0-9_a-z\-]*[0-9a-z]$
^[a-z](?!\-\-)(?:[0-9_a-z\-]*)[0-9a-z]$
^[a-z][0-9_a-z\-]*(?!\-\-)[0-9a-z]$
^[a-z][0-9_a-z\-]*(?<!\-\-)[0-9a-z]$
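The reason all six match is that each lookaround is evaluated at one position only. For foo--bar, the checks near the start see "fo" or "oo", and the checks before the last character see only "r" ahead or "ba" behind, so none of them ever sees --. A Python sketch reproducing this (assuming Python's re; ATTEMPTS is a hypothetical name for the list above):

```python
import re

# The six attempts from the question. Each lookaround runs once, at a single
# point in the match, so it only inspects the characters next to that point.
ATTEMPTS = [
    r"(?!\-\-)^[a-z][0-9_a-z\-]*[0-9a-z]$",
    r"^(?!\-\-)[a-z][0-9_a-z\-]*[0-9a-z]$",
    r"^[a-z](?!\-\-)[0-9_a-z\-]*[0-9a-z]$",
    r"^[a-z](?!\-\-)(?:[0-9_a-z\-]*)[0-9a-z]$",
    r"^[a-z][0-9_a-z\-]*(?!\-\-)[0-9a-z]$",
    r"^[a-z][0-9_a-z\-]*(?<!\-\-)[0-9a-z]$",
]

for pattern in ATTEMPTS:
    print(bool(re.match(pattern, "foo--bar")), pattern)  # every line prints True
```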
Answer:
You can place the negative lookahead right after matching [a-z] at the start of the string.
As you don't want to match any combination of - and _, you can use two character classes: (?!.*[_-][_-])
As the [_-][_-] part can occur anywhere in the string, you can precede it with .*, which optionally matches any characters before it.
If you omit .*, the assertion only checks the current position, which in this case is right after matching the [a-z] at the start of the string.
^[a-z](?!.*[_-][_-])[0-9_a-z-]*[0-9a-z]$
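A Python sketch that exercises this pattern, alongside a hypothetical variant without .* to show the point about the assertion only checking the current position. Using Python's re module and fullmatch is my choice here, not part of the answer; VALID and NO_DOTSTAR are illustrative names.

```python
import re

# The answer's pattern: the lookahead runs once, right after the first letter,
# and ".*" lets it scan the rest of the string for two adjacent separators.
VALID = re.compile(r"^[a-z](?!.*[_-][_-])[0-9_a-z-]*[0-9a-z]$")

# Hypothetical variant without ".*": the lookahead only inspects the two
# characters immediately after the first letter.
NO_DOTSTAR = re.compile(r"^[a-z](?![_-][_-])[0-9_a-z-]*[0-9a-z]$")

CASES = ["ab", "a1", "foo-bar", "foo_bar", "foo--bar", "foo__bar",
         "a_-b", "a-_b", "1ab", "ab-", "a"]

for s in CASES:
    # fullmatch is used because Python's "$" also matches just before a
    # trailing "\n", so match() alone would accept e.g. "ab\n".
    print(s, bool(VALID.fullmatch(s)), bool(NO_DOTSTAR.fullmatch(s)))
```

Only foo--bar and foo__bar differ between the two columns: without .*, the lookahead never reaches separators that appear later in the string, while a_-b and a-_b are caught by both because their separators sit right after the first letter.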