How can I use a negative lookahead in an anchored regular-expression pattern?

Time:06-16

My web-application allows users to specify custom URI path components which comply with the following restrictions:

  • All characters must be lowercase.
  • The string must be at least 2 characters long.
  • The first character must match [a-z].
  • The last character must match [0-9a-z].
  • All other characters must match [0-9a-z_\-].
  • The - and _ characters must not exist as a consecutive run of 2 or more.
    • i.e. The string must not contain --, __, _-, or -_.

I've implemented the first 5 rules in a regular-expression easily enough:

^[a-z][0-9_a-z\-]*[0-9a-z]$

...however I don't know how to implement the last rule in a single regex.

I thought I'd start by just trying to change the regex so it won't match -- (as in a--b). I was thinking it could be a negative lookahead, since it asserts that the string does not contain -- (right?):

Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions just like the start and end of line, and start and end of word anchors. [...] The difference is that lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not

But adding (?!\-\-) to the regular expression (on regex101.com) in various spots, or adding it as a lookbehind (?<!\-\-), does not make the pattern reject strings like foo--bar.

i.e. all of these patterns match foo--bar when they shouldn't:

(?!\-\-)^[a-z][0-9_a-z\-]*[0-9a-z]$

^(?!\-\-)[a-z][0-9_a-z\-]*[0-9a-z]$

^[a-z](?!\-\-)[0-9_a-z\-]*[0-9a-z]$

^[a-z](?!\-\-)(?:[0-9_a-z\-]*)[0-9a-z]$

^[a-z][0-9_a-z\-]*(?!\-\-)[0-9a-z]$

^[a-z][0-9_a-z\-]*(?<!\-\-)[0-9a-z]$

CodePudding user response:

You can place the negative lookahead right after matching [a-z] at the start of the string.

As you don't want to match any consecutive combination of - and _, you can use two character classes inside the lookahead: (?!.*[_-][_-])

As the [_-][_-] part can occur anywhere in the string, you can precede it with .*, which matches zero or more of any character (except a newline) before the pair.

If you omit .* the assertion only runs on the current position, which in this case would be after matching the a-z at the start of the string.
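To illustrate the difference, here is a minimal sketch in Python (the question does not name a language, so Python's re module is an assumption, and local_check and global_check are hypothetical names). The patterns are simplified to the -- case only:

```python
import re

# Without ".*", the lookahead only inspects the two characters
# immediately after the current position (right after the first letter).
local_check = re.compile(r"^[a-z](?!--)[0-9a-z-]*$")

# With ".*", the lookahead may skip any number of characters first,
# so it finds "--" wherever it occurs in the rest of the string.
global_check = re.compile(r"^[a-z](?!.*--)[0-9a-z-]*$")

print(bool(local_check.match("foo--bar")))   # True: "--" is not directly after "f"
print(bool(local_check.match("f--oo")))      # False: "--" is directly after "f"
print(bool(global_check.match("foo--bar")))  # False: "--" found further along
print(bool(global_check.match("foo-bar")))   # True: no "--" anywhere
```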

^[a-z](?!.*[_-][_-])[0-9_a-z-]*[0-9a-z]$
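A quick way to check this pattern against all of the rules is a small script. The sketch below assumes Python; COMPONENT_RE and is_valid_component are hypothetical names. It uses fullmatch because in Python a bare $ also matches just before a trailing newline:

```python
import re

# The pattern from the answer, unchanged.
COMPONENT_RE = re.compile(r"^[a-z](?!.*[_-][_-])[0-9_a-z-]*[0-9a-z]$")


def is_valid_component(value: str) -> bool:
    """Return True if value satisfies all of the URI path component rules."""
    # fullmatch requires the match to span the entire string,
    # so input such as "ab\n" is rejected.
    return COMPONENT_RE.fullmatch(value) is not None


valid = ["ab", "foo-bar", "foo_bar9", "a-b_c"]
invalid = ["a", "foo--bar", "foo__bar", "foo_-bar", "foo-_bar",
           "foo-", "1foo", "Foo", "ab\n"]

for s in valid:
    print(repr(s), is_valid_component(s))    # each prints True
for s in invalid:
    print(repr(s), is_valid_component(s))    # each prints False
```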