Regex: add exception (exact match) to existing regex

Time:07-20

I have this regex: (.+?)(?:index\.html?|\.html?)(.*)?$

I use this regex (case-insensitively) to trigger redirects for all URLs that contain "index.html" or ".html" (the final "l" is optional, so "index.htm" and ".htm" match as well). The redirect target is built from the two capturing groups, leaving out the middle part ("index.html" or ".html"), so that part is removed from the URL.

Example input URL: https://www.example.com/somePath/subPath/index.Html?someQueryString

This will be redirected to: https://www.example.com/somePath/subPath/?someQueryString
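A minimal Python sketch of this redirect logic, assuming a hypothetical helper `redirect_target()` that returns the new URL, or `None` when no redirect applies. It uses `(.+?)` for the first group and joins the two capturing groups to build the target:

```python
import re
from typing import Optional

# The question's pattern; re.IGNORECASE mirrors the case-insensitive setup.
REDIRECT = re.compile(r"(.+?)(?:index\.html?|\.html?)(.*)?$", re.IGNORECASE)


def redirect_target(url: str) -> Optional[str]:
    """Return the redirect URL, or None if the URL has no index.html/.html part."""
    m = REDIRECT.match(url)
    if m is None:
        return None
    # Keep group 1 (everything before) and group 2 (everything after),
    # dropping the "index.html" / ".html" part in between.
    return m.group(1) + (m.group(2) or "")


print(redirect_target("https://www.example.com/somePath/subPath/index.Html?someQueryString"))
# https://www.example.com/somePath/subPath/?someQueryString
```

Group 2 may be empty (for a URL ending in ".html"), so the sketch falls back to an empty string rather than relying on how the engine reports an empty optional group.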

This is all working as expected, but now I want to add an exception to this regex. The exception is that this redirect should not be triggered if a certain word exists in the first group.

Let's say this word is "safePath". The following URL should not trigger a redirect, because it contains the word "safePath" in the first group: https://www.example.com/safePath/subPath/index.Html?someQueryString

How can I change my regex expression to honor this exception?

Answer:

This seems to work, though it returns three capturing groups instead of two (it splits the first part of the link into two groups):

(https://www.example.com/)(?!safePath/)(.+?)(?:index\.html?|\.html?)(.*)?$

(Note: This only matches if safePath is the first group in the link as specified in the question. So a link like https://www.example.com/somePath/safePath/index.html?someQueryString will still match because safePath is in the second group.)

(Tested in Python.)
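A small Python check of this three-group pattern, using `(.+?)` for the middle group; the helper name `redirect_target()` is hypothetical. It also reproduces the limitation from the note above: with safePath as the second path segment, the URL is still redirected.

```python
import re
from typing import Optional

# The lookahead is checked right after the hard-coded host prefix,
# so it only protects URLs whose first path segment is safePath.
FIRST_SEGMENT = re.compile(
    r"(https://www.example.com/)(?!safePath/)(.+?)(?:index\.html?|\.html?)(.*)?$",
    re.IGNORECASE,
)


def redirect_target(url: str) -> Optional[str]:
    m = FIRST_SEGMENT.match(url)
    if m is None:
        return None
    # Three groups now: host prefix, remaining path, and the trailing part.
    return m.group(1) + m.group(2) + (m.group(3) or "")


for url in (
    "https://www.example.com/somePath/subPath/index.Html?someQueryString",
    "https://www.example.com/safePath/subPath/index.Html?someQueryString",
    "https://www.example.com/somePath/safePath/index.html?someQueryString",
):
    print(url, "->", redirect_target(url))
```

The first URL is redirected, the second is left alone, and the third is still redirected to "https://www.example.com/somePath/safePath/?someQueryString".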

EDIT: I think this should detect whether safePath is anywhere in the URL:

^(?!.*/safePath/)(.+?)(?:index\.html?|\.html?)(.*)?$
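A Python sketch of the anywhere-in-the-URL variant, with `(.+?)` as the first group and `|` separating the two alternatives; `redirect_target()` is again a hypothetical helper:

```python
import re
from typing import Optional

# Anchored with ^ so the lookahead scans the whole URL from the start.
ANYWHERE = re.compile(
    r"^(?!.*/safePath/)(.+?)(?:index\.html?|\.html?)(.*)?$",
    re.IGNORECASE,
)


def redirect_target(url: str) -> Optional[str]:
    m = ANYWHERE.match(url)
    if m is None:
        return None
    # Same two groups as the question's original pattern.
    return m.group(1) + (m.group(2) or "")


print(redirect_target("https://www.example.com/somePath/safePath/index.html?someQueryString"))
# None
```

Because the lookahead includes both slashes, safePath only counts as a complete path segment followed by "/". For example, "https://www.example.com/safePath.html" is still redirected (to "https://www.example.com/safePath").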

Answer:

The following URL should not trigger a redirect, because it contains the word "safePath" in the first group: https://www.example.com/safePath/subPath/index.Html?someQueryString

One way you can probably do this is with a negative lookahead in the regex:

^(?!.*safepath)(.+?)(?:index\.html?|\.html?)(.*)?$
 ^^^^^^^^^^^^^^
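Anchoring this pattern with `^` matters if the redirect engine searches the URL (trying every start position) rather than matching from the beginning. Without the anchor, the match can simply start one character into "safePath", where the rest of the string no longer contains "safepath". A Python sketch, using `re.search` to stand in for such an engine:

```python
import re

TAIL = r"(.+?)(?:index\.html?|\.html?)(.*)?$"
UNANCHORED = re.compile(r"(?!.*safepath)" + TAIL, re.IGNORECASE)
ANCHORED = re.compile(r"^(?!.*safepath)" + TAIL, re.IGNORECASE)

url = "https://www.example.com/safePath/subPath/index.Html?someQueryString"

# re.search tries every start position; starting at "afePath/..." the
# lookahead no longer sees "safepath", so the unwanted match succeeds.
unanchored = UNANCHORED.search(url)
print(unanchored.group(1))  # afePath/subPath/

# With ^ the only start position is 0, where the lookahead sees "safePath" and fails.
print(ANCHORED.search(url))  # None
```

Unlike the `/safePath/` lookahead in the first answer, this one also matches "safepath" as a plain substring anywhere in the URL, including the query string.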

Or you could handle the exception in a separate conditional, i.e. something like:

if(url.contains?('safepath')) {
  return false;
}

(This is only intended as pseudo-code, since you didn't specify a language/tools being used.)
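If the code can run a check before applying the regex, the separate conditional might look like this in Python. The language and the `redirect_target()` helper are assumptions, since the question names neither. The pseudo-code compares against lowercase 'safepath', so this sketch lower-cases the URL first to also catch "safePath":

```python
import re
from typing import Optional

# The question's redirect pattern, with no exception baked in.
HTML_PART = re.compile(r"(.+?)(?:index\.html?|\.html?)(.*)?$", re.IGNORECASE)


def redirect_target(url: str) -> Optional[str]:
    # The exception is a plain substring test; lower-casing makes it
    # case-insensitive, like the regex itself.
    if "safepath" in url.lower():
        return None
    m = HTML_PART.match(url)
    if m is None:
        return None
    return m.group(1) + (m.group(2) or "")
```

Keeping the exception outside the regex leaves the original pattern untouched and makes the rule easy to extend, e.g. to a list of safe words.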
