Regex: Check if character exists in string and adjust rules accordingly

Time:05-13

I am writing a regex to filter out invalid URLs. This should be simple enough, since countless examples are available online. I ended up using this one: ((https?|ftp|file)://)[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|].

However, our specific requirements state that the URL must end in either "?" or "&". This should also be fairly simple: it can be done by adding (\\?|\\&) to the end of the regex.

The requirements are further complicated by the following: if "?" is already present in the string, then the URL must end in "&", and vice versa (if "&" is already present, the URL must end in "?").

Note that the regex above, and this question in general, target JavaScript.

Edit at the request of a commenter

Examples of input urls:

No "?" or "&" at all:

https://helloworld.io/foobar returns false

No "?" or "&" at end:

https://helloworld.io/foo&bar returns false

https://helloworld.io/foo?bar returns false

Single special character found at end:

https://helloworld.io/foobar? returns true

https://helloworld.io/foobar& returns true

Alternating special characters in url:

https://helloworld.io/foo&bar? returns true

https://helloworld.io/foo?bar& returns true

Alternating special characters in url without unique ending:

https://helloworld.io/foo&bar?baz& returns false

https://helloworld.io/foo?bar&baz? returns false

Repeated special character found at end:

https://helloworld.io/foo?bar? returns false

https://helloworld.io/foo&bar& returns false

Alternating special characters with no special character at end:

https://helloworld.io/foo&bar?baz returns false

https://helloworld.io/foo?bar?baz returns false

Second edit in response to another comment:

With this regex most of my problems are solved:

((https?|ftp|file):\/\/)[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|](\\?|\\&)

However, I cannot handle cases such as this:

https://helloworld.io/foo&bar?baz?bum&

This evaluates as valid. However, since "&" is already present in the string before the last character, the URL must not end with "&".
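
The false positive can be reproduced with a short script. This is only a minimal sketch under a few assumptions: the pattern is compiled from a string with new RegExp (which the doubled backslashes in (\\?|\\&) suggest), the quantifier after the first character class is +, and ^ and $ anchors are added so the whole input is checked. The names looseSource and looseUrlPattern are illustrative.

```javascript
// Assumption: the pattern is built from a string, so "\\?" becomes \? in the regex.
// Assumption: "+" follows the first character class, and ^...$ anchors are added.
const looseSource =
  "^((https?|ftp|file)://)" +
  "[-A-Za-z0-9+&@#/%?=~_|!:,.;]+" +
  "[-A-Za-z0-9+&@#/%=~_|]" +
  "(\\?|\\&)$";
const looseUrlPattern = new RegExp(looseSource);

const looseSamples = [
  "https://helloworld.io/foobar?",          // ends in "?", wanted: valid
  "https://helloworld.io/foobar",           // no "?" or "&" at end, wanted: invalid
  "https://helloworld.io/foo&bar?baz?bum&", // "&" appears twice, wanted: invalid
];

for (const url of looseSamples) {
  console.log(url, looseUrlPattern.test(url));
}
```

The last sample is accepted because the first character class allows both "?" and "&" anywhere in the body, so the pattern only checks the final character.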

CodePudding user response:

You can use the following regex:

(https|ftp|file):\/\/[^\/]+\/\w+((\?[^&\s]+)?&|(&[^\?\s]+)?\?)(\s|$)

Explanation:

  • (https|ftp|file): prefix
  • :\/\/: colon and double slash
  • [^\/]+: one or more characters other than a slash
  • \/: slash
  • \w+: one or more word characters (letters, digits, underscore)

Then there are two options.

Option 1: (\?[^&\s]+)?&:

  • (\?[^&\s]+)?: optional ? followed by one or more characters other than & or whitespace
  • &: &

Option 2: (&[^\?\s]+)?\?:

  • (&[^\?\s]+)?: optional & followed by one or more characters other than ? or whitespace
  • \?: ?

Ending with (\s|$): whitespace or end of string

This matches the examples you provided. For further refinements, please share new examples.
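
To check this pattern against the examples from the question, a small harness along these lines can be used. It is a sketch that assumes a + quantifier after [^\/], \w, [^&\s] and [^\?\s]; the names firstAnswerPattern, firstAnswerCases and firstAnswerFailures are illustrative.

```javascript
// Assumption: "+" quantifiers follow [^\/], \w, [^&\s] and [^\?\s].
const firstAnswerPattern =
  /(https|ftp|file):\/\/[^\/]+\/\w+((\?[^&\s]+)?&|(&[^\?\s]+)?\?)(\s|$)/;

// [url, expected result], copied from the examples in the question.
const firstAnswerCases = [
  ["https://helloworld.io/foobar", false],
  ["https://helloworld.io/foo&bar", false],
  ["https://helloworld.io/foo?bar", false],
  ["https://helloworld.io/foobar?", true],
  ["https://helloworld.io/foobar&", true],
  ["https://helloworld.io/foo&bar?", true],
  ["https://helloworld.io/foo?bar&", true],
  ["https://helloworld.io/foo&bar?baz&", false],
  ["https://helloworld.io/foo?bar&baz?", false],
  ["https://helloworld.io/foo?bar?", false],
  ["https://helloworld.io/foo&bar&", false],
  ["https://helloworld.io/foo&bar?baz", false],
  ["https://helloworld.io/foo?bar?baz", false],
  ["https://helloworld.io/foo&bar?baz?bum&", false],
];

// Collect every case whose actual result differs from the expected one.
const firstAnswerFailures = firstAnswerCases.filter(
  ([url, expected]) => firstAnswerPattern.test(url) !== expected
);

console.log(
  firstAnswerFailures.length === 0 ? "all examples behave as listed" : firstAnswerFailures
);
```

Two limitations of the pattern as written: it lists https but not plain http, and it allows only a single path segment of word characters after the host, so URLs such as http://helloworld.io/foobar? or https://helloworld.io/foo/bar? do not match.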

Try it here.

CodePudding user response:

Working from your initial regex:

((https?|ftp|file)://)[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]

Then modifying it for each case:

((https?|ftp|file)://)[-A-Za-z0-9+@#/%?=~_|!:,.;]+[-A-Za-z0-9+@#/%=~_|]&

and

((https?|ftp|file)://)[-A-Za-z0-9+&@#/%=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]\?

Then joining them and de-duplicating the common prefix:

((https?|ftp|file)://)([-A-Za-z0-9+@#/%?=~_|!:,.;]+[-A-Za-z0-9+@#/%=~_|]&|[-A-Za-z0-9+&@#/%=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]\?)

Adding ^, $, and the correct escaping for JavaScript, this would be:

^((https?|ftp|file):\/\/)([-A-Za-z0-9+@#\/%?=~_|!:,.;]+[-A-Za-z0-9+@#\/%=~_|]&|[-A-Za-z0-9+&@#\/%=~_|!:,.;]+[-A-Za-z0-9+&@#\/%=~_|]\?)$
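
The joining step can also be sketched in code by assembling the final pattern from its two alternatives. This assumes a + quantifier after the first character class of each alternative; the names schemePrefix, endsWithAmp, endsWithQuestion and joinedPattern are illustrative.

```javascript
// Assumption: "+" follows the first character class in each alternative.
const schemePrefix = "^((https?|ftp|file):\\/\\/)";

// Alternative 1: URL ends in "&". The body may contain "?" but no other "&".
const endsWithAmp =
  "[-A-Za-z0-9+@#\\/%?=~_|!:,.;]+[-A-Za-z0-9+@#\\/%=~_|]&";

// Alternative 2: URL ends in "?". The body may contain "&" but no other "?".
const endsWithQuestion =
  "[-A-Za-z0-9+&@#\\/%=~_|!:,.;]+[-A-Za-z0-9+&@#\\/%=~_|]\\?";

const joinedPattern = new RegExp(
  `${schemePrefix}(${endsWithAmp}|${endsWithQuestion})$`
);

// A few of the question's examples, including the one from the second edit.
const joinedSamples = [
  "https://helloworld.io/foo?bar&",         // true
  "https://helloworld.io/foo&bar?",         // true
  "https://helloworld.io/foo&bar&",         // false: "&" repeated
  "https://helloworld.io/foo?bar?",         // false: "?" repeated
  "https://helloworld.io/foo&bar?baz?bum&", // false: "&" already in the body
];

for (const url of joinedSamples) {
  console.log(url, joinedPattern.test(url));
}
```

Because each alternative simply drops its own ending character from the body's character classes, the "must not already be present" rule is enforced without lookaheads.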

Tests over on regex101
