I am writing a regex to filter out invalid URLs. This should be simple enough, since countless examples are available online. I ended up using this one:
((https?|ftp|file)://)[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]
However, our specific requirements state that the URL must end in either "?" or "&". This should also be fairly simple: it can be done by adding (\\?|\\&) to the end of the regex.
The requirements are further complicated by the following: if "?" is already present in the string, then the URL must end in "&", and vice versa: if "&" is already present, the URL must end in "?".
Note that the regex above, and this question in general, are in the context of JavaScript.
Edit per the request of a commenter:
Examples of input urls:
No "?" or "&" at all:
https://helloworld.io/foobar
returns false
No "?" or "&" at end:
https://helloworld.io/foo&bar
returns false
https://helloworld.io/foo?bar
returns false
Single special character found at end:
https://helloworld.io/foobar?
returns true
https://helloworld.io/foobar&
returns true
Alternating special characters in url:
https://helloworld.io/foo&bar?
returns true
https://helloworld.io/foo?bar&
returns true
Alternating special characters in url without unique ending:
https://helloworld.io/foo&bar?baz&
returns false
https://helloworld.io/foo?bar&baz?
returns false
Repeated special character found at end:
https://helloworld.io/foo?bar?
returns false
https://helloworld.io/foo&bar&
returns false
Alternating special characters with no special character at end:
https://helloworld.io/foo&bar?baz
returns false
https://helloworld.io/foo?bar?baz
returns false
Second edit in response to another comment:
With this regex most of my problems are solved:
((https?|ftp|file):\/\/)[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|](\\?|\\&)
However, I cannot handle cases such as this:
https://helloworld.io/foo&bar?baz?bum&
This evaluates as valid. However, since "&" is already present earlier in the string, the URL must not end with "&".
CodePudding user response:
You can use the following regex:
(https|ftp|file):\/\/[^\/]+\/\w+((\?[^&\s]+)?&|(&[^\?\s]+)?\?)(\s|$)
Explanation:
(https|ftp|file) : prefix
:\/\/ : colon and double slash
[^\/]+ : anything up to the next slash
\/ : slash
\w+ : one or more alphanumeric characters
Then there are two options.
Option 1: (\?[^&\s]+)?&
(\?[^&\s]+)? : optional ? followed by any characters other than &
& : &
Option 2: (&[^\?\s]+)?\?
(&[^\?\s]+)? : optional & followed by any characters other than ?
\? : ?
Ending up with:
(\s|$) : space or end of string
This matches the examples you provided. If it needs further refinement, please post more examples.
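As a quick check, here is a minimal sketch that runs the pattern against the example URLs from the question in JavaScript. The pattern is transcribed with `+` quantifiers after `[^\/]`, `\w`, and the two negated classes (treat that as an assumption), and the names `answerPattern`, `exampleCases`, and `mismatches` are made up for illustration.

```javascript
// Pattern from this answer; the `+` quantifiers are an assumption of this transcription.
const answerPattern = /(https|ftp|file):\/\/[^\/]+\/\w+((\?[^&\s]+)?&|(&[^\?\s]+)?\?)(\s|$)/;

// [url, expected result] pairs taken from the question's examples.
const exampleCases = [
  ["https://helloworld.io/foobar", false],
  ["https://helloworld.io/foo&bar", false],
  ["https://helloworld.io/foo?bar", false],
  ["https://helloworld.io/foobar?", true],
  ["https://helloworld.io/foobar&", true],
  ["https://helloworld.io/foo&bar?", true],
  ["https://helloworld.io/foo?bar&", true],
  ["https://helloworld.io/foo&bar?baz&", false],
  ["https://helloworld.io/foo?bar&baz?", false],
  ["https://helloworld.io/foo?bar?", false],
  ["https://helloworld.io/foo&bar&", false],
  ["https://helloworld.io/foo&bar?baz", false],
  ["https://helloworld.io/foo?bar?baz", false],
  ["https://helloworld.io/foo&bar?baz?bum&", false],
];

// Collect every case where the pattern disagrees with the expected result.
const mismatches = exampleCases.filter(
  ([url, expected]) => answerPattern.test(url) !== expected
);

console.log(mismatches.length === 0 ? "all examples pass" : mismatches);
```

Note that the pattern as written only accepts `https` (not `http`) and is not anchored with `^`, so adjust those if your inputs require it.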
CodePudding user response:
Working from your initial regex:
((https?|ftp|file)://)[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]
Then modifying it for each case:
((https?|ftp|file)://)[-A-Za-z0-9+@#/%?=~_|!:,.;]+[-A-Za-z0-9+@#/%=~_|]&
and
((https?|ftp|file)://)[-A-Za-z0-9+&@#/%=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]\?
Then joining them and de-duplicating the common prefix:
((https?|ftp|file)://)([-A-Za-z0-9+@#/%?=~_|!:,.;]+[-A-Za-z0-9+@#/%=~_|]&|[-A-Za-z0-9+&@#/%=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]\?)
Adding ^, $, and the correct escaping for JavaScript, this would be:
^((https?|ftp|file):\/\/)([-A-Za-z0-9+@#\/%?=~_|!:,.;]+[-A-Za-z0-9+@#\/%=~_|]&|[-A-Za-z0-9+&@#\/%=~_|!:,.;]+[-A-Za-z0-9+&@#\/%=~_|]\?)$
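The joining step can be sketched in JavaScript by building the pattern from its two branches, which may be easier to maintain than one long literal. This is only an illustration: the character classes are transcribed with `+` (inside the class and as the quantifier between the two classes), which is an assumption, and the names `scheme`, `endsWithAmp`, `endsWithQuestion`, `combinedPattern`, and `isValidUrl` are hypothetical.

```javascript
// Scheme prefix shared by both branches.
const scheme = "(https?|ftp|file)://";

// Branch 1: body may contain "?" but never "&", and the URL must end in "&".
const endsWithAmp = "[-A-Za-z0-9+@#/%?=~_|!:,.;]+[-A-Za-z0-9+@#/%=~_|]&";

// Branch 2: body may contain "&" but never "?", and the URL must end in "?".
const endsWithQuestion = "[-A-Za-z0-9+&@#/%=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]\\?";

// Anchor with ^ and $ so the whole input has to match one of the branches.
const combinedPattern = new RegExp(`^${scheme}(${endsWithAmp}|${endsWithQuestion})$`);

function isValidUrl(url) {
  return combinedPattern.test(url);
}

console.log(isValidUrl("https://helloworld.io/foo?bar&"));         // true
console.log(isValidUrl("https://helloworld.io/foo&bar?"));         // true
console.log(isValidUrl("https://helloworld.io/foo&bar&"));         // false
console.log(isValidUrl("https://helloworld.io/foo&bar?baz?bum&")); // false
```

One thing to be aware of: each branch only bans the other terminator's twin from the body, so a URL such as https://helloworld.io/foo?bar?baz& (several "?" followed by a final "&") is still accepted. Whether that is acceptable depends on how strictly the requirement is read.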