I want to extract website URLs from text. This is the regex I have so far:
((http|https):\/\/)?(www\.)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)
Problem:
If I make www. optional, the regex also matches vue.js or something.ts as URLs, and it matches email addresses as web URLs too. If I make www. mandatory, it is not able to detect URLs like this.
The regex above would work fine for my use case if I could make it flexible enough to capture the URLs mentioned.
Question:
I want to check whether the capture group containing http or https took part in the match. If http is present in the URL, www. should be optional; otherwise it should be mandatory.
What is a possible way to solve this problem?
Answer:
If I understand correctly, you don't want to accept "example.com".
You can require that either the "http(s)://" part or the "www." part is present. Neither is optional on its own anymore; one of the two is required. If both are provided, the "www." part is matched by the rest of the regular expression, so that case needs no special handling.
Note that http|https can be shortened to https?.
Here is the regular expression with some tests.
const re = /(https?:\/\/|www\.)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/;
const tests = [
    "http://www.example.com",
    "http://example.com",
    "www.example.com",
    "example.com",
];
// Only the last test prints false: it has neither a scheme nor "www.".
for (const url of tests) {
    console.log(re.test(url), url);
}
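Since the original goal was to pull URLs out of running text, here is a minimal sketch of how the same pattern might be used for that. The helper name extractUrls and the sample sentence are made up for illustration, and the sketch assumes the escaped character inside the two character classes is meant to be a literal plus sign (\+).

```javascript
// Same alternation as above, with the g flag so every match in the text is returned.
// Assumption: the character classes contain \+ (a literal plus sign).
const urlPattern = /(https?:\/\/|www\.)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/g;

// extractUrls is a hypothetical helper, not part of any library.
function extractUrls(text) {
    // With a global regex, String.prototype.match returns all full matches, or null if there are none.
    return text.match(urlPattern) || [];
}

const sample = "Docs at https://example.com/guide and www.example.org , built with vue.js by me@example.com";
console.log(extractUrls(sample));
// [ 'https://example.com/guide', 'www.example.org' ]
```

Neither vue.js nor the email address is picked up, because neither starts with a scheme or "www.". Be aware that the trailing character class includes ".", so a URL immediately followed by a full stop would capture that full stop as well; you may want to trim it afterwards.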