How Can I Use Regex to Match All the Urls in This String?

Time:01-29

I hope to use the following code to get all URLs from a string.

But I only get three URLs: http://www.google.com, https://www.twitter.com and www.msn.com.

I want all URLs in the result, including bing.com. How can I modify var expression = /(https?:\/\/(?:www\.| ... ?

function openURLs() {
    let links = "http://www.google.com  Hello https://www.twitter.com The  www.msn.com  World bing.com";

    if (links) {
        // Matches URLs that start with http(s):// or www. (bare domains such as bing.com are not covered)
        var expression = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/gi;

        var url_array = links.match(expression);

        if (url_array != null) {
            url_array.forEach((url) => {
                // Prefix scheme-less matches with '//' so they open as protocol-relative URLs
                const urlOK = url.match(/^https?:/) ? url : '//' + url;
                window.open(urlOK);
            });
        }
    }
}

CodePudding user response:

Going off of what you currently have, you can just append |[a-zA-Z0-9]+\.[^\s]{2,} to the end of your expression. The resulting line will look like this:

var expression = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})|[a-zA-Z0-9]+\.[^\s]{2,}/gi;

This could be cleaner, but it'll do what you're asking.
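
As a quick check, here is a minimal sketch that runs the extended expression against the sample string from the question. The variable names are only illustrative, and the `|| []` fallback is an addition so the result is always an array.

```javascript
// Sample string from the question.
const links = "http://www.google.com  Hello https://www.twitter.com The  www.msn.com  World bing.com";

// The question's four alternatives plus the appended bare-domain alternative.
const expression = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})|[a-zA-Z0-9]+\.[^\s]{2,}/gi;

// With the g flag, match() returns every full match, or null when there is none.
const urls = links.match(expression) || [];

console.log(urls);
// → ["http://www.google.com", "https://www.twitter.com", "www.msn.com", "bing.com"]
```

Words such as Hello and World are skipped because the new alternative still requires a dot followed by at least two non-whitespace characters.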

Edit:

If you're okay with something slightly more permissive that can pull the same URLs out, you can try this expression:

var expression = /(?:https?:\/\/)?(?:www\.)?[\w.-]+\.\S{2,}/gi;
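
A minimal sketch of how this shorter expression could slot into the question's flow. It reuses the '//' prefix idea from the question for matches that have no scheme; the names `permissive`, `found` and `normalized` are assumptions made for illustration.

```javascript
const input = "http://www.google.com  Hello https://www.twitter.com The  www.msn.com  World bing.com";

// Optional scheme, optional "www.", then something.something.
const permissive = /(?:https?:\/\/)?(?:www\.)?[\w.-]+\.\S{2,}/gi;

const found = input.match(permissive) || [];

// Same normalization as in the question: scheme-less matches become protocol-relative.
const normalized = found.map((url) => (/^https?:/i.test(url) ? url : "//" + url));

console.log(normalized);
// → ["http://www.google.com", "https://www.twitter.com", "//www.msn.com", "//bing.com"]
```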

CodePudding user response:

A permissive regular expression may be the following:

var expression = /(https?:\/\/)?[a-zA-Z0-9]+\.[a-zA-Z0-9]+\S*/

This expression is simpler and easier to debug. Furthermore, it will match any website, including the ones with query params (example.com?param=value) or with non-ASCII characters (example.com/你好).
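
The two cases mentioned above can be checked with a short sketch. One assumption to note: the g flag is added here so that match() returns every occurrence rather than only the first one; the sample sentence is made up for illustration.

```javascript
// Same pattern as above, with the g flag added to collect all matches.
const simple = /(https?:\/\/)?[a-zA-Z0-9]+\.[a-zA-Z0-9]+\S*/g;

// Hypothetical sample text containing a query string and a non-ASCII path.
const sample = "see example.com?param=value and example.com/你好 now";

const hits = sample.match(simple) || [];

console.log(hits);
// → ["example.com?param=value", "example.com/你好"]
```

The trailing \S* is what carries the match through the query string and the non-ASCII path, since it accepts any run of non-whitespace characters.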


On the other hand, it will match things that aren't websites as soon as they contain a dot, so things like foo.bar will be matched. However, there is no reliable way to detect whether strings like foo.bar are actually websites.
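
To make the false positives concrete, the sketch below runs the pattern (with an added g flag) on a made-up sentence and then narrows the candidates with a small allow-list of top-level domains. The allow-list and the helper `hasKnownTld` are hypothetical, and this is only a heuristic: it reduces noise but still cannot prove that a string is a real website.

```javascript
const pattern = /(https?:\/\/)?[a-zA-Z0-9]+\.[a-zA-Z0-9]+\S*/g;

// Hypothetical text: a version number, a property access and one real domain.
const text = "Version 1.25 exposes foo.bar on bing.com today";

const candidates = text.match(pattern) || [];
console.log(candidates);
// → ["1.25", "foo.bar", "bing.com"]

// Hypothetical allow-list; a real one would need to be far longer.
const knownTlds = new Set(["com", "org", "net"]);

// Keep a candidate only if the last label of its host is in the allow-list.
function hasKnownTld(candidate) {
  const host = candidate.replace(/^https?:\/\//i, "").split(/[\/?#]/)[0];
  const lastLabel = host.split(".").pop().toLowerCase();
  return knownTlds.has(lastLabel);
}

const likelyUrls = candidates.filter(hasKnownTld);
console.log(likelyUrls);
// → ["bing.com"]
```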
