How to extract an incomplete URL from a string in C#

Time:06-17

I am trying to extract incomplete URLs from strings. Here are some examples of what I mean by an incomplete URL:

tny.sh/FJFCG8w
gka.co/cte3
google.com
cdn.ne/ecoe3

I have looked at several solutions that use a regex to detect a prefix such as http, but the links above have no prefix. Is it possible to extract them anyway?

This is the method I have tried for extracting URLs from a string:

protected LinkedList<string> ExtractLink(string txt)
{
    var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    LinkedList<string> urls = new LinkedList<string>();

    foreach (Match m in linkParser.Matches(txt))
        urls.AddFirst(m.Value);

    return urls;
}

And this is an example of calling the method:

ExtractLink("Hello, this is the link that you need to check tny.sh/FJFCG8w");

Answer:

You can use this regex instead:

[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}([-a-zA-Z0-9@:%_\+.~#?&\/=])*

If you also want to match URLs that include the http(s) protocol, use this:

(https?:\/\/)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}([-a-zA-Z0-9@:%_\+.~#?&\/=])*
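As a rough sketch, the second pattern could be plugged into the method from the question like this. The class name LinkExtractor and the method name ExtractLinks are made up for illustration, the pattern is written with an escaped plus sign in both character classes, and a List<string> is used in place of the LinkedList so that matches stay in the order they appear.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the name is an assumption, not part of the original post.
public static class LinkExtractor
{
    // Optional http(s) scheme, a host-like run of 2 to 256 characters,
    // a dot, a 2 to 6 letter TLD, then an optional path or query string.
    private static readonly Regex LinkParser = new Regex(
        @"(https?:\/\/)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}([-a-zA-Z0-9@:%_\+.~#?&\/=])*",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns every match in txt, in order of appearance.
    public static List<string> ExtractLinks(string txt)
    {
        var urls = new List<string>();
        foreach (Match m in LinkParser.Matches(txt))
            urls.Add(m.Value);
        return urls;
    }
}
```

Calling LinkExtractor.ExtractLinks("Hello, this is the link that you need to check tny.sh/FJFCG8w") should give a list holding the single entry tny.sh/FJFCG8w. Keep in mind that the pattern is permissive: anything shaped like name.ext, such as a file name like report.txt, will also match, and a period that ends a sentence right after a link can be captured as part of it, so some filtering afterwards may be needed.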