How can you match the domain and top-level domain (example.com) using regex?

Time:07-19

I am trying to match only the domain (example) and its extension .com (a.k.a. the top-level domain) in text that can include links, but I fail completely: my pattern matches the subdomain too, and sometimes it mistakes the domain for the extension.

Goal:

https://www.subdomain.example.com/folder/folder  -> example.com
example.com/folder/folder                        -> example.com
www.subdomain.example.com/folder/folder          -> example.com
example.com                                      -> example.com
www.example.com                                  -> example.com
subdomain.example.com                            -> example.com
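One way to get exactly these results is to let a greedy repetition consume every `label.` prefix and capture only the last two labels. The repetition backtracks just far enough for the capture group to match, which leaves `example.com`. Below is a minimal Python sketch. It assumes the wanted part is always the final two dot-separated labels of the host. `DOMAIN_RE` and `extract_domains` are hypothetical names that do not come from the thread.

```python
import re

# Hypothetical pattern: ASSUMES the wanted part is the LAST TWO labels of the
# host (example.com). Multi-part suffixes such as co.uk are not handled.
DOMAIN_RE = re.compile(
    r"(?<![\w./-])"               # token start: not right after a word char, '.', '/' or '-'
    r"(?:[a-z][a-z0-9+.-]*://)?"  # optional scheme such as https:// or ftp://
    r"(?:[\w-]+\.)*"              # greedy: www., subdomain., ... (gives labels back as needed)
    r"([\w-]+\.[a-z]{2,})\b",     # capture domain + TLD, e.g. example.com
    re.IGNORECASE,
)


def extract_domains(text: str) -> list[str]:
    """Return every 'domain.tld' found in text, lower-cased (hypothetical helper)."""
    return [m.group(1).lower() for m in DOMAIN_RE.finditer(text)]


if __name__ == "__main__":
    for url in [
        "https://www.subdomain.example.com/folder/folder",
        "example.com/folder/folder",
        "www.subdomain.example.com/folder/folder",
        "example.com",
        "www.example.com",
        "subdomain.example.com",
    ]:
        print(f"{url:48} -> {extract_domains(url)[0]}")
```

On free text it works the same way: `extract_domains("Links: https://a.example.com/x and www.test.org.")` returns `['example.com', 'test.org']`. The two-label assumption breaks on multi-part suffixes. For example, `www.example.co.uk` gives `co.uk`. If that matters, a regex alone is not enough. The usual approach is to consult the Public Suffix List, for instance through the third-party `tldextract` package.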

Attempt 1:

(?:(?:www?).)?\b((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b

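Running attempt 1 through Python's `re` module shows why it over-matches. The repeated group `(...\.)+` accepts every label before the TLD, so the whole host comes back rather than just the last two labels. Also, the `.` after `(?:www?)` is unescaped, so it matches any character. `attempt1_match` is a hypothetical helper name used only for this check.

```python
import re

# Attempt 1 from the question, unchanged; Python's re accepts it as-is.
ATTEMPT1 = re.compile(r"(?:(?:www?).)?\b((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b")


def attempt1_match(s):
    """Return the full text matched by attempt 1, or None."""
    m = ATTEMPT1.search(s)
    return m.group(0) if m else None


for s in ["https://www.subdomain.example.com/folder/folder",
          "subdomain.example.com",
          "example.com/folder/folder"]:
    print(attempt1_match(s))
# prints:
# www.subdomain.example.com
# subdomain.example.com
# example.com
```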

Attempt 2:

(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-&?=%.]+

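Both character classes in attempt 2 include `/` and `.`, so it behaves as a URL matcher. It returns the whole link, path included, rather than the domain. The quick check below uses a hypothetical `attempt2_match` helper.

```python
import re

# Attempt 2 from the question, unchanged.
ATTEMPT2 = re.compile(r"(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-&?=%.]+")


def attempt2_match(s):
    """Return the full text matched by attempt 2, or None."""
    m = ATTEMPT2.search(s)
    return m.group(0) if m else None


print(attempt2_match("https://www.subdomain.example.com/folder/folder"))
# prints: https://www.subdomain.example.com/folder/folder
print(attempt2_match("example.com/folder/folder"))
# prints: example.com/folder/folder
```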

Answer:

Something like this may work, or at least serve as a starting point: https://regex101.com/r/1UMjML/1 (regex updated slightly)

regex: (?<=https?://)(?:\w+\.)+(?<domain>\w+\.\w+)[/\s$]
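Porting this answer to Python's `re` module takes three changes. The code comments list them, and the result is an adaptation rather than the answer's exact pattern. Python only allows fixed-width look-behind, so `(?<=https?://)` is replaced by matching the scheme directly. Named groups are spelled `(?P<name>...)`. Inside `[...]` a `$` is a literal dollar sign, so `[/\s$]` cannot match at the end of the text. `(?:[/\s]|$)` is one possible fix. `ANSWER1_RE` and `answer1_domain` are hypothetical names.

```python
import re

# Python adaptation of the answer's pattern (not the original verbatim):
#  - "https?://" is consumed instead of sitting in a variable-width look-behind,
#  - the named group uses Python's (?P<domain>...) syntax,
#  - "(?:[/\s]|$)" replaces "[/\s$]", whose "$" is only a literal character.
ANSWER1_RE = re.compile(r"https?://(?:\w+\.)+(?P<domain>\w+\.\w+)(?:[/\s]|$)")


def answer1_domain(text):
    """Return the 'domain' group of the first match, or None."""
    m = ANSWER1_RE.search(text)
    return m.group("domain") if m else None


print(answer1_domain("https://www.subdomain.example.com/folder/folder"))  # example.com
print(answer1_domain("www.subdomain.example.com/folder/folder"))          # None
print(answer1_domain("https://example.com/"))                             # None
```

This also exposes the pattern's limits. The scheme is required, and `(?:\w+\.)+` needs at least one label in front of the domain, so bare `example.com` or `https://example.com/` is not matched.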

Answer:

A simple solution is to match a run of word characters followed by the TLD:

\w+\.com

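This pattern only knows about `.com`. Without a trailing word boundary it also matches inside longer labels. The quick Python check below uses a made-up sample string.

```python
import re

text = "Visit example.com or example.community.org"

# The answer's pattern: word characters followed by ".com".
print(re.findall(r"\w+\.com", text))    # ['example.com', 'example.com']
# With \b after "com", the match inside "example.community" is rejected.
print(re.findall(r"\w+\.com\b", text))  # ['example.com']
```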

You can then make it more explicit by padding the beginning and end with whatever you want to match on, for example:

(?:https:\/\/.*?)?(\w+\.com)
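Checked against the six goal inputs from the question, group 1 of this padded pattern is `example.com` every time. Because the `https://` prefix is optional, the search still finds the domain when a link has no scheme. The prefix only changes what the full match (group 0) contains. `PADDED` and `goal_inputs` are illustrative names.

```python
import re

# The padded pattern from the answer; group 1 holds the domain.
PADDED = re.compile(r"(?:https:\/\/.*?)?(\w+\.com)")

goal_inputs = [
    "https://www.subdomain.example.com/folder/folder",
    "example.com/folder/folder",
    "www.subdomain.example.com/folder/folder",
    "example.com",
    "www.example.com",
    "subdomain.example.com",
]

for s in goal_inputs:
    print(PADDED.search(s).group(1))  # example.com for each input
```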