I was trying to check whether two URLs have the same domain using parse.urlparse, like this:
from urllib import parse

if parse.urlparse(url1).hostname == parse.urlparse(url2).hostname:
    print('Same Domain')
else:
    print('Different Domain')
While this works for https://example.com/page1 and https://example.com/page1, it doesn't work for https://sub1.example.com/page1 and https://sub2.example.com/page1,
because the hostname returned for the first URL is sub1.example.com and for the second URL is sub2.example.com.
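A quick way to see what hostname returns in each case, as a minimal sketch using only the standard library and the URLs from above:

```python
from urllib import parse

urls = [
    "https://example.com/page1",
    "https://sub1.example.com/page1",
    "https://sub2.example.com/page1",
]

# hostname is the full host, with any subdomain labels included
hostnames = [parse.urlparse(u).hostname for u in urls]
for u, h in zip(urls, hostnames):
    print(u, "->", h)
```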
...
What if I only care about the main domain and not the subdomains?
I want to retrieve example.com alone, without manually parsing the URL as a string, using a library function such as parse.urlparse instead.
CodePudding user response:
You cannot.
The reason is explained here.
In a nutshell, the URL alone gives no way of knowing which part of sub2.domain.tld is the first-level domain, the domain name, or the subdomain; the same problem arises with example.co.uk or similar.
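To illustrate the ambiguity, here is a minimal sketch of the tempting "keep the last two labels" heuristic. The helper name naive_main_domain is hypothetical and not part of urllib, and the .co.uk URL is an assumed example. The heuristic gives the expected answer for sub2.example.com but a wrong one for a host under example.co.uk, because nothing in the string itself says that co.uk is a suffix rather than a domain.

```python
from urllib import parse


def naive_main_domain(url):
    """Hypothetical helper: keep only the last two labels of the hostname."""
    hostname = parse.urlparse(url).hostname or ""
    return ".".join(hostname.split(".")[-2:])


print(naive_main_domain("https://sub2.example.com/page1"))    # example.com
print(naive_main_domain("https://sub2.example.co.uk/page1"))  # co.uk, not example.co.uk
```

Getting the second case right requires an external list of known suffixes, such as the Public Suffix List, which the standard library does not ship.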