Python regex pattern to extract a domain name

I have a list of links, and some of them look like

https://www.domainname
or https://domainname

I need a regex pattern that extracts only the domain name from them. The "www" prefix causes problems with my pattern :(

print(re.findall("//([a-zA-Z]+)", i))

CodePudding user response：

You could anchor the match to the end of the string.

import re

url = "https://www.domainname"
url2 = "https://domainname"

for u in [url, url2]:
    print(u)
    # \w+$ matches the final run of word characters in the string
    print(re.findall(r"\w+$", u))

https://www.domainname
['domainname']
https://domainname
['domainname']
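
Note that the end-anchored pattern only works when the URL ends with the domain name, as in the examples from the question. A minimal sketch of that limitation, using a hypothetical helper named last_word and a made-up URL with a path:

```python
import re

def last_word(url):
    # Hypothetical helper: return the final run of word characters, or None.
    match = re.search(r"\w+$", url)
    return match.group(0) if match else None

print(last_word("https://www.domainname"))       # domainname
print(last_word("https://domainname"))           # domainname
print(last_word("https://www.domainname/page"))  # page, not the domain
```

If the links may carry a path or a trailing slash, a pattern anchored after the "//" is probably the safer choice.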

CodePudding user response：

My solution:

import re

l1 = ["https://www.domainname1", "https://domainname2"]
for i in l1:
    print(re.findall(r"/(?:www\.)?(\w+)", i))

Output:

['domainname1']
['domainname2']
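
If the pattern is used in several places, it could be compiled once and wrapped in a function. This is only a sketch: the names DOMAIN_RE and domain_name are made up for illustration, and the pattern here matches both slashes ("//") rather than one, which is an assumption about the input always containing a scheme.

```python
import re

# Assumed variant: both slashes, then an optional "www." prefix,
# then the domain name captured in group 1.
DOMAIN_RE = re.compile(r"//(?:www\.)?(\w+)")

def domain_name(url):
    # Return the captured name, or None when the URL does not match.
    match = DOMAIN_RE.search(url)
    return match.group(1) if match else None

for link in ["https://www.domainname1", "https://domainname2"]:
    print(domain_name(link))
```

Returning None instead of an empty list makes it easier to skip links that do not match.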

CodePudding user response：

import re

with open('testfile.txt', 'r') as file:
    readfile = file.read()

    # Expects hosts of the form sub.domain.tld, e.g. www.example.com
    search = re.finditer(r'(?:\w+:\/\/)?(?:\w+\.)(\w+)(\.\w+)', readfile)

    for check in search:
        print(check.group(1))  # group 1 holds only the domain name

Result:

domainname
example
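
As an alternative to a regex, the standard library's urllib.parse can split the URL first. The sketch below is not the approach from the answers above; the function name domain_label is hypothetical, and taking the first label after an optional "www." is a simplification (for "sub.example.com" it would return "sub").

```python
from urllib.parse import urlsplit

def domain_label(url):
    # hostname is None when the URL has no "scheme://" part
    host = urlsplit(url).hostname or ""
    # Drop a leading "www." if present
    if host.startswith("www."):
        host = host[4:]
    # Keep only the first label, e.g. "example" from "example.com"
    return host.split(".")[0]

print(domain_label("https://www.example.com/path?q=1"))  # example
print(domain_label("https://domainname"))                # domainname
```

This needs the scheme to be present: for a bare "www.example.com" the hostname is empty and the function returns an empty string.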