I have a list of links, and some of them look like
https://www.domainname
or https://domainname
I need a regex pattern that extracts only the domain name from each link. The "www" prefix breaks my current pattern :(
print(re.findall(r"//([a-zA-Z]+)", i))
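To see the problem concretely, here is a minimal sketch (assuming the two sample links above) of what that pattern returns. `[a-zA-Z]+` grabs the first run of letters after `//` and stops at the first non-letter, so for the www link it captures `www` instead of the domain name.

```python
import re

# The two sample links from the question
links = ["https://www.domainname", "https://domainname"]

for link in links:
    # [a-zA-Z]+ stops at the "." after "www", so the first link yields "www"
    print(link, re.findall(r"//([a-zA-Z]+)", link))
```

This prints `https://www.domainname ['www']` and then `https://domainname ['domainname']`.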
Answer:
You could anchor the match to the end of the string with $, since the domain name is the last run of word characters in each link:
import re

url = "https://www.domainname"
url2 = "https://domainname"
for u in [url, url2]:
    print(u)
    # \w+$ matches the last run of word characters, right before the end
    print(re.findall(r"\w+$", u))
Output:
https://www.domainname
['domainname']
https://domainname
['domainname']
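A slightly tidier sketch of the same idea uses `re.search`, since the `$` anchor allows at most one match per string. `last_word` is a hypothetical helper name. The approach assumes the domain name is the last run of word characters, so a link ending in a TLD such as `.com` (the third, hypothetical input below) returns the TLD instead.

```python
import re
from typing import Optional


def last_word(url: str) -> Optional[str]:
    """Hypothetical helper: return the last run of word characters in url.

    Assumes the domain name is the final part of the link; returns None
    when the string does not end in a word character.
    """
    match = re.search(r"\w+$", url)
    return match.group() if match else None


for u in ["https://www.domainname", "https://domainname", "https://www.example.com"]:
    print(u, last_word(u))
```

The first two links give `domainname`, while `https://www.example.com` gives `com`, which is the trade-off of anchoring at the end.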
Answer:
My solution:
import re

l1 = ["https://www.domainname1", "https://domainname2"]
for i in l1:
    # (?:www\.)? optionally skips "www." without capturing it
    print(re.findall(r"/(?:www\.)?(\w+)", i))
Output:
['domainname1']
['domainname2']
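The `(?:...)` form matters here because `re.findall` returns only what capturing groups match. A minimal sketch comparing this answer's pattern with a hypothetical variant that also captures `www.`: once there are two capturing groups, `findall` returns tuples (with an empty string for an unmatched optional group) instead of plain strings.

```python
import re

links = ["https://www.domainname1", "https://domainname2"]

for link in links:
    # Non-capturing optional prefix: only (\w+) is returned
    non_capturing = re.findall(r"/(?:www\.)?(\w+)", link)
    # Hypothetical variant: capturing the prefix too yields (prefix, name) tuples
    capturing = re.findall(r"/(www\.)?(\w+)", link)
    print(non_capturing, capturing)
```

This prints `['domainname1'] [('www.', 'domainname1')]` and then `['domainname2'] [('', 'domainname2')]`.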
Answer:
import re

with open('testfile.txt', 'r') as file:
    readfile = file.read()

# group(1) is the domain name, group(2) is the suffix such as ".com"
search = re.finditer(r'(?:\w+://)?(?:\w+\.)(\w+)(\.\w+)', readfile)
for check in search:
    print(check.group(1))  # group 1 only: the domain name
Output:
domainname
example
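Since the contents of `testfile.txt` are not shown, here is a self-contained sketch that runs the same pattern over an inline string; the sample text is hypothetical. It also shows a limitation: `(?:\w+\.)` is not optional and `(\.\w+)` requires a suffix, so a host needs at least two dots (as in `www.example.org`) to match. Both links from the question, which have no TLD, are skipped.

```python
import re

# Hypothetical sample text standing in for the contents of testfile.txt
text = """https://www.domainname.com
www.example.org
https://domainname
"""

# Same pattern as the answer: optional scheme, required prefix, name, suffix
pattern = re.compile(r"(?:\w+://)?(?:\w+\.)(\w+)(\.\w+)")

domains = [m.group(1) for m in pattern.finditer(text)]
print(domains)
```

For this sample it prints `['domainname', 'example']`; the last line, `https://domainname`, produces no match.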