I have a string like the one below:
string = 'https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6'
How can I find
'https://somewebsite.com/itesr0824'
with regex?
I've tried
re.sub('[^https://somewebsite.com/[a-zA-Z0-9]. $','',string)
but it only finds
'https://somewebsite.com/itesr0824/products/YYA'
CodePudding user response:
Why on Earth would you use regular expressions for this when Python has a built-in URL parser? Don't reinvent the wheel and take on all the weird edge cases URLs can throw at you. Instead, use urllib.parse.urlparse() to split the URL and urllib.parse.urljoin() to put back only the parts you want:
import urllib.parse
string = "https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6"
parsedURL = urllib.parse.urlparse(string)
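# Rebuild scheme://host, then append the first path segment.
# parsedURL.path.split("/")[0] is the empty string before the leading slash, so index 1 is the first segment.
trimmedURL = urllib.parse.urljoin(parsedURL.scheme + "://" + parsedURL.netloc, parsedURL.path.split("/")[1]) # 'https://somewebsite.com/itesr0824'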
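If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the name first_segment_url is made up for illustration, and it assumes you want to drop the query string and fragment and to get back just scheme://host when the URL has no path segment at all.

```python
from urllib.parse import urlsplit, urlunsplit


def first_segment_url(url):
    """Return scheme://host/first-path-segment (hypothetical helper name)."""
    parts = urlsplit(url)
    # Drop empty pieces produced by the leading slash or doubled slashes.
    segments = [s for s in parts.path.split("/") if s]
    first = "/" + segments[0] if segments else ""
    # Empty query and fragment so everything after the first segment is discarded.
    return urlunsplit((parts.scheme, parts.netloc, first, "", ""))


print(first_segment_url("https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6"))
```

Unlike indexing path.split("/")[1] directly, this does not raise an IndexError when the URL has no path.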
CodePudding user response:
import re
string = 'https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6'
x = re.sub(r'https://somewebsite\.com/\w+', '', string)
# where:
# \w - matches any letter, digit or underscore. Equivalent to [a-zA-Z0-9_] for ASCII text
# +  - one or more
print(x)
Prints
/products/YYA-002/fdrop-tQ?position=6
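Note that re.sub removes the matched prefix and leaves the rest. If what you want is the prefix itself, as in the question, the same pattern can be used with re.match instead. A minimal sketch, assuming the first path segment contains only word characters (the function name find_prefix is just for illustration):

```python
import re

# Same pattern as above: the fixed host followed by one or more word characters.
PREFIX_RE = re.compile(r'https://somewebsite\.com/\w+')


def find_prefix(url):
    """Return the matched prefix, or None if the URL does not start with it."""
    m = PREFIX_RE.match(url)
    return m.group(0) if m else None


string = 'https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6'
print(find_prefix(string))
```

Because \w does not match a hyphen, a first segment like YYA-002 would be cut off at the hyphen. Something like [^/?#]+ in place of \w+ may be a safer choice if the segment can contain other characters.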