I am looking to clean URLs in the else block of an if statement. Specifically, I want to strip the ? and all query parameters after it, as well as everything up to and including the first "/".
Example Input = 'somesite.com/somepage?param=1&else=2'
Example Output = 'somepage'
** All that is left is our page (no query params and no domain) **
Below is what I have so far (not working). I was working on this piece by piece, and the code below was an attempt at stripping all query parameters. I'm not sure how I would chain both steps together.
def new_url_check(x):
    if 'some condition' in x:
        x = 'some random condition'
    else:
        re.sub(r'^([^?]+)', '', x)
    return x
CodePudding user response:
You'll have to assign the result of your re.sub call to x (or return it), and the regex should also account for the forward slash. The # symbol also has a special meaning in a URL (it starts the fragment), so everything from it onward should be stripped as well:
x = re.sub(r'^.*/|[?#].*', '', x)
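Dropped into the function from the question, that fix might look like the sketch below. The condition strings are the placeholders from the question, and the import of re is assumed since the question does not show it.

```python
import re

def new_url_check(x):
    if 'some condition' in x:
        x = 'some random condition'
    else:
        # Remove everything through the last "/" in the string,
        # and everything from the first "?" or "#" onward.
        x = re.sub(r'^.*/|[?#].*', '', x)
    return x

print(new_url_check('somesite.com/somepage?param=1&else=2'))  # somepage
```

Because `^.*/` is greedy, it removes through the last slash anywhere in the string, so a query value that itself contains a "/" would not be handled by this pattern.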
If you want to keep the first part of the path instead of the last, then:
x = re.sub(r'^.*?/+.*?/|[/?#].*', '', x)
This assumes that the host is included in the input and is preceded by at least one forward slash. So it will work for the following:
http://localhost/abc/def/ghi => abc
/localhost/abc/def/ghi => abc
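A quick way to check that behaviour is sketched below. It assumes the pattern reads `^.*?/+.*?/|[/?#].*` (with `/+`, one or more slashes, before the host), and the helper name first_path_segment is hypothetical, introduced only for this illustration.

```python
import re

def first_path_segment(x):
    # First alternative: lazily skip the scheme, the slash(es) before the
    # host, the host itself, and the slash after it.
    # Second alternative: drop everything from the next "/", "?" or "#".
    return re.sub(r'^.*?/+.*?/|[/?#].*', '', x)

print(first_path_segment('http://localhost/abc/def/ghi'))  # abc
print(first_path_segment('/localhost/abc/def/ghi'))        # abc

# Without a slash before the host, the first alternative cannot match,
# so only the second one applies and the host is what remains.
print(first_path_segment('somesite.com/somepage?param=1&else=2'))  # somesite.com
```

The last call shows why the leading-slash assumption matters: for input shaped like the question's example, this variant keeps the host rather than the page.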
CodePudding user response:
How about using re.search and capturing the part you want as a group?
re.search(r'.*\.com\/(.*)\?.*', x).group(1)
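One thing to watch with this approach: re.search returns None when the pattern does not match, so calling .group(1) directly raises AttributeError for inputs without ".com/" or without a "?". A minimal guarded sketch, where the helper name page_from_url is hypothetical:

```python
import re

def page_from_url(x):
    # Same pattern as above: capture what sits between ".com/" and the "?".
    m = re.search(r'.*\.com\/(.*)\?.*', x)
    # re.search gives None on no match, so guard before reading the group.
    return m.group(1) if m else None

print(page_from_url('somesite.com/somepage?param=1&else=2'))  # somepage
print(page_from_url('somesite.com/somepage'))                 # None
```

Returning None here is only one possible choice; returning the input unchanged may suit the else block from the question better.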