Python Regex: Chain Multiple Substitutions In Function

Time:10-21

I am looking to clean URLs in the else block of an if statement. Specifically, I want to strip the ? and all query parameters after it, as well as everything before the first "/".

Example Input = 'somesite.com/somepage?param=1&else=2'

Example Output = 'somepage'

** All that is left is our page (no query params and no domain) **

Below is what I have so far (not working). I was tackling this piece by piece, and the code below was an attempt at stripping all query parameters. I'm not sure how to chain both substitutions together.

def new_url_check(x):
    
    if 'some condition' in x:
        x = 'some random condition'           
            
    else:
        re.sub(r'^([^?]+)', '', x)
            
    return x

CodePudding user response:

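You have to assign the result of your re.sub call back to x (or return it directly), because re.sub returns a new string and leaves x unchanged. The regex also needs to account for the forward slash. In addition, the # symbol has a special meaning in a URL: it starts the fragment, which should be stripped as well: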

x = re.sub(r'^.*/|[?#].*', '', x)
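Dropped into the function from the question, this could look like the sketch below. The function name and the placeholder condition are taken from the question as-is; the only changes are the import and the assignment.

```python
import re

def new_url_check(x):
    if 'some condition' in x:
        x = 'some random condition'
    else:
        # re.sub returns a new string, so the result must be assigned back to x.
        # First alternative: everything up to and including the last "/".
        # Second alternative: the first "?" or "#" and everything after it.
        x = re.sub(r'^.*/|[?#].*', '', x)
    return x

print(new_url_check('somesite.com/somepage?param=1&else=2'))  # somepage
```

Both alternatives live in one pattern, so a single re.sub call handles the "chaining". Note that the first alternative is greedy, so with a deeper path such as a.com/b/c only the last segment (c) is kept.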

If you want to keep the first part of the path instead of the last, then:

x = re.sub(r'^.*?/+.*?/|[/?#].*', '', x)

This assumes that the host is included in the input and is preceded by at least one forward slash. So it will work for the following:

http://localhost/abc/def/ghi  => abc
/localhost/abc/def/ghi => abc
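A quick way to check this variant against the two inputs above is sketched here. The names FIRST_SEGMENT and first_path_segment are made up for illustration; the pattern is the one that keeps the first part of the path, with the host preceded by one or more slashes.

```python
import re

# Hypothetical helper names; the pattern keeps only the first path segment.
FIRST_SEGMENT = re.compile(r'^.*?/+.*?/|[/?#].*')

def first_path_segment(url):
    # First alternative: optional scheme, the slash(es) before the host,
    # the host itself, and the slash that follows it.
    # Second alternative: the next "/", "?" or "#" and everything after it.
    return FIRST_SEGMENT.sub('', url)

print(first_path_segment('http://localhost/abc/def/ghi'))  # abc
print(first_path_segment('/localhost/abc/def/ghi'))        # abc
```

If the input has no slash before the host, as in somesite.com/somepage?param=1&else=2, the first alternative cannot match and the call returns somesite.com instead, so this variant only fits inputs that meet the assumption stated above.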

CodePudding user response:

How about using re.search and capturing the part you want as a group?

re.search(r'.*\.com\/(.*)\?.*', x).group(1)
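One thing to watch with this approach: re.search returns None when the pattern does not match (for example, no ".com/" or no "?" in the input), and calling .group(1) on None raises an AttributeError. A minimal sketch with a guard follows; page_from_url is a made-up name, and returning the input unchanged on a miss is just one possible choice.

```python
import re

def page_from_url(x):
    # The pattern requires a literal ".com/" and a "?" somewhere after it.
    match = re.search(r'.*\.com/(.*)\?.*', x)
    # Fall back to the unchanged input when the pattern does not match.
    return match.group(1) if match else x

print(page_from_url('somesite.com/somepage?param=1&else=2'))  # somepage
```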