How to replace the last string of a URL with another string

Time:09-17

I have a data set that contains the URLs of web pages. I have something like this:

I am trying to replace every URL with whatever comes after the last "/", so I'm simply using:

df["url"].str.split("/").str[-1]
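For reference, a minimal sketch of what that one-liner returns, assuming a DataFrame with a `url` column; the three sample URLs are hypothetical stand-ins because the real data set is not shown. Note that a URL ending in "/" yields an empty string, which is why that case needs its own handling.

```python
import pandas as pd

# Hypothetical sample data; the real data set is not shown in the question
df = pd.DataFrame({"url": [
    "https://abc.eu/login",
    "https://abc.eu/",
    "https://abc.eu/ar35gjdb4",
]})

# Split each URL on "/" and keep the last piece
last = df["url"].str.split("/").str[-1]
print(last.tolist())  # ['login', '', 'ar35gjdb4']
```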

But I'd like URLs that end in a long string of both letters and numbers (like the third link) to be replaced by "valid", and URLs with nothing after the last "/" to be replaced by "home_page". How do I achieve that?

I'd like to have something like this:

  • login
  • home_page
  • valid
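A minimal sketch of those rules as a plain Python function. The name `label_last_segment` is hypothetical, and "a long string of letters and numbers" is interpreted here as a segment made only of letters and digits that contains at least one of each; since the question does not say how long "long" is, no minimum length is enforced.

```python
import re

# Assumed rule: only letters and digits, with at least one letter and one digit
MIXED_ALNUM = re.compile(r"(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9]+")

def label_last_segment(url: str) -> str:
    """Hypothetical helper: label the part of `url` after the last "/"."""
    last = url.rsplit("/", 1)[-1]
    if last == "":
        return "home_page"   # nothing after the last "/"
    if MIXED_ALNUM.fullmatch(last):
        return "valid"       # mixed letters and digits
    return last              # anything else is kept as-is

urls = ["https://abc.eu/login", "https://abc.eu/", "https://abc.eu/ar35gjdb4"]
print([label_last_segment(u) for u in urls])  # ['login', 'home_page', 'valid']
```

On a pandas column, `df["url"].map(label_last_segment)` should apply the same function row by row. Replacing `+` with a quantifier such as `{8,}` would enforce a minimum length if short segments like `v2` should keep their text.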

Answer:

You can parse the URL with this, and then do the replacement.
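The resource this answer linked to is not preserved. One standard-library option (an assumption, not necessarily what was linked) is `urllib.parse.urlparse`, which splits a URL into scheme, host, path, query and fragment so that the last path segment can be taken without a query string leaking into it. The helper name `last_path_segment` is hypothetical.

```python
from urllib.parse import urlparse

def last_path_segment(url: str) -> str:
    """Hypothetical helper: return the text after the last "/" in the URL path."""
    # Only the path is used, so "?ref=x" or "#top" is ignored
    path = urlparse(url).path
    return path.rsplit("/", 1)[-1]

for url in ["https://abc.eu/login", "https://abc.eu/", "https://abc.eu/ar35gjdb4?ref=x"]:
    print(repr(last_path_segment(url)))
# 'login'
# ''
# 'ar35gjdb4'
```

The returned segment can then be replaced with "home_page" or "valid" using whatever rules apply.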

Answer:

Here is code that meets the requirement above:

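import re

urls = ["https://abc.eu/login", "https://abc.eu/", "https://abc.eu/ar35gjdb4"]

for url in urls:
    # Split once from the right: s[0] is the base URL, s[1] the last segment
    s = url.rsplit('/', 1)
    if s[1] == 'login':
        print(s[0] + '/login')
    elif s[1] == '':
        # Nothing after the last "/": treat it as the home page
        print(s[0] + '/home_page')
    elif re.match(r'^[a-zA-Z0-9_]+$', s[1]):
        # Any other alphanumeric segment is labelled "valid"
        print(s[0] + '/valid')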
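Since the question works on a pandas column, the same idea can also be applied column-wise without a Python loop. A minimal sketch, assuming a `url` column and hypothetical sample data; the "valid" rule (only letters and digits, at least one of each) is an interpretation of the question, not something it states exactly.

```python
import pandas as pd

# Hypothetical sample data standing in for the real data set
df = pd.DataFrame({"url": [
    "https://abc.eu/login",
    "https://abc.eu/",
    "https://abc.eu/ar35gjdb4",
]})

# Same split as in the question: text after the last "/" ("" if the URL ends in "/")
last = df["url"].str.split("/").str[-1]

# Assumed rule: only letters and digits, with at least one of each
mixed = last.str.fullmatch(r"(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9]+")

# mask() replaces values where the condition is True and keeps the rest
df["label"] = last.mask(mixed, "valid").mask(last.eq(""), "home_page")
print(df["label"].tolist())  # ['login', 'home_page', 'valid']
```

The two conditions cannot overlap (an empty string never matches the pattern), so the order of the `mask` calls does not matter here.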