I have a dataset that contains the URLs of web pages. I have something like this:
I'm trying to replace each URL with whatever comes after the last "/", so I'm simply using:
df["url"].str.split("/").str[-1]
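For reference, here is what that one-liner returns. A URL that ends in "/" yields an empty string, which is the case that should become "home_page". The question's own table is not shown, so the sample URLs are hypothetical, borrowed from the answer further down.

```python
import pandas as pd

# Hypothetical sample data (the question's original table is not shown)
df = pd.DataFrame({"url": ["https://abc.eu/login",
                           "https://abc.eu/",
                           "https://abc.eu/ar35gjdb4"]})

# Split each URL on "/" and keep the last piece
last = df["url"].str.split("/").str[-1]
print(last.tolist())  # ['login', '', 'ar35gjdb4']
```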
But I'd like URLs that end in a long string made of both letters and numbers (like the third link) to be replaced by "valid", and URLs with nothing after the last "/" to be replaced by "home_page". How do I achieve that?
I'd like to have something like this:
- login
- home_page
- valid
Answer:
You can parse the URL with this, then do the replacement.
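The link in this answer did not survive. Assuming it pointed to Python's standard `urllib.parse` module, a minimal sketch might look like this: `urlparse` isolates the path, so query strings and fragments (e.g. `/login?next=/x`) do not leak into the last segment. The function name `classify` and the "letters and digits, at least one of each" rule are illustrative assumptions, not part of the original answer.

```python
import re
from urllib.parse import urlparse

# Assumed ID rule: only letters and digits, with at least one of each
ID_PATTERN = re.compile(r"(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]+")

def classify(url):
    # urlparse separates scheme, netloc, path, query and fragment
    path = urlparse(url).path
    # Last path segment; "" when the path is empty or ends with "/"
    last = path.rsplit("/", 1)[-1]
    if last == "":
        return "home_page"
    if ID_PATTERN.fullmatch(last):
        return "valid"
    return last

for u in ["https://abc.eu/login", "https://abc.eu/", "https://abc.eu/ar35gjdb4"]:
    print(classify(u))
```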
Answer:
Here is code that meets the requirement above:
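import re

urls = ["https://abc.eu/login", "https://abc.eu/", "https://abc.eu/ar35gjdb4"]
for url in urls:
    # Split once from the right into the base URL and the last path segment
    base, last = url.rsplit('/', 1)
    # URLs always use "/", so join with it explicitly rather than os.path.join
    if last == 'login':
        print(base + '/login')
    elif last == '':
        # Nothing after the final "/": this is the home page
        print(base + '/home_page')
    elif re.match(r'^[a-zA-Z0-9_]+$', last):
        # Alphanumeric segment such as "ar35gjdb4"
        print(base + '/valid')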
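Since the question works on a DataFrame column, the same rules can also be applied without a Python loop. This is a sketch assuming pandas >= 1.1 (for `Series.str.fullmatch`) and assuming that "a long string of letters and numbers" means a segment containing only letters and digits with at least one of each; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame
df = pd.DataFrame({"url": ["https://abc.eu/login",
                           "https://abc.eu/",
                           "https://abc.eu/ar35gjdb4"]})

# Last segment after the final "/"
last = df["url"].str.split("/").str[-1]

# Assumed ID rule: letters and digits only, at least one of each
is_id = last.str.fullmatch(r"(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]+", na=False)

# Replace ID segments with "valid" and empty segments with "home_page";
# every other segment (e.g. "login") is kept as-is
df["page"] = last.mask(is_id, "valid").mask(last.eq(""), "home_page")
print(df["page"].tolist())  # ['login', 'home_page', 'valid']
```

Keeping the rules as boolean masks makes it easy to add further cases later without touching the others.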