Python 3.10.2
I have a URL that generally appears as follows, with some slight variation (http/https, www. prefix sometimes, #params at the end to indicate things like referrer or device being displayed on, etc).
https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name
The forms of the URL I'm generally encountering are either:
https://madeupdomain.net/u/Hypothetical_Username/
or
https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name
What I'm interested in doing with the URL:
- Get the Hypothetical_Username part
- Find out if the URL stops at the username or if there is another /path after it
I've been using user = url.split('/')[4] to get the username portion of the URL. Since the URL always includes the username and the URL is usually consistent (for now), I can rely on this split getting the element I want. If the URL changes a little bit in the future, I know this'll bone me.
However, the rest of the path is optional.
If I just use url.split('/')[5], Python throws an error as soon as it encounters a URL where the split list doesn't have an element at index [5].
So I tried to "test" for it with an if statement, and it still complains and throws the error IndexError: list index out of range.
if url.split('/')[5]:
    continue
When I print out the list, it'll look like either of the following. As you can see, there are 5 elements in the first and six in the second.
['https:', '', 'madeupdomain.net', 'u', 'Hypothetical_Username']
['https:', '', 'madeupdomain.net', 'u', 'Hypothetical_Username', 'Some-Random-Page-Name']
So, I tried running len(url.split('/')) on every iteration to see how many elements each list has, and it always says 6, whether it is the first or second example above.
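One possible explanation for the constant 6, offered as an assumption rather than a confirmed diagnosis: if the username-only URLs end with a trailing slash (as in the first form listed earlier), split('/') produces an empty string as a sixth element. The URLs below are the made-up ones from this question, and the variable names are only illustrative.

```python
# Hypothetical URLs modeled on the forms shown in the question.
with_slash = "https://madeupdomain.net/u/Hypothetical_Username/"
without_slash = "https://madeupdomain.net/u/Hypothetical_Username"
with_path = "https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name"

# Record (number of elements, last element) for each URL.
results = []
for url in (with_slash, without_slash, with_path):
    parts = url.split("/")
    results.append((len(parts), parts[-1]))
    print(len(parts), repr(parts[-1]))
```

With a trailing slash the list has six elements and the last one is an empty string, so a length check alone cannot distinguish it from a URL with a real page name.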
So, I'm kind of at a loss here as to a very simple and clean way to do what I want to do. I know there are url parsing libraries, but that seems like overkill for what I want to do (get the username, then find out if there is a path name beyond that and decide what to do with the URL once I know).
Would really appreciate any guidance here. I know I'm just bashing my head against something really simple.
Thanks for your input.
Solutions
Both @Desktop-Firework and @Kaushal-Sharma's solutions worked well, in different ways. I also wanted to add the simplest way to do what I was originally trying to do once I got it to work based on their answers. It's obvious to anyone above my level of experience with Python, but maybe it'll help someone in my situation down the line.
I was simply doing an "if" to check if an index point existed, when I obviously should have been using a try-except.
So, using my original code, I could solve what I needed by simply changing:
if url.split('/')[5]:
    continue
into
isPath = 1
try: url.split("/")[5]
except IndexError: isPath = 0
Just adding this as it directly answers what I was trying to do at its most basic element. It is not as robust or elegant as either of the provided solutions from the other contributors, obviously.
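The same check can be wrapped in a small helper, sketched below. The function name has_extra_path is hypothetical, and the final truthiness test is an addition of this sketch rather than part of the code above: it treats a URL that ends in a trailing slash, where index 5 exists but holds an empty string, as having no path.

```python
def has_extra_path(url):
    """Return True if there is a non-empty segment after the username.

    Assumes the fixed layout from the question, where the username is
    at index 4 of url.split('/') and any page name is at index 5.
    """
    parts = url.split("/")
    try:
        segment = parts[5]
    except IndexError:
        # The list stops at the username: there is no sixth element.
        return False
    # A trailing slash yields an empty string at index 5, which is falsy.
    return bool(segment)


print(has_extra_path("https://madeupdomain.net/u/Hypothetical_Username"))
print(has_extra_path("https://madeupdomain.net/u/Hypothetical_Username/"))
print(has_extra_path("https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name"))
```

The reason the original if statement failed is that the index expression is evaluated before the if can test anything, so the IndexError is raised first; the try-except catches it instead.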
CodePudding user response:
I suggest getting the index of the /u/, then taking every single character after the /u/ and before the next / as part of the username, and then trying to get the character after the / that follows the username. If there's an IndexError, then there's no path after the username; if there isn't, there is a path.
So I propose something like this:
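def getUserName(url):
    # Index of the first character after '/u/'
    userStart = url.index('/u/') + 3
    urlIdx = userStart
    userName = ''
    # Collect characters up to the next '/' (assumes one follows the username)
    while url[urlIdx] != '/':
        userName += url[urlIdx]
        urlIdx += 1
    # Step past the '/' that ends the username
    urlIdx += 1
    isPath = 1
    # If nothing follows that '/', indexing raises IndexError
    try: url[urlIdx]
    except IndexError: isPath = 0
    return (userName, isPath)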
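It returns a tuple whose first element is the username and whose second indicates whether there is a path after the username. Note that for a URL that stops at the username, such as https://www.example.net/u/username/, it works only if there is a / after the username; without one, the while loop runs past the end of the string and raises an IndexError.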
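If the trailing / cannot be guaranteed, one way around that limitation is to locate the next slash with str.find, which returns -1 instead of raising. This is a sketch of a variant, not the code above; the name get_user_name and the boolean return value are choices made here for illustration.

```python
def get_user_name(url):
    """Return (username, has_path) for a URL containing '/u/'."""
    marker = "/u/"
    # str.index raises ValueError if the marker is absent.
    start = url.index(marker) + len(marker)
    # str.find returns -1 instead of raising when no further '/' exists.
    end = url.find("/", start)
    if end == -1:
        # No slash after the username: the URL stops at the username.
        return url[start:], False
    # Anything after the closing slash counts as a path.
    return url[start:end], end + 1 < len(url)


print(get_user_name("https://www.example.net/u/username"))
print(get_user_name("https://www.example.net/u/username/"))
print(get_user_name("https://www.example.net/u/username/some-page"))
```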
CodePudding user response:
You can split the url at '/u/' and then split the last part with '/' to get the username and the path after that.
# case 1:
url = 'https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name'
split_url = url.split('/u/')[-1].split('/')
hyp_username_part = split_url[0]
another_path_part = split_url[-1] if len(split_url) == 2 else None
print('username part: ', hyp_username_part, 'path part: ', another_path_part)
# case 2:
url = 'https://madeupdomain.net/u/Hypothetical_Username'
split_url = url.split('/u/')[-1].split('/')
hyp_username_part = split_url[0]
another_path_part = split_url[-1] if len(split_url) == 2 else None
print('username part: ', hyp_username_part, 'path part: ', another_path_part)
Output:
username part: Hypothetical_Username path part: Some-Random-Page-Name
username part: Hypothetical_Username path part: None
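For the variations mentioned in the question (http/https, a www. prefix, #params at the end), the standard library's urllib.parse may be less overkill than it sounds, since it needs no installation. The following is a minimal sketch under the assumption that the path always has the shape /u/<username>[/<page>]; the function name parse_profile_url is hypothetical.

```python
from urllib.parse import urlsplit


def parse_profile_url(url):
    """Return (username, page) from a /u/<username>[/<page>] URL.

    page is None when the URL stops at the username.
    """
    # urlsplit separates scheme, host, path, query and fragment, so
    # http/https, a www. prefix and a trailing #fragment do not matter.
    path = urlsplit(url).path
    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in path.split("/") if s]
    if len(segments) < 2 or segments[0] != "u":
        raise ValueError(f"not a /u/<username> URL: {url!r}")
    username = segments[1]
    page = segments[2] if len(segments) > 2 else None
    return username, page


print(parse_profile_url("http://www.madeupdomain.net/u/Hypothetical_Username/#ref=abc"))
print(parse_profile_url("https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name"))
```

Because empty segments are filtered out, a trailing slash after the username is treated the same as no slash at all.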