Testing for an [n]th element when using split('/') on a URL, without it giving a "lis

Time:03-01

Python 3.10.2

I have a URL that generally appears as follows, with some slight variation (http/https, www. prefix sometimes, #params at the end to indicate things like referrer or device being displayed on, etc).

https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name

The forms of the URL I'm generally encountering are either:

https://madeupdomain.net/u/Hypothetical_Username/

or

https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name

What I'm interested in doing with the URL:

  1. Get the Hypothetical_Username part
  2. Find out if the URL stops at the username or if there is another /path after it

I've been using user = url.split('/')[4] to get the username portion of the URL. Since the URL always includes the username and the URL is usually consistent (for now), I can rely on this split getting the element I want. If the URL changes a little bit in the future, I know this'll bone me.

However, the rest of the path is optional.

If I just use url.split('/')[5], python throws an error as soon as it encounters a URL where split doesn't have a [5]th element.

So I tried to "test" for it with an if statement, but it still complains and throws the error IndexError: list index out of range.

if url.split('/')[5]:
    continue
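
The error arises because url.split('/')[5] is evaluated before the if gets a chance to test anything: indexing a missing position raises an exception rather than returning a falsy value. A minimal sketch of a check that avoids the exception, assuming the optional path sits at index 5 as in the examples here (has_extra_path is a hypothetical helper name, not from the original post):

```python
def has_extra_path(url):
    """Return True if the split URL has a non-empty element at index 5."""
    parts = url.split('/')
    # Compare the length first so the index lookup can never raise IndexError.
    return len(parts) > 5 and parts[5] != ''


print(has_extra_path('https://madeupdomain.net/u/Hypothetical_Username'))
print(has_extra_path('https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name'))
```

The extra comparison against '' is there so that a URL ending in a bare trailing slash is not counted as having a path.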

When I print out the list, it'll look like either of the following. As you can see, there are 5 elements in the first and six in the second.

['https:', '', 'madeupdomain.net', 'u', 'Hypothetical_Username']

['https:', '', 'madeupdomain.net', 'u', 'Hypothetical_Username', 'Some-Random-Page-Name']

So, I tried running len(url.split('/')) on every iteration to see how many elements each list has, and it always says 6, whether it is the first or the second example above.
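
One possible contributor, offered as an assumption since the actual URLs in the loop are not shown: a URL that ends in a trailing slash, like the first form listed earlier, splits into six elements, the last one being an empty string. A quick check of that behaviour:

```python
with_slash = 'https://madeupdomain.net/u/Hypothetical_Username/'
without_slash = 'https://madeupdomain.net/u/Hypothetical_Username'

# A trailing '/' produces an extra, empty final element.
print(len(with_slash.split('/')), with_slash.split('/')[-1] == '')
print(len(without_slash.split('/')))
```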

So, I'm kind of at a loss here as to a very simple and clean way to do what I want to do. I know there are url parsing libraries, but that seems like overkill for what I want to do (get the username, then find out if there is a path name beyond that and decide what to do with the URL once I know).

Would really appreciate any guidance here. I know I'm just bashing my head against something really simple.

Thanks for your input.

Solutions

Both @Desktop-Firework and @Kaushal-Sharma's solutions worked well, in different ways. I also wanted to add the simplest way to do what I was originally trying to do once I got it to work based on their answers. It's obvious to anyone above my level of experience with Python, but maybe it'll help someone in my situation down the line.

I was simply doing an "if" to check if an index point existed, when I obviously should have been using a try-except.

So, using my original code, I could solve what I needed by simply changing:

if url.split('/')[5]:
    continue

into

isPath = 1
try:
    url.split('/')[5]
except IndexError:
    isPath = 0

Just adding this as it directly answers what I was trying to do at its most basic element. It is not as robust or elegant as either of the provided solutions from the other contributors, obviously.

CodePudding user response:

I suggest getting the index of the /u/, then taking every single character after the /u/ and before the / as part of the username, and then trying to get the character after the / after the username. If there's an IndexError, then there's no path after the username; if there isn't, there is a path.

So I propose something like this:

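def getUserName(url):
   # Position of the first character after '/u/'
   userStart = url.index('/u/') + 3
   urlIdx = userStart
   userName = ''
   # Collect characters up to the next '/'
   while url[urlIdx] != '/':
      userName += url[urlIdx]
      urlIdx += 1
   # Step past the '/' that ends the username
   urlIdx += 1
   isPath = 1
   try: url[urlIdx]
   except IndexError: isPath = 0
   return (userName, isPath)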

It returns a tuple whose first element is the username and whose second indicates whether there is a path after the username. Note that for a URL like https://www.example.net/u/username/ it works only if there is a / after the username; without one, the while loop runs past the end of the string and raises IndexError.
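
The trailing-slash limitation can be sidestepped with str.partition, which returns empty strings instead of raising when the separator is missing. A sketch, assuming the URL contains /u/ (get_user_and_path is a hypothetical name, not part of the answer above):

```python
def get_user_and_path(url):
    """Return (username, has_path) for a URL containing '/u/'."""
    # Everything after the first '/u/'; empty if '/u/' is absent.
    _, _, tail = url.partition('/u/')
    # Split the tail at the first '/', if there is one.
    user_name, _, rest = tail.partition('/')
    return user_name, rest != ''


print(get_user_and_path('https://www.example.net/u/username'))
print(get_user_and_path('https://www.example.net/u/username/'))
print(get_user_and_path('https://www.example.net/u/username/page'))
```

Unlike url.index, partition does not raise when /u/ is missing; the sketch then returns an empty username, which the caller would need to check for.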

CodePudding user response:

You can split the url at '/u/' and then split the last part with '/' to get the username and the path after that.

# case 1:
url = 'https://madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name'

split_url = url.split('/u/')[-1].split('/')
hyp_username_part = split_url[0]
another_path_part = split_url[-1] if len(split_url) == 2 else None


print('username part: ', hyp_username_part, 'path part: ', another_path_part)


# case 2:
url = 'https://madeupdomain.net/u/Hypothetical_Username'

split_url = url.split('/u/')[-1].split('/')
hyp_username_part = split_url[0]
another_path_part = split_url[-1] if len(split_url) == 2 else None


print('username part: ', hyp_username_part, 'path part: ', another_path_part)

Output:

username part:  Hypothetical_Username path part:  Some-Random-Page-Name
username part:  Hypothetical_Username path part:  None
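
For the variations mentioned in the question (http/https, a www. prefix, a #fragment at the end), the standard library's urllib.parse.urlsplit separates the path from the rest of the URL, so the scheme, host, and fragment no longer shift the indexes. A sketch under the assumption that the path always starts with /u/ followed by the username (parse_user_url is a hypothetical name):

```python
from urllib.parse import urlsplit


def parse_user_url(url):
    """Return (username, extra_path); extra_path is None if absent."""
    # .path excludes the scheme, host, query string, and #fragment.
    segments = [s for s in urlsplit(url).path.split('/') if s]
    if len(segments) < 2 or segments[0] != 'u':
        raise ValueError(f'unexpected URL layout: {url!r}')
    username = segments[1]
    # Anything after the username is treated as the optional path.
    extra_path = '/'.join(segments[2:]) or None
    return username, extra_path


print(parse_user_url('https://madeupdomain.net/u/Hypothetical_Username/'))
print(parse_user_url('http://www.madeupdomain.net/u/Hypothetical_Username/Some-Random-Page-Name#ref'))
```

Filtering out empty segments means a trailing slash and a missing one are treated the same way.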