cannot extract ID correctly from URL using split operation

Time:06-07

I was using the standard split operation in Python to extract IDs from URLs. It works for URLs of the form https://music.com/146, where I need to extract 146, but fails in cases like this:

https://music.com/144?i=150

Here I need to extract 150, the value after i. I currently use the standard:

url.split("/")[-1]

Is there a better way to do it?

CodePudding user response:

Python provides a few tools to make this process easier.

As @Barmar mentioned, you can use urlsplit to split the URL, which gets you a named tuple:

>>> from urllib import parse as urlparse
>>> x = urlparse.urlsplit('https://music.com/144?i=150')
>>> x
SplitResult(scheme='https', netloc='music.com', path='/144', query='i=150', fragment='')

You can use the parse_qs function to convert the query string into a dictionary:

>>> urlparse.parse_qs(x.query)
{'i': ['150']}

Or in a single line:

>>> urlparse.parse_qs(urlparse.urlsplit('https://music.com/144?i=150').query)['i']
['150']
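
To cover both URL forms from the question with one call, the two functions above can be combined. This is only a sketch: the helper name extract_id and its param argument are made up for illustration, and it assumes the ID is either the value of the i query parameter or, failing that, the last path segment.

```python
from urllib.parse import urlsplit, parse_qs

def extract_id(url, param="i"):
    """Hypothetical helper: prefer the query parameter, else the last path segment."""
    parts = urlsplit(url)
    # parse_qs maps each parameter name to a list of values, e.g. {'i': ['150']}
    values = parse_qs(parts.query).get(param)
    if values:
        return values[0]
    # No such parameter: fall back to the last non-empty path segment
    return parts.path.rstrip("/").rsplit("/", 1)[-1]

print(extract_id("https://music.com/146"))        # 146
print(extract_id("https://music.com/144?i=150"))  # 150
```

The result is a string, as with parse_qs itself; wrap it in int() if a number is needed.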

CodePudding user response:

As @Barmar mentioned, you can fix your code to:

url.split("/")[-1].split("?i=")[-1]

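Basically, splitting https://music.com/144?i=150 on "/" and taking the last element gives 144?i=150. Splitting that on "?i=" gives 144 and 150, and the last element is 150. For a URL without a query string, such as https://music.com/146, the second split leaves 146 unchanged, so the same expression works for both forms.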

If you need it to be a number, wrap the whole expression in int: int(url.split("/")[-1].split("?i=")[-1])
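
As a quick check, the split approach can be wrapped in a function and run against both URL forms from the question. The function name id_from_split is hypothetical, and the sketch assumes that i is the only query parameter and that the URL has no trailing slash.

```python
def id_from_split(url):
    # Hypothetical wrapper around the two-step split described above.
    # Last path segment, e.g. "146" or "144?i=150"
    tail = url.split("/")[-1]
    # Keep whatever follows "?i="; if "?i=" is absent, tail is used unchanged
    return int(tail.split("?i=")[-1])

print(id_from_split("https://music.com/146"))        # 146
print(id_from_split("https://music.com/144?i=150"))  # 150
```

If the URL carries further parameters (for example ...?i=150&x=1), int() raises ValueError because the remainder is no longer purely numeric.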

CodePudding user response:

You can use a regular expression:

import re
url = 'https://music.com/144?i=150'
# one or more digits immediately before the "?"
match = re.search(r'(\d+)\?', url)
if match:
    value = match[1]  # '144'

If you need the 150:

# one or more digits immediately after "i="
match = re.search(r'i=(\d+)', url)
if match:
    value = match[1]  # '150'
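
The two searches can also be chained so that one function handles both URL forms from the question. This is a rough sketch under stated assumptions: id_from_regex is a made-up name, the i parameter is preferred when present, and otherwise a numeric last path segment is used.

```python
import re

def id_from_regex(url):
    # Hypothetical helper. Try the query parameter first: "?i=150" or "&i=150"
    match = re.search(r'[?&]i=(\d+)', url)
    if match is None:
        # Otherwise look for a numeric last path segment: ".../146" or ".../146/"
        match = re.search(r'/(\d+)/?$', url)
    return int(match.group(1)) if match else None

print(id_from_regex("https://music.com/146"))        # 146
print(id_from_regex("https://music.com/144?i=150"))  # 150
```

Returning None when nothing matches keeps the caller in control of how to handle URLs that carry no ID.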