I need to split a URL whose parameters change position very often.
For example, here is the URL with the request token in three different positions:
01:-https://127.0.0.1/?action=login&type=login&status=success&request_token=oCS44HJQT2ZSCGb39H76CjgXb0s2klwA
02:-https://127.0.0.1/?request_token=43CbEWSxdqztXNRpb2zmypCr081eF92d&action=login&type=login&status=success
03:-https://127.0.0.1/?&action=login&request_token=43CbEWSxdqztXNRpb2zmypCr081eF92d&type=login&status=success
From these URLs I need only the value of the request token, i.e. the alphanumeric string that comes after 'request_token=', like this: '43CbEWSxdqztXNRpb2zmypCr081eF92d'.
To split the URL I'm using this code:
request_token = driver.current_url.split('=')[1].split('&action')[0]
But it gives me an error when the request token is not in the expected position.
Can anyone suggest a way to extract the token in a single line of Python? It would be a great help.
Note: I'm using driver.current_url because I'm working with Selenium.
CodePudding user response:
You can use the urllib.parse module to parse URLs properly.
>>> from urllib.parse import urlparse, parse_qs
>>> url = "?request_token=43CbEWSxdqztXNRpb2zmypCr081eF92d&action=login&type=login&status=success"
>>> query = parse_qs(urlparse(url).query)
>>> query['request_token']
['43CbEWSxdqztXNRpb2zmypCr081eF92d']
>>> query['request_token'][0]
'43CbEWSxdqztXNRpb2zmypCr081eF92d'
This handles the actual structure of the URLs and doesn't depend on the position of the parameter or other special cases you'd have to handle in a regex.
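As a rough sketch of how this could be applied to the three URLs from the question, the parsing can be wrapped in a small helper. The name get_request_token is made up for illustration, and returning None when the parameter is missing is an assumption about the desired behaviour, not something the question specifies.

```python
from urllib.parse import urlparse, parse_qs


def get_request_token(url):
    """Return the request_token query parameter, or None if it is absent.

    Hypothetical helper for illustration; the None fallback is an assumption.
    """
    params = parse_qs(urlparse(url).query)
    # parse_qs maps each parameter name to a list of values; take the first.
    return params.get("request_token", [None])[0]


# The three URLs from the question, with the token in different positions.
urls = [
    "https://127.0.0.1/?action=login&type=login&status=success&request_token=oCS44HJQT2ZSCGb39H76CjgXb0s2klwA",
    "https://127.0.0.1/?request_token=43CbEWSxdqztXNRpb2zmypCr081eF92d&action=login&type=login&status=success",
    "https://127.0.0.1/?&action=login&request_token=43CbEWSxdqztXNRpb2zmypCr081eF92d&type=login&status=success",
]

tokens = [get_request_token(u) for u in urls]
for token in tokens:
    print(token)
```

With Selenium, the same idea should fit on one line: request_token = parse_qs(urlparse(driver.current_url).query)['request_token'][0], where driver is the WebDriver instance from the question. That form raises KeyError if the parameter is missing.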
CodePudding user response:
Assuming you have the URLs as strings, you could use a regular expression to isolate the request tokens.
import re
urls = ['https://127.0.0.1/?action=login&type=login&status=success&request_token=oCS44HJQT2ZSCGb39H76CjgXb0s2klwA',
'https://127.0.0.1/?request_token=43CbEWSxdqztXNRpb2zmypCr081eF92d&action=login&type=login&status=success',
'https://127.0.0.1/?&action=login&request_token=43CbEWSxdqztXNRpb2zmypCr081eF92d&type=login&status=success']
for url in urls:
    m = re.match('.*request_token=(.*?)(?:&|$)', url)
    if m:
        print(m.group(1))
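One caveat with the pattern above is that it would also match a parameter whose name merely ends in request_token. A slightly stricter variant, sketched below, requires a '?' or '&' directly before the name. The names TOKEN_RE and find_request_token, and the old_request_token example URL, are made up for illustration. Note that a regex does not percent-decode the value, which should not matter for purely alphanumeric tokens.

```python
import re

# Require '?' or '&' right before the name so that a parameter such as
# 'old_request_token' is not matched by accident. The value runs up to the
# next '&', a '#' fragment marker, or the end of the string.
TOKEN_RE = re.compile(r"[?&]request_token=([^&#]*)")


def find_request_token(url):
    """Return the raw request_token value, or None if there is no match.

    Hypothetical helper for illustration only.
    """
    m = TOKEN_RE.search(url)
    return m.group(1) if m else None


print(find_request_token(
    "https://127.0.0.1/?&action=login&request_token=43CbEWSxdqztXNRpb2zmypCr081eF92d&type=login&status=success"
))
# A similarly named parameter is ignored (made-up URL for illustration).
print(find_request_token("https://127.0.0.1/?old_request_token=abc&action=login"))
```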