urlparse doesn't return params for custom schema

Time:10-19

I am trying to use Python's urlparse (from urllib.parse) to parse some custom URIs.

I noticed that for some well-known schemes params are parsed correctly:

>>> from urllib.parse import urlparse
>>> urlparse("http://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='http', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
>>> urlparse("ftp://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='ftp', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')

...but for custom schemes they are not. The params field stays empty, and the params are treated as part of the path instead:

>>> urlparse("scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='scheme', netloc='some.domain', path='/some/nested/endpoint;param1=value1;param2=othervalue2', params='', query='query1=val1&query2=val2', fragment='fragment')

Why does parsing differ depending on the scheme? How can I get urlparse to parse params for a custom scheme?

CodePudding user response:

Can you remove the custom scheme from the URL? With an empty scheme, urlparse always returns the params:

>>> urlparse("//some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
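If the scheme still needs to appear in the result, one way to apply this workaround is to strip it before parsing and put it back afterwards. The following is only a sketch: the helper name urlparse_any_scheme is hypothetical, and it assumes the URL has the form "<scheme>://<rest>".

```python
from urllib.parse import urlparse

def urlparse_any_scheme(url):
    """Hypothetical helper: parse with an empty scheme, then restore it.

    Assumes the URL looks like "<scheme>://<rest>"; anything else is
    passed to urlparse unchanged.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        return urlparse(url)
    # An empty scheme is in uses_params, so params are split off the path.
    parsed = urlparse("//" + rest)
    # ParseResult is a namedtuple, so _replace returns a modified copy.
    return parsed._replace(scheme=scheme.lower())

result = urlparse_any_scheme(
    "scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2"
    "?query1=val1&query2=val2#fragment"
)
print(result.path)    # path without the params
print(result.params)  # the ';'-separated params
```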

CodePudding user response:

This is because urlparse assumes that only a fixed set of schemes use parameters in their URL format. You can see that check in the source code:

if scheme in uses_params and ';' in url:
    url, params = _splitparams(url)
else:
    params = ''

This means urlparse attempts to parse parameters only if the scheme is in uses_params, a module-level list of known schemes:

uses_params = ['', 'ftp', 'hdl', 'prospero', 'http', 'imap',
               'https', 'shttp', 'rtsp', 'rtspu', 'sip', 'sips',
               'mms', 'sftp', 'tel']

So to get the expected output, you can append your custom scheme to the uses_params list and call urlparse again. Note that the list is module-level, so the change affects every later urlparse call in the same process.

>>> from urllib.parse import uses_params, urlparse
>>>
>>> uses_params.append('scheme')
>>> urlparse("scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='scheme', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
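If you would rather not modify the module-level list, a possible alternative is to call urlsplit (which never splits params) and separate the params from the last path segment yourself. This is a minimal sketch: the function name urlparse_with_params is hypothetical, and the splitting rule mirrors what _splitparams appears to do in CPython, which is an implementation detail rather than documented behaviour.

```python
from urllib.parse import urlsplit, ParseResult

def urlparse_with_params(url):
    """Hypothetical helper: like urlparse, but splits params for any scheme."""
    parts = urlsplit(url)
    path, params = parts.path, ""
    # Look for the first ';' in the last path segment only.
    # rfind returns -1 when there is no '/', so the search starts at 0.
    i = path.find(";", path.rfind("/") + 1)
    if i >= 0:
        path, params = path[:i], path[i + 1:]
    return ParseResult(parts.scheme, parts.netloc, path, params,
                       parts.query, parts.fragment)

result = urlparse_with_params(
    "scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2"
    "?query1=val1&query2=val2#fragment"
)
print(result.path)
print(result.params)
```

Because nothing global is touched, other code in the same process that relies on the default urlparse behaviour is unaffected.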