I've run into a bug in my code which boils down to unexpected behaviour in python's urllib.parse.urlparse. It happens when a a password in an URL contains ?
. I'm very shy to call this a bug because I'm not 100% certain the URLs are syntactically correct. For example:
https://user:pass[email protected]/path
I'd expect this to parse as:
ParseResult(scheme='https', netloc='user:[email protected]', path='/path', params='', query='', fragment='')
But it actually parses as:
ParseResult(scheme='https', netloc='user:pass', path='', params='', query='[email protected]/path', fragment='')
These URL's are being automatically formatted elsewhere and the use of ?
in the password needs to be supported. Interestingly enough, the URLS are actually sqlalchemy connection strings and sqlalchemy / psycopg2 are both interpreting them as expected.
Question
Are question marks ?
allowed in an URL Password (netloc)? - Ideally answers would refer to the appropriate RFC wording.
Or is python's behaviour here correct?
CodePudding user response:
Section 3.2 in RFC 2396 specifies that ?
is reserved in the authority component (which means you're not allowed to use it within that component without encoding, see 2.2). The authority component, IIUC, is the component encompassing the user-name/password of an HTTP scheme URI (see 3.2.2).